Whitelist cookies

Hi,

Request headers with many cookies, especially from Google Analytics or similar sources, can seriously hurt CDN caching. I was searching online for a way to “whitelist cookies” in VCL: allow a strictly limited set of cookies and remove all others.

Are there any problems with the code shown below, or does it look OK? What do you think of this strategy?

# in vcl_recv

set req.http.Cookie_temp = "";

# example cookie 1
# Anchor the name so that e.g. "xcsrftoken=..." does not match
if (req.http.Cookie ~ "(^|;\s*)(csrftoken=[^;]+)") {
  set req.http.Cookie_temp = req.http.Cookie_temp + re.group.2 + "; " ;
}

# example cookie 2
# Anchored in the same way as the csrftoken match
if (req.http.Cookie ~ "(^|;\s*)(config-sessionid=[^;]+)") {
  set req.http.Cookie_temp = req.http.Cookie_temp + re.group.2 + "; " ;
}

# example cookie 3. And so on. Include more here.

# remove trailing semicolon
set req.http.Cookie_temp = regsuball(req.http.Cookie_temp, "(; )?$", "");

# For debugging, not required.
set req.http.Cookie_orig = req.http.Cookie;

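set req.http.Cookie = req.http.Cookie_temp;

# The temporary header is no longer needed; unset it so it is not forwarded to origin.
unset req.http.Cookie_temp;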

if (req.http.Cookie ~ "^\s*$") {
  unset req.http.Cookie;
}

Great question!

If the request almost always has the same set of cookies, you could simplify this to:

set req.http.cookie = "" +
  if (req.http.cookie:csrftoken, "csrftoken=" + req.http.cookie:csrftoken + "; ", "") +
  if (req.http.cookie:config-sessionid, "config-sessionid=" + req.http.cookie:config-sessionid, "")
;
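
One thing to watch with the one-liner above: when only csrftoken is present it leaves a trailing "; ", and when neither cookie is present it appears to leave an empty Cookie header rather than no header at all. A minimal follow-up, mirroring the cleanup step in the original question (a sketch, not tested against a live service), might be:

```vcl
# in vcl_recv, directly after the assignment above

# Drop a trailing "; " left behind when config-sessionid was absent.
set req.http.Cookie = regsub(req.http.Cookie, "; $", "");

# If nothing on the whitelist was present, send no Cookie header at all.
if (req.http.Cookie == "") {
  unset req.http.Cookie;
}
```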

Filtering cookies like this is a great idea. However, not sending cookies to origin at all is much better. Even filtered, the Cookie header combines multiple unrelated pieces of high-granularity data. If you use that data on the origin, you’ll have to include Vary: cookie in the HTTP response, forcing Fastly to store a separate variant of the object for each distinct Cookie header value, and that will likely reduce your cache hit ratio dramatically.

Obviously one solution is to remove any cookies that you don’t actually use, and you’re doing that. But what about cookies that are important?

Use the value at the edge

One solution is to consume the cookie value at the edge. One of your examples is a CSRF token, which is something you can often implement entirely edge-side. For example, use a stateless CSRF token format that includes a timestamp and a hash (keyed with a secret, so clients cannot forge it) of the URL, timestamp and client IP, and then recompute that hash at the edge to check that it’s valid.

Once you’ve consumed that cookie, you can then remove it from the request.
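
As a rough sketch of what that edge-side check could look like: the token format ("unix-seconds.signature"), the edge dictionary name edge_secrets with its csrf_key entry, the one-hour lifetime and the POST-only check are all assumptions made for illustration, and whatever issues the token would have to sign the same string in the same output format. A real deployment would normally also compare the cookie against a copy of the token submitted in a form field or header.

```vcl
# in vcl_recv
# Assumes an edge dictionary named edge_secrets containing a "csrf_key" entry,
# and a cookie value of the hypothetical form "<unix-seconds>.<signature>".
declare local var.expected STRING;
declare local var.age INTEGER;

if (req.method == "POST") {
  if (req.http.Cookie:csrftoken ~ "^([0-9]{1,10})\.(.+)$") {
    # re.group.1 is the issue time, re.group.2 the signature sent by the client.
    # Recompute the keyed hash over URL path + timestamp + client IP.
    set var.expected = digest.hmac_sha256(table.lookup(edge_secrets, "csrf_key", ""), req.url.path + re.group.1 + client.ip);

    # Age of the token in seconds.
    set var.age = std.atoi(now.sec);
    set var.age -= std.atoi(re.group.1);

    # Reject tokens from the future, tokens older than an hour, or bad signatures.
    if (var.age < 0 || var.age > 3600 || !digest.secure_is_equal(re.group.2, var.expected)) {
      error 403 "Invalid CSRF token";
    }
  } else {
    error 403 "Missing CSRF token";
  }
}
```

With the check done at the edge, csrftoken can simply be left off the cookie whitelist so it never reaches the origin.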

Decompose the cookie into separate headers

If you still have multiple cookies you need to pass to origin, consider spreading them over multiple custom headers:

set req.http.Session-ID = req.http.cookie:session-id;
set req.http.Edition = req.http.cookie:edition;
unset req.http.cookie;

Now, imagine session-id is a very granular value that is usually different for every user, while edition has only two possible values, “local” and “international”. If a request goes to your origin and the origin ignores the session ID but uses the edition value, it can return a response with a Vary header like this:

Vary: Edition

Now Fastly only needs to store two variants of this response in order to satisfy all possible requests.
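
That only holds if the header really is limited to those two values, and a client can put anything it likes in a cookie. One way to guarantee it, sketched here with “local” as an assumed default for missing or unexpected values, is to normalize the value at the edge before it reaches the cache lookup:

```vcl
# in vcl_recv
set req.http.Session-ID = req.http.Cookie:session-id;

# Collapse every possible cookie value into one of the two known editions,
# so Vary: Edition can never produce more than two variants.
if (req.http.Cookie:edition == "international") {
  set req.http.Edition = "international";
} else {
  # Assumption: treat a missing or unrecognized edition as "local".
  set req.http.Edition = "local";
}

unset req.http.Cookie;
```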

Thanks for the answer. Very interesting. That is really a compact solution to extract only certain cookies.

Yes! Actually, on the majority of pages, which happen to be documentation, I’m removing the cookies completely. The (Django) server continues to send Vary: cookie, but since there is no cookie to vary on, it ought to be OK.