Serve stale content based on user agent (crawlers)


#1

Hi,

We have some aggressive crawlers. Given that there isn’t much of an out-of-the-box rate-limiting solution, is it possible to craft the VCL to serve them stale content?

We keep our content as stale for 1 day (the TTL is often around 2 hours), so it should be possible to serve all the crawlers the same stale cached copy rather than causing backend requests?

In our use case, we are not massively fussed if a web crawler gets an out-of-date page.


#2

I just did a rudimentary test and it looks like it’s possible. I created 2 VCL snippets: one for detecting the condition I wanted to serve stale on, and the other for actually serving stale. I tested with a dummy req.http.foo header; you’d have to check req.http.user-agent instead.

In vcl_recv

# Test condition only; 901 is an arbitrary internal status used to reach vcl_error.
if (req.http.foo == "bar") {
  error 901;
}

In vcl_error

if (obj.status == 901) {
  if (stale.exists) {
    return(deliver_stale);
  }
}

#3

Hmm, and what if the stale object doesn’t exist? Can I then do return(deliver);, or would I have to do some kind of restart?


#4

By default, it would return(deliver); after the check for stale, so no need to add that explicitly.

However, there is one thing I missed: when in vcl_error and the status is 901, it needs to be changed back to 200 before you return (either from the stale or no-stale condition).
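
Putting the pieces from this thread together, here is a rough sketch of the two snippets with the user-agent check and the status reset. It is untested and makes a few assumptions: the crawler regex is only a placeholder, and the restart in the no-stale case is my own suggestion rather than something confirmed above. Without the restart, the default deliver would send the empty synthetic response from vcl_error instead of a real page.

```vcl
# Snippet in vcl_recv
# The crawler pattern is a hypothetical placeholder; tune it to your traffic.
# req.restarts == 0 skips the check after a restart, so a crawler with no
# stale object goes through a normal lookup instead of looping.
if (req.restarts == 0 && req.http.User-Agent ~ "(?i)(bot|crawler|spider)") {
  error 901;
}

# Snippet in vcl_error
if (obj.status == 901) {
  # Reset the internal 901 marker before returning, as noted above.
  set obj.status = 200;
  set obj.response = "OK";
  if (stale.exists) {
    return(deliver_stale);
  }
  # Assumption, not from the thread: no stale copy, so restart and let the
  # request be served from cache or the backend as usual.
  restart;
}
```

The trade-off is that crawlers still reach the backend whenever nothing stale is cached for a URL, but they never get an empty 200.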


#5

Hi @gsdevme, I have been working up this use case as an example demo to show off a new tool we call Fastly Fiddle. My demo can be found here:

https://fiddle.fastlydemo.net/fiddle/c12d9b59

If you are still interested in implementing this, that might be useful to you.