Googlebot always seems to bypass the cache


Hi guys,

I’ve got caching working with something like

                // 'Cache-Control': 's-maxage=86400'
                'Cache-Control': 'no-cache, no-store, must-revalidate',
                'Surrogate-Control': 'max-age=86400',
                'Pragma': 'no-cache',
                'Expires': 0,

And it works, but when I use Fetch as Google in Webmaster Tools, the requests constantly hit my origin.

So I figured that it was hitting different POP locations each time, which it was, but even after enabling shielding it still gets through to my origin.

These are the response headers for the Googlebot request:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Wed, 20 Apr 2016 13:17:17 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d4277cb882bba4e2c64c73047786f4b4f1461158237; expires=Thu, 20-Apr-17 13:17:17 GMT; path=/;; HttpOnly
Status: 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Access-Control-Allow-Origin: *
Pragma: no-cache
Access-Control-Allow-Headers: X-Requested-With
Expires: 0
Set-Cookie: ff=; Path=/
X-Powered-By: Phusion Passenger 5.0.25
Via: 1.1 vegur
Via: 1.1 varnish
Fastly-Debug-Digest: c839f034d8e0abbc67f39899758e6584ba34b305f3abd5ef06fcf4aab0a6dbb7
Via: 1.1 varnish
Age: 0
X-Served-By: cache-jfk1022-JFK, cache-ord1731-ORD
X-Cache-Hits: 0, 0
X-Timer: S1461158237.183803,VS0,VE134
CF-RAY: 2968e025e8822629-DFW


And here are the response headers for a browser request that is served from the cache:

access-control-allow-headers:X-Requested-With
access-control-allow-origin:*
age:1395
cache-control:no-cache, no-store, must-revalidate
cf-ray:2968ed9eca981d68-MEL
content-encoding:gzip
content-type:text/html; charset=utf-8
date:Wed, 20 Apr 2016 13:26:28 GMT
expires:0
fastly-debug-digest:a7b7ff5fbbfb6b4d6a4135e2b64da745fe75f1277a6a5d57bde4f418ce574c3b
pragma:no-cache
server:cloudflare-nginx
status:200
status:200 OK
via:1.1 varnish
via:1.1 vegur
via:1.1 varnish
x-cache:MISS, HIT
x-cache-hits:0, 2
x-powered-by:Phusion Passenger 5.0.25
x-served-by:cache-jfk1036-JFK, cache-mel6521-MEL
x-timer:S1461158788.924244,VS0,VE0


Hi Thomas

The Set-Cookie header prevents us from caching the response. This stops user-specific content from being cached and served to other users.

It looks like the header is only set when the request arrives without a corresponding Cookie header, as the Googlebot request does. The request from your browser does carry one, so that response has no Set-Cookie and can be cached.

If you need to, you can remove the set-cookie header from the response:
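
A minimal sketch of that, assuming custom VCL is enabled on the service and that nothing served through it depends on cookies set by the origin (the placement inside vcl_fetch and the unconditional removal are assumptions, not something confirmed in this thread):

```vcl
sub vcl_fetch {
#FASTLY fetch

  # Assumption: clients of this service do not need cookies from the origin.
  # Removing Set-Cookie here, before any check that passes on responses
  # carrying Set-Cookie, allows the response to be stored in the cache.
  unset beresp.http.Set-Cookie;

  # ... remainder of the existing vcl_fetch logic ...
}
```

If some paths do need the cookie (a login endpoint, for example), the unset can be wrapped in a condition on req.url so that only the cacheable pages are affected.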