Throttling web crawlers in VCL


#1

There are some aggressive web crawlers that we’d like to throttle on entry to the site. Are there any methods we can use to do this?


#2

Hi Jeff

Unfortunately, we don’t have an out-of-the-box solution for rate limiting of this nature, but we can work around it by setting some conditions within the app.

If you know which bots are crawling your site, you can set up rate limiting by user agent through the app. The VCL will allow 1 in 4 requests through, and the other 3 will receive a synthetic response. A random number generator determines which requests get through. Follow these steps to implement this solution:

  1. Create a synthetic 503 Service Unavailable response for the bots that miss. See our doc page on how to create synthetic responses.

  2. After the response is created, set up a Request Condition that satisfies the following:

req.http.User-Agent ~ "(Googlebot|Baiduspider)" && !randombool(1, 4)

You should be able to extend the list of misbehaving bots as you see fit. This should do the trick!
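If you work in custom VCL rather than through the app, the same idea can be sketched roughly as follows. This is only an illustration under some assumptions: the internal status code 601, the response body, and the Retry-After value are placeholders I picked for the example, not something the app generates, and the bot list is the same sample list as above.

```vcl
sub vcl_recv {
#FASTLY recv

  # randombool(1, 4) is true about 1 time in 4, so the negation diverts
  # roughly 3 in 4 matching bot requests to the synthetic response.
  if (req.http.User-Agent ~ "(Googlebot|Baiduspider)" && !randombool(1, 4)) {
    # 601 is an arbitrary internal marker (assumption), handled in vcl_error.
    error 601 "Throttled";
  }
}

sub vcl_error {
#FASTLY error

  # Turn the internal marker into the 503 the bot actually sees.
  if (obj.status == 601) {
    set obj.status = 503;
    set obj.response = "Service Unavailable";
    set obj.http.Content-Type = "text/plain";
    # Retry-After is optional; 60 seconds is an illustrative value.
    set obj.http.Retry-After = "60";
    synthetic {"Service Unavailable"};
    return(deliver);
  }
}
```

To throttle more or less aggressively, change the arguments to randombool: for example, !randombool(1, 2) would turn away about half of the matching requests.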


#3

We also have a partnership with PerimeterX for bot detection, which would be worth checking out for this problem.

Fastly Docs:
https://docs.fastly.com/guides/integrations/perimeterx-bot-defender

Direct Link to their product: https://www.perimeterx.com/products/bot-defender/?utm_source=partner&utm_medium=fastly&utm_campaign=botdefender