OK, there are a few things you ought to factor into your thinking here:
- There’s no way to read the number of currently open connections to your origin in Varnish, and it wouldn’t tell you much even if there were, because of connection reuse (or, in the case of HTTP/2, because all requests are multiplexed over a single connection).
- A single user will be making lots of requests, so you would need to make sure that if you serve a user’s HTML request, you also serve all their subresource requests.
- If you provide a countdown and, when the countdown expires, your servers are even busier, you may still not be able to handle the request.
This is generally why it makes sense for your origin server to provide a signal to Fastly to let us know if it’s OK to send traffic to you, which we call a health check. If the health check were failing, we could generate a synthetic response.
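To illustrate, here’s a minimal sketch of that pattern in Fastly VCL. It assumes you already have a health check configured on the backend; the 750 status code and the page copy are arbitrary placeholders:

```
sub vcl_recv {
  # If the backend's health check is currently failing,
  # short-circuit to a synthetic page instead of queuing at origin.
  if (!req.backend.healthy) {
    error 750 "Busy";
  }
}

sub vcl_error {
  if (obj.status == 750) {
    set obj.status = 503;
    set obj.response = "Service Unavailable";
    set obj.http.Content-Type = "text/html";
    set obj.http.Retry-After = "30";
    synthetic {"<html><body><h1>We're a bit busy</h1><p>Please try again shortly.</p></body></html>"};
    return(deliver);
  }
}
```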
That said, if you want the edge solution, we can offer some options. These usually revolve around generating an object and hitting it on every request that you want to limit, then checking its hit count (obj.hits) and allowing the request to restart and go to origin if you have not reached some limit. Here is a demo of that.
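In its simplest form, that trick looks something like this (a sketch only — the /__rate-limit/counter path, the X-Orig-Url header name, the 60s TTL and the limit of 100 are all placeholders; the object’s TTL acts as the bucket length, since obj.hits resets when the object expires):

```
sub vcl_recv {
  if (req.restarts == 0) {
    set req.http.X-Orig-Url = req.url;        # stash the real URL
    set req.url = "/__rate-limit/counter";    # every request hits the same object
  } else {
    set req.url = req.http.X-Orig-Url;        # second pass: fetch the real thing
  }
}

sub vcl_fetch {
  if (req.url == "/__rate-limit/counter") {
    set beresp.ttl = 60s;                     # bucket length: counter resets each minute
  }
}

sub vcl_deliver {
  if (req.url == "/__rate-limit/counter" && obj.hits < 100) {
    restart;                                  # under the limit: go get the real URL
  }
  # Over the limit: fall through and deliver the counter object
  # (in practice you'd generate a synthetic queue page here instead).
}
```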
Since that solution operates in buckets, the result might be quite spiky: you’d get all requests until you hit the limit, and then nothing until you tick into the next bucket. So in theory you could do this to figure out your request rate in a rudimentary way:
- In vcl_miss (and/or vcl_pass, depending on which requests you want to rate-limit), if req.restarts == 0, save the original URL in a header, rewrite req.url to a rate-limit URL such as "/__rate-limit/" time.sub(now, 1s), and set the backend to the service itself, or another service that can generate a small static response.
- In vcl_deliver, if req.restarts == 0 and req.url is a rate-limit URL, read obj.hits and stash it in a header. We’ll call this current-request-count.
- Set req.url to the rate-limit URL for the previous bucket, and restart.
- In vcl_deliver, if req.restarts == 1 and req.url is a rate-limit URL, read obj.hits. We’ll call this prev-request-count.
- Work out what proportion of requests should go to origin. If you are willing to accept max-rate reqs/second, then the calculation ( (prev-request-count - current-request-count) / max-rate ) tells you what factor over your maximum rate you are (let’s call this result load-factor). E.g. if the load-factor is 2, we want to make twice as many requests to origin as you can actually take, so in that case only half the requests should actually make it to origin. The probability that we should send any given request to origin is therefore (1 / load-factor) (for an example load factor of 2, the probability is 0.5). We’ll call this fetch-probability.
- Generate a random number between 0 and 1. If it’s lower than fetch-probability, set req.url back to the original URL and restart. If it’s higher, generate a synthetic response containing your queue page (and if you want to include a countdown, maybe use the load-factor to decide how long to wait).
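Pulling those steps together, here’s a rough, untested sketch in Fastly VCL. One deliberate variation from the steps above: the URL rewriting happens in vcl_recv (with restarts) rather than vcl_miss, so that each bucket read is a proper cache lookup and obj.hits accumulates on the bucket object. The header names, the /__rate-limit/ prefix, and the max rate of 100 req/s are all placeholders; randombool(a, b) returns true with probability a/b:

```
sub vcl_recv {
  # A previous pass decided this request should be queued.
  if (req.http.X-Rate-Decision == "queue") {
    error 751 "Queue";
  }
  if (req.restarts == 0 && req.url !~ "^/__rate-limit/") {
    # Pass 1: stash the real URL and look up this second's bucket instead.
    set req.http.X-Orig-Url = req.url;
    set req.url = "/__rate-limit/" time.sub(now, 1s);
  }
}

sub vcl_fetch {
  if (req.url ~ "^/__rate-limit/") {
    # Bucket objects only need to live long enough to be compared.
    set beresp.ttl = 5s;
    set beresp.cacheable = true;
  }
}

sub vcl_deliver {
  if (req.url ~ "^/__rate-limit/") {
    if (req.restarts == 0) {
      # First read: requests counted so far in the current bucket.
      set req.http.X-Count-Current = obj.hits;
      set req.url = "/__rate-limit/" time.sub(now, 2s);  # previous bucket
      restart;
    }
    if (req.restarts == 1) {
      declare local var.over INTEGER;
      declare local var.max_rate INTEGER;
      set var.max_rate = 100;                  # placeholder: acceptable origin req/s
      set var.over = obj.hits;                 # prev-request-count
      set var.over -= std.atoi(req.http.X-Count-Current);
      # Under the limit, or randomly selected with probability 1/load-factor:
      # go to origin. Otherwise: queue page.
      if (var.over <= var.max_rate || randombool(var.max_rate, var.over)) {
        set req.http.X-Rate-Decision = "fetch";
      } else {
        set req.http.X-Rate-Decision = "queue";
      }
      set req.url = req.http.X-Orig-Url;
      restart;
    }
  }
}

sub vcl_error {
  if (obj.status == 751) {
    set obj.status = 503;
    set obj.http.Content-Type = "text/html";
    set obj.http.Retry-After = "5";
    synthetic {"<html><body><p>We're busy right now; you're in the queue.</p></body></html>"};
    return(deliver);
  }
}
```

Note that this uses three restarts for queued requests (bucket read, bucket comparison, queue decision), so keep an eye on the service’s restart limit.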
This is tough to implement purely in VCL, but I think it’s probably possible. If you do decide to give it a go, let us know how you get on!