I’d like to display a holding page when the number of concurrent users reaches a set threshold. I’ve looked high and low and I can’t find any documentation on how to get started with it. I looked in the VCL book too. I’m new to VCL, so I may be searching for the wrong thing. Could anyone point me in the right direction?
Are you wanting to do this purely based on traffic reaching an absolute number of reqs/s, or based on signals from your origin servers?
Generally, customers use origin health signals as a means to do this: you can set up a health check that will poll your origin for health, and stop serving traffic to it when it is “unhealthy”. You would need to implement an endpoint on your origin server that returns a health signal using whatever logic you want.
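In Fastly, health checks are normally configured through the web interface or API rather than written by hand, but in generated VCL a backend with an attached probe looks roughly like this (the hostname and `/healthz` path are made-up examples):

```vcl
# Sketch of a backend with an attached health check, roughly as it
# appears in Fastly-generated VCL. The host and /healthz path are
# placeholders; configure the real thing via the Fastly UI or API.
backend F_origin {
  .host = "origin.example.com";
  .port = "443";
  .probe = {
    .request = "GET /healthz HTTP/1.1" "Host: origin.example.com" "Connection: close";
    .expected_response = 200;  # what /healthz returns when happy to take traffic
    .interval = 5s;            # poll every 5 seconds
    .timeout = 2s;
    .window = 5;               # look at the last 5 probes...
    .threshold = 3;            # ...and require 3 of them to have passed
  }
}
```

Your `/healthz` endpoint can apply whatever logic you like (load average, queue depth, and so on) and simply return a non-200 status when you want Fastly to stop sending traffic.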
It is possible to do something purely at the edge to rate limit the load on your origin, but not usually the best solution.
I’d actually want to try both; first would be reqs/s, then moving to API calls from the origin hosts later, rather than response codes.
The idea would be to rate limit incoming connections, and give those that are waiting a nice little countdown that would let them in after a set number of minutes. In essence, a virtual waiting room for when things get busy at the back end. I can code, I just need to know what to code!
I’m guessing I need to count reqs/s or user sessions from Varnish over some window “Y”. Once the count reaches X new sessions over Y, they get put into a holding queue and redirected to a page with a countdown; once the countdown has expired they are allowed into the site. I’m thinking I could do the holding by setting a cookie and checking the expiry.
Not sure if this is the correct path to go down.
OK, there are a few things you ought to factor in to your thinking here:
- There’s no way to read the number of currently open connections to your origin in Varnish, and it wouldn’t tell you much even if there were, because of connection reuse, or in the case of HTTP/2, because all requests are multiplexed on one connection.
- A single user will be making lots of requests, so you would need to make sure that if you served a user’s HTML request, you also serve all their subresource requests as well.
- If you provide a countdown, and your servers are even busier when the countdown expires, you still may not be able to handle the request.
This is generally why it makes sense for your origin server to provide a signal to Fastly to let us know whether it’s OK to send traffic to you, which we call a health check. If the health check were failing, we could generate a synthetic response.
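As a sketch of that edge side (the custom 600 status and the page content are invented for illustration):

```vcl
# Hypothetical sketch: when the origin's health check is failing,
# short-circuit in vcl_recv and serve a synthetic holding page
# instead of sending the request to origin.
sub vcl_recv {
#FASTLY recv
  if (!req.backend.healthy) {
    error 600 "Origin busy";   # 600+ codes are free for custom use
  }
}

sub vcl_error {
#FASTLY error
  if (obj.status == 600) {
    set obj.status = 503;
    set obj.http.Content-Type = "text/html";
    set obj.http.Retry-After = "60";
    synthetic {"<html><body><h1>We are very busy right now</h1>
<p>Please try again in a minute.</p></body></html>"};
    return(deliver);
  }
}
```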
That said, if you want the edge solution, we can offer some options. These usually revolve around generating an object and hitting it on every request that you want to limit, then checking its hit count (`obj.hits`) and allowing the request to restart and go to origin if you have not reached some limit. Here is a demo of that.
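The demo aside, the bare bones of that trick might look something like this (an untested sketch: the URL, header name, and limit of 100 reqs/s are all invented, and the `/__rate-limit/` bucket object needs to return a tiny cacheable response, e.g. from the service itself with a short TTL):

```vcl
# Single-bucket sketch: count requests by fetching a tiny cached
# object whose URL changes every second, then read obj.hits on it.
sub vcl_recv {
#FASTLY recv
  if (req.restarts == 0) {
    set req.http.X-Orig-URL = req.url;                    # remember the real URL
    set req.url = "/__rate-limit/" strftime({"%s"}, now); # one bucket per second
  }
}

sub vcl_deliver {
#FASTLY deliver
  if (req.restarts == 0 && req.url ~ "^/__rate-limit/") {
    if (obj.hits < 100) {
      set req.url = req.http.X-Orig-URL;  # under the limit: let it through
      restart;
    }
    # otherwise fall through and serve the holding page instead
  }
}
```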
Since that solution operates in buckets, the result might be quite spiky - you’d get all requests until you hit the limit, and then nothing until you tick into the next bucket. So in theory you could do this to figure out your request rate in a rudimentary way:
- In `vcl_miss` (and/or `vcl_pass`, depending on which requests you want to rate-limit), if `req.restarts == 0`, rewrite `req.url` to a rate-limit URL such as `"/__rate-limit/" time.sub(now, 1s)`, and set the backend to the service itself, or to another service that can generate a small static response. Save the original URL in a header first.
- In `vcl_deliver`, if `req.restarts == 0` and `req.url` is a rate-limit URL, read `obj.hits` and stash it in a header; we’ll call this `prev-request-count`.
- Set `req.url` to the equivalent rate-limit URL for the current second and restart.
- In `vcl_deliver`, if `req.restarts == 1` and `req.url` is a rate-limit URL, read `obj.hits`. We’ll call this `current-request-count`.
- Work out what proportion of requests should go to origin. If you are willing to accept `max-rate` reqs/second, then the calculation `((prev-request-count - current-request-count) / max-rate)` would tell you what factor over your maximum rate you are; let’s call this result `load-factor`. E.g. if the `load-factor` is 2, we are wanting to make twice as many requests to origin as you can actually take, so in that case only half the requests should actually make it to origin. The probability that we should send this request to origin is therefore `(1 / load-factor)` (for an example load factor of 2, the probability is 0.5). We’ll call this `fetch-probability`.
- Generate a random number between 0 and 1. If it’s lower than `fetch-probability`, set `req.url` back to the original URL and restart. If it’s higher, generate a synthetic response containing your queue page (and if you want to include a countdown, maybe use the `load-factor` to decide how long to wait).
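Pulling those steps together, a skeleton might look like the following. This is an untested sketch with all URL and header names invented, simplified to do the rewrite in `vcl_recv` rather than `vcl_miss`/`vcl_pass`, and it assumes the `/__rate-limit/` buckets return tiny cacheable responses with a TTL of a couple of seconds. The arithmetic in the load-factor step is the awkward part, since VCL has no general-purpose maths; the sketch hard-codes an illustrative 1-in-2 probability via `randombool` where the real calculation would go:

```vcl
# Two-bucket skeleton of the restart-based scheme described above.
sub vcl_recv {
#FASTLY recv
  if (req.restarts == 0) {
    set req.http.X-Orig-URL = req.url;
    # bucket for the *previous* second: a full second's worth of hits
    set req.url = "/__rate-limit/" strftime({"%s"}, time.sub(now, 1s));
  }
}

sub vcl_deliver {
#FASTLY deliver
  if (req.url ~ "^/__rate-limit/") {
    if (req.restarts == 0) {
      set req.http.X-Prev-Count = obj.hits;   # prev-request-count
      # bucket for the *current* second
      set req.url = "/__rate-limit/" strftime({"%s"}, now);
      restart;
    } else if (req.restarts == 1) {
      set req.http.X-Curr-Count = obj.hits;   # current-request-count
      # The load-factor / fetch-probability calculation would go here;
      # as a stand-in, admit requests with a fixed 1-in-2 probability.
      if (randombool(1, 2)) {
        set req.url = req.http.X-Orig-URL;
        restart;
      }
      # otherwise: serve the synthetic queue page instead
    }
  }
}
```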
This is tough to implement purely in VCL but I think probably possible. If you do decide to give it a go, let us know how you get on!
Wow, thanks for the response. That’s put me on the right path. At the minute I’m just setting up Dynamic Servers and getting my head round the APIs; the holding page is my next job after that. I’ll keep you posted. Thanks very much.
How does this relate to what’s on the Fastly pages here: https://www.fastly.com/blog/fastlys-edge-modules-that-will-power-your-ecommerce-site/
When an ecommerce site starts to get overloaded, you can use visitor prioritization logic to determine who is just browsing your site, and who is actively trying to buy something. With this logic, you can redirect casual shoppers to a waiting room, while active buyers can access the site freely and complete their transactions. Otherwise, during times of high traffic such as Black Friday and the holiday shopping season, you run the risk of both kinds of user getting a “server too busy” error. And with 79% of users choosing not to buy from ecommerce websites that perform poorly, that’s potentially a lot of money lost from otherwise paying customers.
I’ve been asking about this for quite a while - and ended up writing one custom. What’s the Fastly offering referred to on the blog?