A few extra considerations picked up over many years of hard lessons:

1. Rate limits don't really protect against backend capacity issues, especially if they are statically configured. Think of rate limits as "policy" limits: the agreed policy of usage will be enforced, rather than limited backend resources being protected from overuse.

2. If the goal is to protect against bad traffic, consider additional steps beyond simple rate limits. It may make sense to perform some sort of traffic prioritization based on authentication status, user/session priority, customer priority, etc. This comes in handy if you have a bad actor!

3. Be prepared for what to communicate, and what action(s) to take, if and when the rate limits are hit, particularly by valuable customers or internal teams. Rate limits that get lifted whenever someone complains might as well be advisory-only and not actually return a 429.

4. If you need to protect against concertina effects (many fixed or sliding windows all expiring at the same moment), add a deterministic offset to each user/session window so that no large group of rate limits can expire simultaneously (see the sketch below).

Hope that helps someone!
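A minimal sketch of that fourth point, assuming fixed windows keyed per user; the in-memory counter dict and the hash-derived offset are illustrative assumptions, not anything from the comment above:

    import hashlib
    import time

    WINDOW = 60   # seconds per fixed window
    LIMIT = 100   # requests allowed per window

    counters = {}  # (user_id, window_index) -> count; swap for Redis or similar in production

    def window_offset(user_id: str) -> int:
        """Deterministic per-user offset in [0, WINDOW) so windows never all expire together."""
        digest = hashlib.sha256(user_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % WINDOW

    def allow(user_id: str) -> bool:
        now = time.time()
        # Shift this user's window boundaries by their personal offset.
        index = int((now - window_offset(user_id)) // WINDOW)
        key = (user_id, index)
        counters[key] = counters.get(key, 0) + 1
        return counters[key] <= LIMIT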
If your goal is to prevent DoS attempts from degrading the service of other tenants in a multitenant environment, fair queuing is the optimal approach. Give each client their own queue to which incoming traffic is enqueued, and have a background routine that repeatedly iterates over each queue, dequeuing a single request and servicing it. Any client that spams requests will only congest their own queue and not those of other clients.
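A minimal async sketch of that scheme (the names, the idle sleep, and the lack of queue cleanup are all assumptions; `handle` stands in for whatever actually services a request):

    import asyncio
    from collections import defaultdict

    class FairQueue:
        """Round-robin fair queuing: each client gets its own queue, and the worker
        takes at most one request per client per pass, so a spammy client only
        backs up its own queue."""

        def __init__(self):
            self.queues: dict[str, asyncio.Queue] = defaultdict(asyncio.Queue)

        async def enqueue(self, client_id: str, request) -> None:
            await self.queues[client_id].put(request)

        async def worker(self, handle) -> None:
            while True:
                busy = False
                # One pass over every client queue, dequeuing a single request from each.
                for q in list(self.queues.values()):
                    try:
                        request = q.get_nowait()
                    except asyncio.QueueEmpty:
                        continue
                    busy = True
                    await handle(request)
                if not busy:
                    await asyncio.sleep(0.01)  # nothing queued anywhere; idle briefly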
I've implemented a lot of client handling code and always wondered what the optimal back-off strategy was when I hit a rate limit. It's interesting to read about the trade-offs from the perspective of the service, since that can inform how a client best reacts.
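One common client-side answer is exponential backoff with jitter, preferring the service's Retry-After header when it sends one. A rough sketch, where `do_request` is an assumed callable returning something response-like with `.status_code` and `.headers`:

    import random
    import time

    def call_with_backoff(do_request, max_attempts=6, base=0.5, cap=30.0):
        """Retry on 429s with exponential backoff plus full jitter."""
        for attempt in range(max_attempts):
            response = do_request()
            if response.status_code != 429 or attempt == max_attempts - 1:
                return response
            retry_after = response.headers.get("Retry-After")
            if retry_after is not None:
                delay = float(retry_after)  # the server told us exactly how long to wait
            else:
                delay = random.uniform(0, min(cap, base * 2 ** attempt))  # full jitter
            time.sleep(delay)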
It is a shame GCRA is not better known and more widely used for rate limiting. It is, in my view, a better algorithm.

https://medium.com/smarkets/implementing-gcra-in-python-5df1f11aaa96
https://en.m.wikipedia.org/wiki/Generic_cell_rate_algorithm
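For anyone who hasn't seen it, the core of GCRA is that you store a single timestamp per key (the "theoretical arrival time") instead of counters or token counts. A compact sketch, with illustrative parameters:

    import time

    class GCRA:
        """Generic Cell Rate Algorithm: track one 'theoretical arrival time' (TAT) per key."""

        def __init__(self, rate_per_sec: float, burst: int):
            self.emission_interval = 1.0 / rate_per_sec       # T: spacing between conforming requests
            self.tolerance = self.emission_interval * burst   # tau: how far ahead of schedule we may run
            self.tat: dict[str, float] = {}

        def allow(self, key: str) -> bool:
            now = time.monotonic()
            tat = max(self.tat.get(key, now), now)
            if tat - now > self.tolerance:
                return False  # arrived too early: reject and leave the TAT untouched
            self.tat[key] = tat + self.emission_interval
            return True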
An interesting idea is rate limiting the client by requiring them to solve a puzzle before their request is handled.

If their last request was recent, make the puzzle harder. If it was longer ago, make it easier.

The puzzle might be like the one in the Bitcoin mining protocol: guessing which input produces a hash with a required number of leading zero bits.
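A hashcash-style sketch of that idea, where difficulty scales with how recently the client last called; the specific scaling and bit counts are made-up illustrations:

    import hashlib
    import os
    import time

    last_seen: dict[str, float] = {}  # client_id -> time of previous request

    def record_request(client_id: str) -> None:
        last_seen[client_id] = time.time()

    def difficulty(client_id: str, base_bits: int = 8, max_bits: int = 22) -> int:
        """Require more leading zero bits the more recently the client last called."""
        elapsed = time.time() - last_seen.get(client_id, 0.0)
        extra = max(0, max_bits - base_bits - int(elapsed))  # roughly one bit less per idle second
        return base_bits + extra

    def issue_challenge() -> bytes:
        return os.urandom(16)

    def leading_zero_bits(digest: bytes) -> int:
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            bits += 8 - byte.bit_length()
            break
        return bits

    def verify(challenge: bytes, nonce: bytes, bits: int) -> bool:
        return leading_zero_bits(hashlib.sha256(challenge + nonce).digest()) >= bits

    def solve(challenge: bytes, bits: int) -> bytes:
        """Client side: brute-force a nonce (this loop is the work being priced)."""
        counter = 0
        while True:
            nonce = counter.to_bytes(8, "big")
            if verify(challenge, nonce, bits):
                return nonce
            counter += 1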
Rate limiters can also help scrape a site that doesn't like it, like Facebook. We use them to keep our scraping rate below the service's own rate limits.

But the simple algorithms are not at all covert. We assume companies can deploy rate limiting detectors. So we ask: are there rate limiting algorithms designed to evade detection?

In one solution, we train an HMM on the inter-request times of a human user. This gives us a much more covert rate limiter! But how does it perform against state-of-the-art detectors?
Last year I tried very hard to get some rate limiting in our Lambda to work against an upstream target (so that our jobs don't trigger the rate limit of the upstream API). Sadly I could not find much literature focused specifically on rate limiting in Node.js. No matter what I tried, it would constantly overshoot the target on AWS Lambda while passing the tests locally, which led me to guess that something is wonky with timing. I still don't know if the timers on Lambda behave strangely (token buckets need to be refilled) or if every rate limiting library out there for Node.js is just broken. My own attempt wasn't any more reliable either, so... who knows.
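For what it's worth, one shape that avoids depending on timers firing on schedule is lazy refill: recompute the bucket from a monotonic clock on every acquire instead of topping it up in the background. A sketch in Python for brevity (the same structure translates to Node.js); whether timers were actually the culprit here is only a guess:

    import asyncio
    import time

    class TokenBucket:
        """Token bucket with lazy refill: tokens are recomputed from elapsed
        monotonic time on each call, so no background timer has to fire on time."""

        def __init__(self, rate_per_sec: float, capacity: float):
            self.rate = rate_per_sec
            self.capacity = capacity
            self.tokens = capacity
            self.updated = time.monotonic()

        async def acquire(self, cost: float = 1.0) -> None:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= cost:
                    self.tokens -= cost
                    return
                # Sleep just long enough for the missing tokens to accrue, then re-check.
                await asyncio.sleep((cost - self.tokens) / self.rate)

A separate caveat: anything in-process like this only bounds one Lambda instance, and concurrent invocations each get their own bucket, which by itself can look like overshooting and usually pushes you toward a shared store such as Redis.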
What do you do when even your rate limiting layer gets fully saturated with requests? Does one have any options other than involving CF?

I thankfully was never in the position to experience this, but I always wondered how far, say, nftables rules would go in thwarting a DoS attack against a conventional webapp on a tiny VPS.
I usually encounter rate limiting when trying to scrape some websites. I was even rate limited while manually browsing a website that decided I was a bot.