For those working in a Java JAX-RS environment and looking for an additional rate-limiting filter on the app server itself, here is a similar Redis+Lua rate limiter implemented as a Jersey/JAX-RS filter [1].<p>It supports multiple limits, e.g. max 100 requests/minute and 10,000/day. The Lua magic is here [2].<p>[1] <a href="https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wizard-server/src/main/java/org/cobbzilla/wizard/filters/RateLimitFilter.java" rel="nofollow">https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wi...</a><p>[2] <a href="https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wizard-server/src/main/resources/org/cobbzilla/wizard/filters/api_limiter_redis.lua" rel="nofollow">https://github.com/cobbzilla/cobbzilla-wizard/blob/master/wi...</a>
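For anyone skimming, the shape of such a filter is roughly the following. This is only a sketch, not the linked code: RedisCounter is a hypothetical wrapper around INCR + EXPIRE (e.g. on top of Jedis or Lettuce), and the limit values are just the examples from above.

    import java.time.Duration;
    import javax.annotation.Priority;
    import javax.ws.rs.Priorities;
    import javax.ws.rs.container.ContainerRequestContext;
    import javax.ws.rs.container.ContainerRequestFilter;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.ext.Provider;

    // Sketch only: RedisCounter is a hypothetical helper, not part of the linked project.
    @Provider
    @Priority(Priorities.AUTHENTICATION)
    public class SketchRateLimitFilter implements ContainerRequestFilter {

        private record Limit(long max, Duration window) {}

        // e.g. max 100 requests/minute and 10,000 requests/day
        private static final Limit[] LIMITS = {
                new Limit(100, Duration.ofMinutes(1)),
                new Limit(10_000, Duration.ofDays(1))
        };

        private final RedisCounter redis = new RedisCounter(); // hypothetical wrapper around INCR/EXPIRE

        @Override
        public void filter(ContainerRequestContext ctx) {
            // Identify the caller however your app does it (API key, user id, client IP, ...)
            String caller = ctx.getHeaderString("X-Forwarded-For");
            for (Limit limit : LIMITS) {
                String key = "rate:" + caller + ":" + limit.window().toSeconds();
                // INCR the window counter; set the TTL to the window length on the first hit
                long count = redis.incrementWithTtl(key, limit.window());
                if (count > limit.max()) {
                    ctx.abortWith(Response.status(429).build()); // 429 Too Many Requests
                    return;
                }
            }
        }
    }

A real version would also decide what to do when Redis is unreachable and keep the increment-check-expire sequence atomic, which is exactly the kind of thing a server-side Lua script is good for.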
Something very similar can be achieved in HAProxy using a powerful feature called stick tables. [1] [2] [3]<p>[1] <a href="https://www.haproxy.com/blog/introduction-to-haproxy-stick-tables/" rel="nofollow">https://www.haproxy.com/blog/introduction-to-haproxy-stick-t...</a><p>[2] <a href="https://www.haproxy.com/blog/bot-protection-with-haproxy/" rel="nofollow">https://www.haproxy.com/blog/bot-protection-with-haproxy/</a><p>[3] <a href="https://www.haproxy.com/blog/using-haproxy-as-an-api-gateway-part-1/" rel="nofollow">https://www.haproxy.com/blog/using-haproxy-as-an-api-gateway...</a>
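For reference, a minimal stick-table setup looks roughly like this (illustrative only, numbers and names made up):

    frontend fe_web
        bind :80
        # one entry per client IP, tracking request rate over a 1-minute window
        stick-table type ip size 100k expire 2m store http_req_rate(1m)
        http-request track-sc0 src
        # reject clients doing more than 100 requests/minute
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
        default_backend be_app

The counters live inside HAProxy itself, so there is no extra Redis round-trip on the hot path; the linked posts cover peering tables across instances and more elaborate bot heuristics.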
Awesome post, from a fellow Brazilian!<p>We built a very similar (although not distributed) implementation for a similar problem, using Redis and Laravel.<p>We had MANY people crawling our website, and we would prefer that they use our API for that. Using Redis, we block IPs that access our website more than X times while not logged in (currently 200 URLs).<p>We also had the requirement that all good bots (Bing, Baidu, Google) should pass through without blocks or any slowdown.
Another requirement was that those good bots be verified (reverse & forward DNS lookup) before entering our good-bot list; a rough sketch of that check follows the link below.<p>It is working great for our high-traffic website (~2 million hits/day). You can check our work here:
<a href="https://github.com/Potelo/laravel-block-bots" rel="nofollow">https://github.com/Potelo/laravel-block-bots</a>
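The reverse + forward DNS check, sketched in plain Java rather than the package's PHP (the class name, domain suffix and example IP are illustrative):

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    // Verify a "good bot": reverse-resolve the IP, check the hostname belongs to the
    // crawler's domain, then forward-resolve the hostname and make sure it maps back
    // to the same IP, so a scraper can't get through by faking its User-Agent.
    public final class GoodBotVerifier {

        public static boolean isVerifiedBot(String ip, String expectedDomainSuffix) {
            try {
                // Reverse lookup: IP -> hostname, e.g. crawl-66-249-66-1.googlebot.com
                String host = InetAddress.getByName(ip).getCanonicalHostName();
                if (!host.endsWith(expectedDomainSuffix)) {
                    return false; // PTR record is not in the claimed crawler's domain
                }
                // Forward lookup: hostname -> IPs, which must include the original IP
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    if (addr.getHostAddress().equals(ip)) {
                        return true;
                    }
                }
            } catch (UnknownHostException e) {
                // treat lookup failures as "not verified"
            }
            return false;
        }
    }

    // usage: isVerifiedBot("66.249.66.1", ".googlebot.com")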
Not to say they did anything wrong, great work! But if I were facing the same problem for an in-house solution, I'd consider using auth_request in the first place.<p><a href="https://nginx.org/en/docs/http/ngx_http_auth_request_module.html" rel="nofollow">https://nginx.org/en/docs/http/ngx_http_auth_request_module....</a><p>To me, the advantage is architectural: I would not have to specify which request parameters are considered or how they are processed. The disadvantage is semantic: it returns 403 instead of 429, but the original article returns 403 anyway.<p>Also, regarding rate limiting by IP, I think it should be set at 10x-100x the single-user limit, just as a first line of defense. nginx rate limiting also has a notion of burst, which helps filter out "smart" crawlers, which, unlike users, send requests for hours.
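Roughly what that combination looks like (a sketch only; the upstream names and the /authz endpoint are made up, and the limit_req_zone line belongs in the http block):

    limit_req_zone $binary_remote_addr zone=perip:10m rate=50r/s;

    server {
        location /api/ {
            # coarse per-IP limit as a first line of defense; burst absorbs normal spikes
            limit_req zone=perip burst=100 nodelay;

            # delegate the fine-grained per-user decision to an internal service
            auth_request /authz;

            proxy_pass http://app_backend;
        }

        location = /authz {
            internal;
            proxy_pass http://limiter_backend;
            proxy_pass_request_body off;
            proxy_set_header Content-Length "";
            proxy_set_header X-Original-URI $request_uri;
        }
    }

nginx only looks at the subrequest's status code (2xx allows, 401/403 denies), so the actual limiting logic stays entirely in your own service.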
A more efficient (but without the histogram) approach would be a native Redis module written in Rust: <a href="https://github.com/brandur/redis-cell" rel="nofollow">https://github.com/brandur/redis-cell</a>
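It exposes a single CL.THROTTLE command implementing a GCRA-style limiter; if I remember the README correctly, usage looks something like this (numbers made up):

    # key, max burst, count per period, period in seconds, quantity
    CL.THROTTLE user:123 15 30 60 1
    # reply: 1) limited? (0 = allowed)  2) limit  3) remaining
    #        4) retry-after in seconds (-1 if allowed)  5) reset-after in seconds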
Probably better to use a Redis hash ("map") instead of multiple keys. Redis will store these very efficiently, too, if the hash only has a few fields.
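For example (sketch; the key and field names are arbitrary), one hash per client instead of several top-level keys:

    HINCRBY rate:client42 minute 1
    HINCRBY rate:client42 day 1
    EXPIRE rate:client42 86400
    # small hashes use the compact listpack/ziplist encoding
    # (see hash-max-listpack-entries / hash-max-ziplist-entries)

One caveat: the TTL applies to the whole hash, not to individual fields, so per-window expiry takes a bit more care than with separate keys.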