When deciding which mechanism to use for load shedding, keep in mind the layer at which you are shedding. Modern distributed systems are composed of many layers: you can shed load at the load balancer, at the OS level, or in the application logic. The choice is a trade-off. The closer you get to the core application logic, the more information you have to make a decision. On the other hand, the closer you get, the more work you have already performed, and the more it costs to throw the request away.<p>You may employ techniques more complex than a simple bucketing mechanism, such as closely tracking the degree to which each client is exceeding its baseline. However, these techniques aren’t free. Even the cost of simply rejecting requests can overwhelm your server, and the more steps you add before the shedding decision, the lower the maximum throughput you can absorb before availability drops to zero. When designing a system that relies on this technique, it’s important to understand at what point this happens.<p>For example, shedding at the OS level is a lot cheaper than leaving it to the server process. If you choose to shed in your application logic, think carefully about how much work is done for a request before it gets thrown away. Are you validating a token before you make your decision?
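<p>To make the ordering point concrete, here is a minimal Python sketch of application-level shedding that makes the cheap decision before the expensive one. It assumes a per-client token bucket as the "simple bucketing mechanism" (one common choice, not necessarily what any particular system uses). `TokenBucket`, `Handler`, and `validate_token` are hypothetical names, and `validate_token` stands in for whatever costly per-request work (signature checks, remote calls) you would otherwise do first.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Per-client bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity               # start full
        self.last: Optional[float] = None    # time of the previous check

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Handler:
    """Hypothetical handler that sheds load before doing expensive work."""

    def __init__(self, validate_token: Callable[[str], bool],
                 rate: float = 5.0, capacity: float = 10.0) -> None:
        self.validate_token = validate_token  # hypothetical costly step
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, client_id: str, token: str,
               now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.capacity)
        # 1. Cheap decision first: a dict lookup and a little arithmetic.
        if not bucket.allow(now):
            return 429  # shed: no validation work was spent on this request
        # 2. Only admitted requests pay for the expensive step.
        if not self.validate_token(token):
            return 401
        return 200
```

<p>With `rate=1.0, capacity=2.0`, five requests from the same client at the same instant yield `[200, 200, 429, 429, 429]`, and `validate_token` runs only twice. Note that the bucket here is keyed on a client ID the caller has not yet proven, which keeps the check cheap but trusts an unverified claim. Keying on the validated identity instead gives you better information, at the cost of validating every request, including the ones you end up shedding. That is the same trade-off described above.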