I have a set of rules for load shedding that I urge you to consider. First and foremost, whenever you read (or are about to say) "load shedding," mentally substitute the correct terminology, which is "intentionally serving errors." That will put you in the right frame of mind to properly ponder the outcomes.<p>Second, the error path on your backend must be strictly cheaper than the success path, or the whole scheme doesn't work. A particularly bad error-path action is logging the error at a severity high enough that the log files get flushed and synced, which is likely to be tremendously expensive. Another is taking a mutex to increment an error counter that the normal serving path never touches; if that mutex ends up synchronizing all your serving threads, your server will collapse.<p>Third, load shedding can only be implemented correctly if you control the client and the server, end to end. Perhaps you want to avoid hot spots by serving a soft error from an overloaded shard. If the client is guaranteed to try another shard (or just give up), this is a good approach. If the client might retry on the same shard, it isn't: you just "shed load" in such a way that you had to serve the same request twice.
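The second rule can be made concrete with a toy handler. This is a minimal, single-threaded sketch under assumed names (`Server`, `handle`, and the status tuples are illustrative, not from the comment): the shed path does strictly less work than the success path, one integer increment and an immediate return, with no logging and no locks.

```python
class Server:
    def __init__(self, capacity):
        self.capacity = capacity      # max requests we are willing to work on
        self.in_flight = 0
        self.shed_count = 0           # plain counter: no mutex, no synced log

    def handle(self, request):
        if self.in_flight >= self.capacity:
            # Error path: strictly cheaper than the success path.
            # Bump a local counter and return immediately; never fsync a
            # log file or grab a lock shared with the serving threads here.
            self.shed_count += 1
            return (503, "overloaded")
        self.in_flight += 1
        try:
            return (200, request.upper())   # stand-in for the real work
        finally:
            self.in_flight -= 1
```

In a real multithreaded server the counter would be a relaxed atomic or a per-thread value aggregated out of band; the point is simply that nothing on the error path costs more than serving would have.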
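The third rule is ultimately about the client's retry policy. Here is a sketch of a client that never retries the shard that just shed it; the `send` callback and the shard list are assumptions for illustration, not a real API.

```python
import random

def fetch(shards, request, send, max_attempts=3):
    """Try up to max_attempts *distinct* shards.

    Retrying the same shard that just served a soft error defeats the
    purpose: the overloaded shard ends up handling the request twice.
    """
    for shard in random.sample(shards, k=min(max_attempts, len(shards))):
        status, body = send(shard, request)
        if status != 503:               # soft error from an overloaded shard
            return status, body
    return 503, "all tried shards overloaded"
```

Because the candidate shards are sampled without replacement, an overloaded shard is contacted at most once per request, so shedding actually moves work elsewhere instead of doubling it.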