I think the key insight here is that autoscaling your service to keep the queue short won't always work, and won't always be cost-effective, so upstream sources should be prepared to reduce the amount of work they send (unless there's a business reason to do otherwise). Making the queue longer, or silently dropping data that's already been accepted into the queue, will sometimes break people's assumptions about how your system works. You can adjust the queue size from time to time if you need to, but if your service sees increased use over time, the variance in queue size can get a bit crazy.

I worked on a system a while back that processed petabytes of data. Autoscaling was out of the question (we had a fixed amount of hardware to process the data, and the lead time for getting new hardware was not especially short), and buffering all the data would have required eye-watering amounts of disk space. So we buffered as much as we could and then returned a "try again later" error code. We made sure that data accepted by the system was processed within a short enough window, and we made the cost of submitting a work item so small that you could hammer the front end as much as you wanted and it would just accept the item or say "try again later." (There's a sketch of that front end at the end of this comment.)

I think one of the lessons here is that you need to think long and hard about which error conditions you want to expose upstream. The farther you propagate an error condition, the more ways you have to handle it, but the more complicated your clients get. For example, disks sometimes fail. You can decide not to propagate disk failure beyond an individual machine: put RAID 1 (mirroring) in all your file servers and back everything up. Or you can push disk failures farther up the stack and recover from them at a higher level, with lower hardware cost but higher implementation complexity. And if you build a bunch of systems assuming that each part is always working, you run a very serious risk of cascading failures once one part inevitably fails.

Obviously, small enough systems running on cloud hardware can usually be autoscaled just fine.
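To make the "accept it or say try again later" idea concrete, here's a minimal sketch in Go. All the names are hypothetical, HTTP is standing in for whatever protocol the front end actually speaks, and the numbers are made up: a fixed pool of workers (standing in for fixed hardware) drains a bounded queue, and the submit handler either enqueues the item or answers 429 immediately, so a rejected submission costs the server almost nothing.

    package main

    import (
        "io"
        "log"
        "net/http"
    )

    type workItem struct{ payload []byte }

    // The queue is the only buffering we do; its capacity is the most
    // work we're willing to hold, full stop.
    var queue = make(chan workItem, 10000)

    func submit(w http.ResponseWriter, r *http.Request) {
        payload, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, "bad request", http.StatusBadRequest)
            return
        }
        select {
        case queue <- workItem{payload}: // fits in the buffer: accepted
            w.WriteHeader(http.StatusAccepted)
        default: // buffer full: shed load now instead of growing the queue
            w.Header().Set("Retry-After", "5")
            http.Error(w, "try again later", http.StatusTooManyRequests)
        }
    }

    func process(item workItem) {
        _ = item.payload // the actual processing goes here
    }

    func main() {
        for i := 0; i < 8; i++ { // fixed worker count ~ fixed hardware
            go func() {
                for item := range queue {
                    process(item)
                }
            }()
        }
        http.HandleFunc("/submit", submit)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The property that matters is that the rejection path does no real work, so clients retrying aggressively can't hurt you much, and the Retry-After header gives well-behaved clients a hint instead of leaving them to guess.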