Glad to see writeups and lessons learned about AutoScaling. As a system builder, I think it's an under-appreciated capability. My rule of thumb is that essentially every machine fleet benefits from being managed as an AutoScaling Group, whether stateless or not; there's value even without dynamic scaling.<p>> If you’re working with a queue based model then scaling will be done based on the SQS queue size, otherwise we’ll use the custom metric “number of running jobs”.<p>There is another strategy to consider when auto-scaling queue-based applications: if each machine instance runs a fixed number of threads that process work from a queue, you can scale based on the percentage of those threads that are occupied.<p>You can treat this as a utilization percentage, similar to CPU Utilization. The advantage is that you can begin scaling up before a (long) queue ever develops. For example, consider an application where the average queue size is zero and the average thread utilization is zero. Now consider another application where the average queue size is also zero, because messages are processed almost immediately, but 19 out of 20 threads are occupied on average. The first application is nearly idle, while the second is nearly maxed out, even though both have empty queues. By treating the second application as (19/20 =) 95% utilized, you can set a scaling policy that scales up before a backlog develops. That assumes you want to avoid building a long queue, which matters in some cases but not all; it depends on how quickly you need messages processed, i.e. your SLA.<p>(The article touches on this as well, when it talks about the number of running jobs.)<p>Queue size can be useful as well, but I think it is harder to tune. Percent of thread capacity works well regardless of how large your fleet is or how expensive messages are to process.
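A minimal sketch of the thread-utilization metric, assuming each instance can report how many of its worker threads are busy. The names `InstanceStats`, `fleet_thread_utilization`, and `desired_capacity`, the 60% target, and the proportional resize rule (similar in spirit to a target-tracking policy) are all hypothetical illustrations, not an AutoScaling API:

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Hypothetical per-instance report from the worker process."""
    busy_threads: int   # worker threads currently processing a message
    total_threads: int  # fixed size of this instance's worker pool


def fleet_thread_utilization(stats: list[InstanceStats]) -> float:
    """Percent (0-100) of all worker threads in the fleet that are occupied."""
    total = sum(s.total_threads for s in stats)
    if total == 0:
        return 0.0
    busy = sum(s.busy_threads for s in stats)
    return 100.0 * busy / total


def desired_capacity(current: int, utilization_pct: float, target_pct: float,
                     min_size: int = 1, max_size: int = 100) -> int:
    """Resize proportionally so fleet utilization moves toward target_pct.

    E.g. 10 instances at 95% with a 60% target need about
    10 * 95 / 60 = 15.8, rounded up to 16 instances.
    """
    if current <= 0:
        return min_size
    wanted = math.ceil(current * utilization_pct / target_pct)
    return max(min_size, min(max_size, wanted))


# The two applications from the example: both have empty queues.
idle_app = [InstanceStats(busy_threads=0, total_threads=20)]
busy_app = [InstanceStats(busy_threads=19, total_threads=20)]

print(fleet_thread_utilization(idle_app))  # 0.0
print(fleet_thread_utilization(busy_app))  # 95.0
print(desired_capacity(1, fleet_thread_utilization(busy_app), 60.0))  # 2
```

Because the metric is a ratio of busy to total threads, the same target works for a 2-instance fleet or a 200-instance one. A real policy would also need cooldowns or smoothing so it doesn't flap on momentary spikes.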
By contrast, with raw queue size, a large-scale system that processes thousands of messages per second could build up a large queue, in terms of item count, from a brief blip that it burns through within moments. A 20,000-item queue might be nothing to such a system, whereas for another system 100 items could be significant if each one is a 10GB video to download and transcode.<p>The ideal auto-scaling solution typically combines several measurements, since it's rare that any single performance characteristic captures an application's load perfectly. I definitely agree with the author's point that instrumentation is highly valuable.
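One way to make queue size comparable across systems like these is to convert it into an estimated drain time, then combine it with thread utilization by taking the larger of the two capacity estimates, so neither signal can be ignored. This is a sketch under stated assumptions: the function names, the per-message processing times (0.5 s per message, roughly 600 s per video transcode), and the targets are hypothetical illustrations, not measurements or a specific AutoScaling feature:

```python
import math


def backlog_drain_seconds(queue_length: int, seconds_per_message: float,
                          total_threads: int) -> float:
    """Estimated time for the current fleet to clear the queue."""
    if total_threads <= 0:
        return math.inf if queue_length > 0 else 0.0
    return queue_length * seconds_per_message / total_threads


def combined_desired_capacity(current: int,
                              utilization_pct: float, target_utilization_pct: float,
                              drain_seconds: float, target_drain_seconds: float,
                              min_size: int = 1, max_size: int = 100) -> int:
    """Take the larger of the utilization-based and backlog-based estimates."""
    if current <= 0:
        return min_size
    by_utilization = math.ceil(current * utilization_pct / target_utilization_pct)
    if math.isinf(drain_seconds):
        by_backlog = max_size
    else:
        # Drain time scales inversely with fleet size, so multiplying the
        # fleet by drain/target brings the estimate down to the target.
        by_backlog = math.ceil(current * drain_seconds / target_drain_seconds)
    return max(min_size, min(max_size, max(by_utilization, by_backlog)))


# High-throughput system: 20,000 queued items, 1,000 threads, 0.5 s each.
print(backlog_drain_seconds(20_000, 0.5, 1_000))  # 10.0
# Video system: 100 queued videos, 20 threads, ~600 s per transcode.
print(backlog_drain_seconds(100, 600.0, 20))      # 3000.0
# Video system on 2 instances at 100% utilization, with a 15-minute drain target.
print(combined_desired_capacity(2, 100.0, 60.0, 3000.0, 900.0))  # 7
```

The 20,000-item queue drains in about 10 seconds while the 100-item queue takes nearly an hour, which is the asymmetry a raw item count hides. Taking the maximum is a conservative choice that favors meeting the SLA over minimizing cost.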