We plan to do a blog post about this at some point, but we had the pleasure of seeing exactly how elastic the ELB is when we switched Cronitor from Linode to AWS in February 2015. Requisite backstory: our API traffic comes from jobs, daemons, etc., which tend to create huge hot spots at the top of each minute, quarter hour, and hour, and at midnight in popular timezone offsets like UTC and US Eastern. These stack up in an emergent way, and at peak we see traffic many times our resting baseline. At the time, our median ping traffic was around 8 requests per second, with peaks around 25x that.

For the first day after setting up the ELB we had no problems, but soon after we started getting reports of intermittent downtime. On our end, our metrics looked clean, and according to CloudWatch the ELB queue never seriously backed up. But when we started running our own healthchecks against the ELB, we saw what our customers had been reporting: in the crush of traffic at the top of the hour, connections to the ELB were rejected, even though the metrics never indicated a problem.

Once we saw the problem ourselves it was easy to understand. Amazon provisions that load balancer *elastically*, and our traffic was more power law than normal distribution. Our baseline traffic wasn't high enough to earn the resources needed to service peak load. So, cautionary tale: don't just trust the instruments in the tin when it comes to cloud IaaS -- you need your own. It's understandable that we ran into a product limitation, but unfortunate that we weren't given enough visibility to see the obvious problem without our own testing rig.
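
For concreteness, here's a rough sketch of the kind of external healthcheck that surfaces this: probe the load balancer from outside on a fixed interval and log connection-level failures separately from HTTP errors, so rejected connections show up even when the provider's metrics look clean. The endpoint URL and interval here are placeholders, not our actual testing rig.

    # Minimal external probe sketch (hypothetical endpoint, not our real setup).
    import datetime
    import time

    import requests

    ENDPOINT = "https://ping.example.com/healthcheck"  # placeholder probe target
    INTERVAL_SECONDS = 5

    def probe(url, timeout=3.0):
        """Return 'ok', 'http_<code>', or 'conn_error' for one probe attempt."""
        try:
            resp = requests.get(url, timeout=timeout)
            return "ok" if resp.status_code == 200 else "http_%s" % resp.status_code
        except requests.exceptions.RequestException:
            # Refused/rejected connections land here -- the failures that never
            # appeared in the load balancer's own metrics.
            return "conn_error"

    if __name__ == "__main__":
        while True:
            # Timestamp each result so failures can be correlated with the
            # top-of-the-minute/hour traffic spikes described above.
            print("%s %s" % (datetime.datetime.utcnow().isoformat(), probe(ENDPOINT)))
            time.sleep(INTERVAL_SECONDS)

Graphing the conn_error count by minute of the hour is what made the top-of-the-hour pattern obvious to us.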