22 点作者 Hansi超过 9 年前

1 comment

encoderer超过 9 年前

This is pretty great stuff. At a more human scale, our saas monitoring tool handles peaks of ~100req/sec that are written to SQS. The daemon that evaluates rules and triggers alerts has a queue health check integrated. It will pause itself when the queue backs up and if the issue persists it sends a page. Features like are just a few lines of code and have helped us squash false positive alerts.

Fail at Scale and Controlling Queue Delay

1 comment

Fail at Scale and Controlling Queue Delay

1 comment