Seems like the world of cloud server monitoring is a big cluster right now, and there are no elegant packages out there that do everything.<p>Servers/services to monitor:
Amazon EC2 and RDS, a Ruby on Rails web app, Redis + Resque workers, and our own Mac servers<p>Alert stages: Minor, where an unobtrusive message is sent to operations; Severe, where someone gets a phone call.<p>Current solution: New Relic to monitor Ruby on Rails, Amazon EC2 instances, Amazon RDS, Redis, and Resque. Collectd + Riemann + Librato Metrics to monitor our own servers. HipChat for minor alerting, PagerDuty for severe alerting.<p>The biggest issue I'm having now is that New Relic's alerting for anything other than Rails and servers (i.e., RDS, Redis, and Resque) is poor. So I'm asking HN: how do you monitor your cloud servers and services, and how do you alert staff when something goes wrong?
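The two alert stages boil down to a severity-based routing rule. Below is a minimal sketch of that rule, assuming each metric has a warning threshold (minor) and a critical threshold (severe). The thresholds, metric names, and the destination strings "hipchat" and "pagerduty" are hypothetical. The code only decides where an alert should go and does not call HipChat or PagerDuty.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    MINOR = "minor"    # unobtrusive message to operations
    SEVERE = "severe"  # someone gets a phone call


@dataclass
class Alert:
    service: str    # e.g. "rds", "redis", "resque"
    metric: str     # hypothetical metric name, e.g. "queue_length"
    value: float
    warn_at: float  # hypothetical threshold that triggers a minor alert
    crit_at: float  # hypothetical threshold that triggers a severe alert


def classify(alert: Alert) -> Optional[Severity]:
    """Return the alert stage, or None if the metric is healthy."""
    if alert.value >= alert.crit_at:
        return Severity.SEVERE
    if alert.value >= alert.warn_at:
        return Severity.MINOR
    return None


# Hypothetical destination names; actual delivery would go through each
# service's own API or an existing integration.
DESTINATIONS = {
    Severity.MINOR: "hipchat",
    Severity.SEVERE: "pagerduty",
}


def route(alert: Alert) -> Optional[str]:
    """Pick the destination for an alert based on its stage."""
    stage = classify(alert)
    return DESTINATIONS[stage] if stage is not None else None


if __name__ == "__main__":
    backlog = Alert("resque", "queue_length", 150, warn_at=100, crit_at=1000)
    print(route(backlog))
```

Keeping the classification separate from the destination map means the same thresholds could feed whichever tool ends up sending the notification.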