I gave a presentation on Bosun at the most recent Monitorama conference: <a href="https://vimeo.com/131581326" rel="nofollow">https://vimeo.com/131581326</a><p>The first ~13 minutes cover some of the design thinking and the why. Then I start a demo with some screencasts.
3 years ago I started a company (Stackify) with the hopes of building a better Nagios. But it turned out developers didn't really want that. They wanted and needed a lot more, as basic server metrics didn't tell much of a story about application health. The shift to cloud-based services also makes a lot of basic monitoring tools unnecessary. Cloud-based apps don't really need to monitor servers or infrastructure beyond simple CPU and memory measurement. Developers need to monitor the app itself, which can really only be done by code profiling, custom metrics, and analyzing errors and log statements.<p>So we have since pivoted a little bit and focused heavily on true application monitoring via basic server metrics, custom app metrics, error tracking, log management, and true APM code profiling. All of this together provides a lot of power when it comes to monitoring and finding application problems.<p>A lot of companies we talk to barely monitor anything about their apps. So many IT teams work in such a reactive mode that they aren't very proactive when it comes to monitoring application health and behavior.<p>Would love to get anyone's feedback about this topic. Do you just use basic server monitoring? In how much detail do you monitor the actual behavior and health of your application? How do you do it?<p>If you're curious you can check out our product. <a href="http://stackify.com" rel="nofollow">http://stackify.com</a>
I tried it for a while and it sure has potential as one of those modern monitoring systems that are replacing Nagios right now.<p>However, in production I would not want to run it in a Docker container. I would want to set up my own server with the option to scale it out to remote pollers.<p>In my org we ended up choosing another Nagios replacement, but not because of any flaw in Bosun.<p>I love going over the main points that we look for in a monitoring solution.<p>Self-hosted. Scalable, with remote pollers that can plug in to the central servers. Locations: remote pollers can add locations to monitor from. A collector agent that runs periodically on monitored servers instead of the NRPE model that listens for connections. The collector OS agent is Windows compatible and backwards compatible with Nagios scripts. Monitoring focuses on sending metrics first and foremost, so you can set thresholds for metrics, just like Bosun does. And of course, with those metrics the web GUI draws fancy graphs for everything.<p>And last but not least, all of this, the monitoring agent and the pollers, uses a standard API like REST or XML-RPC.
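To make the metrics-first agent model described above a bit more concrete, here is a minimal sketch in Python. It is not Bosun's API or any particular product's: the names `Threshold`, `collect`, and `evaluate`, the metric names, and the warn/crit levels are all made up for illustration. The agent runs its collectors once per interval and pushes the samples; the server side compares them against thresholds. The actual push over REST or XML-RPC is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu.percent"
    warn: float   # value at or above which the status is "warning"
    crit: float   # value at or above which the status is "critical"


def collect(collectors: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Agent side: run every collector once and return the samples to push."""
    return {name: read() for name, read in collectors.items()}


def evaluate(samples: Dict[str, float], thresholds: List[Threshold]) -> Dict[str, str]:
    """Server side: map each thresholded metric to a status string."""
    statuses: Dict[str, str] = {}
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            # The agent did not report this metric in the current interval.
            statuses[t.metric] = "unknown"
        elif value >= t.crit:
            statuses[t.metric] = "critical"
        elif value >= t.warn:
            statuses[t.metric] = "warning"
        else:
            statuses[t.metric] = "ok"
    return statuses


if __name__ == "__main__":
    # Stand-in collectors returning fixed values; a real agent would read
    # from the OS or wrap existing check scripts.
    collectors = {"cpu.percent": lambda: 93.0, "mem.percent": lambda: 41.5}
    rules = [
        Threshold("cpu.percent", warn=80, crit=90),
        Threshold("mem.percent", warn=70, crit=90),
        Threshold("disk.percent", warn=80, crit=95),
    ]
    print(evaluate(collect(collectors), rules))
```

Because the agent only sends numbers and the thresholds live on the server, the same samples can feed both alerting and graphing without touching the monitored hosts again.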
God, it's time someone came up with a good, modern monitoring system. I used Nagios for years but it never evolved past a bunch of CGI scripts written in C(!). I tried Sensu, and was moderately impressed until a major update broke everything and it never worked again.
I've had an intern working to set up Bosun and OpenTSDB on an Ubuntu server, from source, for a few weeks now. He's close, but today is his last day.<p>I need to pay someone to professionally set this up for us (so we can easily distribute it with our enterprise software), preferably with just bash scripts. I also need consulting. For example, is it realistic to use a single server for our logging load?<p>I work for a large multinational. If you're qualified and interested, we can engage you to help us out. Contact is in my profile.
It seems to be a DSL to describe alerts over whole clusters. That's probably what a monitoring system for the cloud age should be. It can monitor Logstash and Graphite, which are proven ways to collect data in a disparate environment.<p>But many in the comments compare it with Nagios, which I think isn't really fair. You could probably easily plug this into Nagios and let its dependency rules figure out who to page when, because that's what Nagios is, not the default checks it ships with.