As a Graphite maintainer, see my other post about problems with Graphite:

https://news.ycombinator.com/item?id=8908423

I'm super excited about Prometheus and can't wait to get some time to see if I can make it work on my Raspberry Pi. That said, I'm also likely to eventually work on a graphite-web / graphite-api pluggable backend that uses Prometheus as the backend storage platform.

The more OSS metrics solutions, the better!
Announced here:

https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud
"Those who cannot remember the Borgmon are doomed to repeat it" ;)<p>Just kidding, this is looking really good, I hope to get some hands-on experience with it soon.
We've been looking for something like this, but unfortunately the "pull" model won't work for us. We really need a push model so that our statistics server doesn't need access to every single producer. I see the Pushgateway, but it seems deliberately not intended as centralized storage.

I wonder what InfluxDB means by "distributed", i.e. whether I could use it to implement a push model where distributed agents push to a centralized metrics server.
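For what it's worth, pushing through the Pushgateway from a short-lived job looks roughly like this with the Go client library (a sketch assuming the current client_golang push API; the gateway address, job name, and metric name are made up):

```go
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// A metric recorded by a short-lived job that can't be scraped directly.
	completionTime := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "batch_job_last_completion_timestamp_seconds",
		Help: "Unix timestamp of the last successful batch job completion.",
	})
	completionTime.SetToCurrentTime()

	// Push the sample to the Pushgateway, which holds it until the
	// Prometheus server scrapes the gateway on its normal pull cycle.
	if err := push.New("http://pushgateway.example.org:9091", "batch_job").
		Collector(completionTime).
		Push(); err != nil {
		log.Fatal("could not push to Pushgateway: ", err)
	}
}
```

Note that the gateway itself still gets scraped rather than acting as a metrics store, which is exactly the limitation mentioned above.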
From the storage system docs:

> which organizes sample data in chunks of constant size (1024 bytes payload). These chunks are then stored on disk in one file per time series.

That is concerning: is this going to have the same disk I/O problem that Graphite does, i.e. every metric update requiring a disk I/O because of the one-file-per-metric structure?
This looks very interesting.

From http://prometheus.io/docs/introduction/getting_started/:

> Prometheus collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.

I wonder if we'll see plugins that allow data collection via SNMP or Nagios monitoring scripts or the like. That would make it much easier to switch large existing monitoring systems over to Prometheus.
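For anyone who hasn't seen the pull model before, instrumenting an app so Prometheus can scrape it is pretty minimal with the Go client (a sketch assuming the current client_golang API; the metric name and port are just placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// A counter the application increments; Prometheus reads its current
	// value each time it scrapes the /metrics endpoint below.
	requests := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total HTTP requests handled by myapp.",
	})
	prometheus.MustRegister(requests)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requests.Inc()
		fmt.Fprintln(w, "hello")
	})

	// Expose the text exposition format that the Prometheus server scrapes.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Bridges for SNMP or Nagios checks would presumably just be small services exposing the same kind of endpoint on behalf of the things they poll.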
It's great to see new entrants in the monitoring and graphing space. These are problems that every company has, and yet there's no solution as widely accepted for monitoring, notifications, or graphing as nginx is for web serving.

Not that I'd do a better job, but every time I further configure our monitoring system, I get the feeling that we're missing something as an industry. It's a space with lots of tools that feel too big or too small; only Graphite feels like it's doing one job fairly well.

Alerting is the worst of it. Nagios and all the other alerting solutions I've played with feel just a bit off. They either do too much or carve out a role at boundaries that aren't quite right. The result is that other systems end up wanting to do alerting too, which makes it hard to compare tools.

As an example, Prometheus has an alert manager under development: https://github.com/prometheus/alertmanager. Why isn't doing a great job at graphing enough of a goal? Is it a problem with the alerting tools, or a problem with the boundaries between alerting, graphing, and notifications?
So... how does this compare to http://riemann.io/ ? I just re-discovered Riemann and was thinking of pairing it with Logstash and having a go. It would seem Prometheus does something... similar?
Looks really promising for smaller clusters. However, the pull/scraping model for stats could be problematic at larger scale.

I've been experimenting with metrics collection using Heka (node) -> AMQP -> Heka (aggregator) -> InfluxDB -> Grafana. It works extremely well and scales nicely, but requires writing Lua code for anomaly detection and alerts – good or bad depending on your preference.

I highly recommend considering Heka [1] for shipping logs to both Elasticsearch and InfluxDB if you need more scale and flexibility than Prometheus currently provides.

[1] https://github.com/mozilla-services/heka
While monitoring is obviously useful, I don't understand the obvious importance of a time-series database. Can you collect enough measurements for the time-series database to be useful? I worry that I would end up with lots of metrics to back up my wrong conclusions. I also worry that so much irrelevant data would drown out the relevant stuff and cause humans to ignore the system over time. I work with computers and servers, not airplanes or trains.
After reading this thread and comparing Influx and Prometheus, I've concluded that both look promising.
I was going to go with Prometheus (as it's easier to get started with), but I was really put off by the 'promdash' dashboard: it uses iframes and depends on MySQL.
So I'm going with InfluxDB + Grafana and I'll keep an eye out for developments.
I'm a little wary of a monolithic solution to monitoring/graphing/time-series data storage; it gives me flashbacks of Nagios/Zabbix ;)

I currently use a combination of Sensu/Graphite/Grafana, which allows a lot of flexibility (albeit with some initial wrangling during setup).
In your architecture I see a single monolithic database server called 'Prometheus'. Does it shard? I can't find that in the documentation. You mention it's compatible with TSDB; why did you choose to implement your own backend, or is this a fork of TSDB?

The tech does look awesome, though!
I used to use InfluxDB plus a custom program to scrape HTTP endpoints and insert the results into InfluxDB.

After playing around with Prometheus for a day or so, I'm convinced I need to switch :). The query language is so much better than what InfluxDB and the others provide.
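To give a flavour (my own example, not from the docs), this is roughly what it looks like to run a PromQL expression against the server's HTTP API with the Go client (a sketch assuming the current client_golang api packages; the address and metric name are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatal(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Per-second request rate over the last 5 minutes, summed per job:
	// the kind of label-based aggregation that is awkward elsewhere.
	result, warnings, err := promAPI.Query(ctx,
		`sum(rate(http_requests_total[5m])) by (job)`, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```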
Shameless plug: this looks quite similar to FnordMetric, which also supports labels / multi-dimensional time series, is StatsD wire-compatible, and supports SQL as a query language (so you won't have to learn yet another DSL).
Guys, I've seen the libraries for collecting service info, but how do I get OS-level info like load average, disk utilization, RAM, etc.?

I suppose there's a simple service we need to deploy on each server?

Any tips for this use case?
Promdash, the dashboard builder for Prometheus, is written in Ruby on Rails: https://github.com/prometheus/promdash