As a Graphite maintainer, see my other post about problems with Graphite:

https://news.ycombinator.com/item?id=8908423

I'm super excited about Prometheus and can't wait to get some time to see if I can make it work on my Raspberry Pi. That said, I'm also likely to eventually work on a graphite-web / graphite-api pluggable backend that uses Prometheus as the backend storage platform.

The more OSS metrics solutions, the better!
Announced here:

https://developers.soundcloud.com/blog/prometheus-monitoring-at-soundcloud
"Those who cannot remember the Borgmon are doomed to repeat it" ;)<p>Just kidding, this is looking really good, I hope to get some hands-on experience with it soon.
We've been looking for something like this, but unfortunately the "pull" model won't work for us. We really need a push model so that our statistics server doesn't need access to every single producer. I see the Pushgateway, but it seems deliberately not intended as centralized storage.

I wonder what InfluxDB means by "distributed", i.e. whether I could use it to implement a push model where distributed agents push to a centralized metrics server.
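For what it's worth, pushing through the Pushgateway from a short-lived job looks roughly like this with the Go client library (a sketch assuming the current client_golang push API; the gateway address, job name, and metric name are made up):

```go
package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// A metric recorded by a short-lived job that can't be scraped directly.
	completionTime := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "batch_job_last_completion_timestamp_seconds",
		Help: "Unix timestamp of the last successful batch job completion.",
	})
	completionTime.SetToCurrentTime()

	// Push the sample to the Pushgateway, which holds it until the
	// Prometheus server scrapes the gateway on its normal pull cycle.
	if err := push.New("http://pushgateway.example.org:9091", "batch_job").
		Collector(completionTime).
		Push(); err != nil {
		log.Fatal("could not push to Pushgateway: ", err)
	}
}
```

Note that the gateway itself still gets scraped rather than acting as a metrics store, which is exactly the limitation mentioned above.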
From the storage system docs:

> which organizes sample data in chunks of constant size (1024 bytes payload). These chunks are then stored on disk in one file per time series.

That is concerning: is this going to have the same disk I/O problem that Graphite does, i.e. every metric update requiring a disk I/O because of the one-file-per-metric structure?
This looks very interesting.

From http://prometheus.io/docs/introduction/getting_started/:

> Prometheus collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.

I wonder if we'll see plugins that allow data collection via SNMP or Nagios monitoring scripts or the like. That would make it much easier to switch large existing monitoring systems over to Prometheus.
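For anyone who hasn't seen the pull model before, instrumenting an app so Prometheus can scrape it is pretty minimal with the Go client (a sketch assuming the current client_golang API; the metric name and port are just placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// A counter the application increments; Prometheus reads its current
	// value each time it scrapes the /metrics endpoint below.
	requests := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total HTTP requests handled by myapp.",
	})
	prometheus.MustRegister(requests)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requests.Inc()
		fmt.Fprintln(w, "hello")
	})

	// Expose the text exposition format that the Prometheus server scrapes.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Bridges for SNMP or Nagios checks would presumably just be small services exposing the same kind of endpoint on behalf of the things they poll.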
It's great to see new entrants in the monitoring and graphing space. These are problems that every company has, and yet there's no solution as widely accepted for monitoring, notifications, or graphing as nginx is for web serving.

Not that I'd do a better job, but every time I further configure our monitoring system, I get the feeling that we're missing something as an industry. It's a space with lots of tools that feel too big or too small; only Graphite feels like it's doing one job fairly well.

Alerting is the worst of it. Nagios and all the other alerting solutions I've played with feel just a bit off. They either do too much or carve out a role at boundaries that aren't quite right. The result is that other systems end up wanting to do alerting too, which makes it hard to compare tools.

As an example, Prometheus has an alert manager under development: https://github.com/prometheus/alertmanager. Why isn't doing a great job at graphing enough of a goal? Is it a problem with the alerting tools, or a problem with the boundaries between alerting, graphing, and notifications?
So... how does this compare to http://riemann.io/ ? I just re-discovered Riemann and was thinking of pairing it with Logstash and having a go. It would seem Prometheus does something... similar?
Looks really promising for smaller clusters. However, the pull/scraping model for stats could be problematic at larger scale.

I've been experimenting with metrics collection using Heka (node) -> AMQP -> Heka (aggregator) -> InfluxDB -> Grafana. It works extremely well and scales nicely, but requires writing Lua code for anomaly detection and alerts – good or bad depending on your preference.

I highly recommend considering Heka [1] for shipping logs to both Elasticsearch and InfluxDB if you need more scale and flexibility than Prometheus currently provides.

[1] https://github.com/mozilla-services/heka
While monitoring is obviously useful, I don't understand the obvious importance of a time-series database. Can you collect enough measurements for the time-series database to be useful? I worry that I would end up with lots of metrics to back up my wrong conclusions. I also worry that so much irrelevant data would drown out the relevant stuff and cause humans to ignore the system over time. I work with computers and servers, not airplanes or trains.
After reading this thread and comparing Influx and Prometheus, I've concluded that both look promising.
I was going to go with Prometheus (as it's easier to get started with), but I was really put off by the 'promdash' dashboard: it uses iframes and depends on MySQL.
So I'm going with InfluxDB + Grafana and I'll keep an eye out for developments.
I'm a little wary of a monolithic solution to monitoring/graphing/time-series data storage; it gives me flashbacks of Nagios/Zabbix ;)

I currently use a combination of Sensu/Graphite/Grafana, which allows a lot of flexibility (albeit with some initial wrangling during setup).
In your architecture I see a single monolithic database server called 'Prometheus'. Does it shard? I can't find that in the documentation. You mention it's compatible with TSDB; why did you choose to implement your own backend, or is this a fork of TSDB?

The tech does look awesome, though!
I used to use InfluxDB plus a custom program to scrape HTTP endpoints and insert the results into InfluxDB.

After playing around with Prometheus for a day or so, I'm convinced I need to switch :). The query language is so much better than what InfluxDB and the others provide.
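To give a flavour (my own example, not from the docs), this is roughly what it looks like to run a PromQL expression against the server's HTTP API with the Go client (a sketch assuming the current client_golang api packages; the address and metric name are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatal(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Per-second request rate over the last 5 minutes, summed per job:
	// the kind of label-based aggregation that is awkward elsewhere.
	result, warnings, err := promAPI.Query(ctx,
		`sum(rate(http_requests_total[5m])) by (job)`, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```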
Shameless plug: this looks quite similar to FnordMetric, which also supports labels / multi-dimensional time series, is StatsD wire-compatible, and supports SQL as a query language (so you won't have to learn yet another DSL).
Guys, I've seen the libraries for collecting service info, but how do I get OS-level info like load average, disk utilization, RAM, etc.?

I suppose there's a simple service we need to deploy on each server?

Any tips for this use case?
Promdash, the dashboard builder for Prometheus, is written in Ruby on Rails: https://github.com/prometheus/promdash