Grafana truly is best in class, but I have strong reservations about Prometheus.<p>I really want to like it. It’s just so _easy_: publish a little webpage with your metrics and Prometheus takes care of the rest. Lovely.<p>But I often find that the resolution of the data is substantially lower than even the defaults of alternatives (InfluxDB has 1s and even Zabbix has 5s).<p>Not to mention the lost writes (missing data points), which have no logged explanation.<p>All of this, however, was in my homelab, which, while unconstrained in resources, lacks a lot of the fit and finish of a prod system.<p>The architecture also gives me pause; it’s not meant to scale. It says so on the tin, so it’s not like I’m picking fault, but when you’re building a dashboard that pulls in data from 25 different Prometheus data sources, it becomes difficult to run functions like SUM(), because the keys may be out of sync, causing some really ugly and inaccurate representations of the data.<p>Everything about the design (polling, single database) tells me that it was designed primarily to sit alongside something small. It could never handle the tens of millions of data points per second that I ingest(ed) at my (now previous) job.<p>But it has a lot of hype, and maybe I’m holding it wrong.
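To make the SUM() problem concrete, here is a minimal sketch, assuming "out of sync" means that samples from different sources carry timestamps that do not line up. The function name `sum_aligned` and the sample data are hypothetical, and this is not how Grafana or Prometheus evaluates queries; it only illustrates snapping samples to a common step before adding them.

```python
from collections import defaultdict


def sum_aligned(series_list, step):
    """Sum several [(timestamp, value)] series on a shared time grid.

    Each timestamp is snapped down to a multiple of `step` seconds.
    A bucket is only reported when every series contributed a sample,
    so a missing scrape shows up as a gap instead of a misleading dip.
    """
    per_bucket = defaultdict(dict)
    for index, series in enumerate(series_list):
        for ts, value in series:
            bucket = ts - (ts % step)
            # If a series has several samples in one bucket, the last one wins.
            per_bucket[bucket][index] = value

    expected = len(series_list)
    return [
        (bucket, sum(values.values()))
        for bucket, values in sorted(per_bucket.items())
        if len(values) == expected
    ]


# Two hypothetical sources scraped a few seconds apart.
source_a = [(0, 1.0), (15, 2.0), (30, 3.0)]
source_b = [(4, 10.0), (19, 20.0)]
total = sum_aligned([source_a, source_b], step=15)
```

Summing the raw series by exact timestamp would find no matching points at all here; after alignment the first two buckets add up, and the third is dropped because only one source reported.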
I have a love/hate relationship with Prometheus. If I had no budget for metrics, it’s likely the thing I would reach for, but I’m dying for someone to open source a ‘next level’ metrics system (something like Monarch or Circonus but free).<p>But woe betide the team that has to run it as a service. Not that other metrics systems are better, but Prometheus can be brutal in that space.<p>As a ‘squad level’ tool it’s really good. After that it gets hairy fast.
We've got a somewhat similar landscape, on a pretty sizeable network - big investment in Zabbix and looking to move, perhaps slowly and perhaps only in part, towards Prometheus.<p>Coming from a monitoring system that supports push and pull with elegant auto-discovery, we're struggling to work out a sane architecture around (effectively pull-only) Prometheus.
Interesting post. However, I believe that most content, and especially broad technical content like this, absolutely needs a balanced amount of relevant visual elements (e.g., images, diagrams). If you want it to be readable, that is.
I thought this article, while a little dry, was very illuminating. It sounds like Hyperfeed is running at the very least "Medium Data" (we all think our Data is Big!). And I think it is fascinating to hear of a case where Prometheus is plainly a bad fit for its intended purpose. It sounds like the cardinality explosion around their ML models was a really bad fit for Prometheus. It's great to hear about deployments "in-situ", and people appreciating where it works well and where it doesn't.
What's a good alternative to Prometheus when pulling stats is impractical? Say I want to monitor a personal laptop like I would a server. It will change networks and IP addresses, so pulling would be impractical to configure, whereas the laptop could easily(?) push its stats to a remote server.
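One push-style option that stays within the Prometheus ecosystem is the Pushgateway, although its documentation recommends it mainly for short-lived batch jobs rather than as a general replacement for scraping. A minimal sketch of what the laptop side could look like, assuming a Pushgateway is reachable at a hypothetical `gateway.example:9091`; the job name, instance name, and metric name are made up for illustration.

```python
import urllib.request


def render_metrics(metrics):
    """Render {name: value} as lines in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


def build_push_request(gateway_url, job, instance, metrics):
    """Build a PUT request for the Pushgateway's /metrics/job/.../instance/... path.

    PUT replaces all metrics previously pushed for this job/instance group.
    """
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job}/instance/{instance}"
    body = render_metrics(metrics).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(request, timeout=5):
    """Send the request; call this from a cron job or timer on the laptop."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical gateway address and metric; nothing is sent until push() is called.
request = build_push_request(
    "http://gateway.example:9091/",
    job="laptop",
    instance="my-laptop",
    metrics={"laptop_battery_percent": 87},
)
```

Because the laptop initiates the connection, its changing IP address no longer matters; Prometheus then scrapes the gateway at a fixed address. The trade-off is that the gateway keeps serving the last pushed values, so a laptop that has gone offline looks the same as one that is idle.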
What do you all do with the collected metrics over time? Do you store everything forever, drop everything after a couple of weeks, or something in between? I've heard of people thinning out old data a bit (?) and storing it long term rather than storing everything. What's the usual thing people do?
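The "thinning out" mentioned above is usually called downsampling: old samples are replaced by one aggregate per larger time window. A minimal sketch of the idea, assuming samples are `(timestamp_seconds, value)` pairs and that an average per window is good enough; the function name and data are hypothetical and not tied to any particular metrics store.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns one (bucket_start, average) pair per non-empty bucket,
    ordered by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Six samples at 15-second resolution, thinned to one point per minute.
raw = [(0, 1.0), (15, 3.0), (30, 5.0), (45, 7.0), (60, 10.0), (75, 20.0)]
thinned = downsample(raw, bucket_seconds=60)
```

Averaging hides short spikes, so setups that care about peaks often keep a min and max per window alongside the mean.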
Prometheus is great. I first heard about it at KubeCon last fall, and kind of shrugged it off as one of those fledgling "cloud native" projects that I probably didn't need or didn't have time to learn. There's actually a lot of adoption; you can find great exporters and Grafana dashboards for almost any OSS you're running today. I started collecting metrics from Zookeeper and HBase in about an hour, having never had access to that telemetry before. From the existence of Cortex[1], it seems that Prometheus doesn't scale incredibly well, but I don't think many users will hit these limits.<p>[1] <a href="https://cortexmetrics.io/" rel="nofollow">https://cortexmetrics.io/</a>
Prometheus and Grafana are awesome, use them personally for all my monitoring.<p>However I’m still trying to nail down my high cardinality/highly unique metrics-like data story. What are people using?<p>I’ve heard a combination of Cassandra/BigTable and Spark as a potential solution?
I've been looking into Prometheus + Grafana for other reasons. I have some 3rd party APIs connected through an API gateway, which I need to health check, and I couldn't find other open source alternatives. Gonna move the whole setup to the cloud at some point, but I'm not sure if this is the right thing to do. Does anyone have other articles or open source tools that could be helpful to me? This article goes much deeper into how the setup can be used, but I'm looking for simpler use cases of the same setup for the task I need to do.
Does anyone have anything good or bad to share about using Grafana as a front end for metrics logged in AWS CloudWatch? I know it has a plugin, and I'm fed up with how bad the CloudWatch dashboards are, so I'm wondering if I should check it out.