Systems Monitoring with Prometheus and Grafana

198 points by jsulak almost 5 years ago

11 comments

dijit almost 5 years ago

Grafana truly is best in class, but I have strong reservations about Prometheus.

I really want to like it, it’s just so _easy_: publish a little webpage with your metrics and Prometheus takes care of the rest. Lovely.

But I often find that the cardinality of the data is substantially lower than even the defaults of alternatives (InfluxDB has 1s and even Zabbix has 5s).

Not to mention the lost writes (missing data points) which have no logged explanation.

All of this, however, was in my homelab, which, while unconstrained in resources, lacks a lot of the fit and finish of a prod system.

I also take pause with the architecture; it’s not meant to scale. It’s written on the tin so it’s not like I’m picking fault, but when you’re building a dashboard that sucks in data from 25 different Prometheus data sources, it becomes difficult to run functions like SUM(), because the keys may be out of sync, causing some really ugly and inaccurate representations of data.

Everything about the design (polling, single database) tells me that it was designed primarily to sit alongside something small. It could never handle the tens of millions of data points per second that I ingest(ed) at my (now previous) job.

But it has a lot of hype, and maybe I’m holding it wrong.
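For anyone who hasn't seen the "little webpage" part, here is a minimal sketch of it using only the Python standard library. It serves gauges at /metrics in the Prometheus text exposition format. The metric names, values, and port are made up for illustration, and a real service would more likely use an official client library than hand-render the text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(gauges):
    """Render a dict of gauge name -> value as Prometheus text exposition format."""
    lines = []
    for name, value in sorted(gauges.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metrics, chosen only for illustration.
GAUGES = {
    "homelab_cpu_temperature_celsius": 48.5,
    "homelab_disk_free_bytes": 1200000000,
}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(GAUGES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the metrics page (the port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scrape job at that host and port is all the "publishing" there is; the scrape interval is configured on the Prometheus side, not here.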

kasey_junk almost 5 years ago

I have a love/hate relationship with Prometheus. If I had no budget for metrics it's likely the thing I would reach for, but I’m dying for someone to open source a ‘next level’ metrics system (something like Monarch or Circonus but free).

But woe betide the team that has to run it as a service. Not that other metrics systems are better, but Prometheus can be brutal in that space.

As a ‘squad level’ tool it's really good. After that it gets hairy fast.

Jedd almost 5 years ago

We've got a somewhat similar landscape, on a pretty sizeable network - big investment in Zabbix and looking to move, perhaps slowly and perhaps only in part, towards Prometheus.

Coming from a monitoring system that supports push and pull with elegant auto-discovery, we're struggling to work out a sane architecture around (effectively pull-only) Prometheus.

ablekh almost 5 years ago

Interesting post. However, I believe that most content, and especially a broad technical piece like this, absolutely needs a balanced amount of relevant visual elements (e.g., images, diagrams). If you want it to be readable, that is.

djmetzle almost 5 years ago

I thought this article, while a little dry, was very illuminating. It sounds like Hyperfeed is running at the very least "Medium Data" (we all think our Data is Big!). And I think it is fascinating to hear of a case where Prometheus is plainly a bad fit for its intended purpose. It sounds like cardinality explosion around their ML models was a really bad fit for Prometheus. It's great to hear about deployments "in-situ", and people appreciating where it works well, and where it doesn't.

bacondude3 almost 5 years ago

What's a good alternative to Prometheus when pulling stats is impractical? Say I want to monitor a personal laptop like I would a server. It will change networks and IP addresses, so pulling would be impractical to configure, whereas the laptop could easily(?) push its stats to a remote server.
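To make the push idea concrete, here is a rough sketch of what the laptop side might do: build an HTTP POST carrying a few samples in the Prometheus text format. It assumes a receiver that accepts that format under a /metrics/job/<job>/instance/<instance> path, which is the URL shape Prometheus's Pushgateway uses. The host name, job name, and metric are hypothetical, and whether Pushgateway suits a roaming laptop is a separate question this sketch doesn't settle.

```python
import urllib.request


def build_push_request(base_url, job, instance, samples):
    """Build (but do not send) a POST with samples in Prometheus text format."""
    lines = [f"{name} {value}" for name, value in sorted(samples.items())]
    body = ("\n".join(lines) + "\n").encode("utf-8")  # the body must end with a newline
    url = f"{base_url}/metrics/job/{job}/instance/{instance}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain"},
    )


def push(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical receiver and metric, for illustration only.
request = build_push_request(
    "http://metrics.example.org:9091", "laptop", "thinkpad", {"battery_percent": 87}
)
```

Because the laptop opens the connection, its changing IP address stops mattering; only the receiver needs a stable address.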

site-packages1 almost 5 years ago

What do you all do with the collected metrics over time? Do you store everything forever, drop everything after a couple weeks, or something in between? I've heard of people thinning out old data a bit (?) and storing it long term rather than storing everything. What's the usual thing people do?
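As I understand it, "thinning out" usually means downsampling: replacing many raw samples with one aggregate per larger time bucket. A minimal sketch of that idea, assuming samples are (unix_timestamp, value) pairs and that averaging is an acceptable aggregate (often people keep min and max too):

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each sample to the start of the bucket it falls into.
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Three samples at 30 s spacing thinned to one point per minute.
raw = [(0, 1.0), (30, 3.0), (60, 5.0)]
thinned = downsample(raw, 60)  # [(0, 2.0), (60, 5.0)]
```

The trade-off is that short spikes get averaged away, which is presumably why it is applied only to older data.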

cmckn almost 5 years ago

Prometheus is great. I first heard about it at KubeCon last fall, and kind of shrugged it off as one of those fledgling "cloud native" projects that I probably didn't need or didn't have time to learn. There's actually a lot of adoption; you can find great exporters and Grafana dashboards for almost any OSS you're running today. I started collecting metrics from Zookeeper and HBase in about an hour, having never had access to that telemetry before. From the existence of Cortex[1], it seems that Prometheus doesn't scale incredibly well, but I don't think many users will hit these limits.

[1] https://cortexmetrics.io/

halfmatthalfcat almost 5 years ago

Prometheus and Grafana are awesome, use them personally for all my monitoring.

However I’m still trying to nail down my high cardinality/highly unique metrics-like data story. What are people using?

I’ve heard a combination of Cassandra/BigTable and Spark as a potential solution?

apihealth almost 5 years ago

I've been looking into Prometheus + Grafana for other reasons. I have some 3rd party APIs connected through an API gateway, which I need to health check, and I couldn't find other open source alternatives. Gonna move the whole setup to the cloud at some point but I'm not sure if this is the right thing to do. Does anyone have other articles/open source tools which can be helpful to me? This article goes much deeper into how the setup can be used, but I'm looking for simpler use cases of the same setup, for the task I need to do.

osn9363739 almost 5 years ago

Does anyone have anything good or bad to share about using Grafana as a front end for metrics logged in AWS CloudWatch? I know it has a plugin and I'm fed up with how bad the CloudWatch dashboards are, so wondering if I should check it out.