TechEcho

2 comments

rewmie over 1 year ago

From the article:

> Our services now ran on Kubernetes with individual pods collecting tens-of-thousands of metric samples per second. Engineers also evolved much stricter expectations for the amount of dropped metric samples (none).

Is there any indication of how many datagrams can be dropped over a localhost connection? I expect zero datagrams to be dropped between a service and its sidecar.

It's also possible to run a StatsD node in a Kubernetes cluster as a DaemonSet. Each service+StatsD sidecar runs as a Deployment, and the StatsD sidecars can be configured both to aggregate metrics and to push them to the StatsD DaemonSet over a TCP connection. This would keep traffic between each Deployment and the DaemonSet very low: sidecars can push their aggregated metrics every few seconds or even minutes, and each Kubernetes node typically runs a single-digit to low-double-digit number of Deployments, which results in single-digit TPS.

Does anyone with experience in both Prometheus and StatsD have any insight into this scenario? At first glance, it sounds like the premise for pushing a migration to Prometheus is flawed and unjustified.
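To make the aggregation argument concrete, here is a minimal Python sketch of what such a sidecar does, assuming plain counters in the basic `name:value|c` StatsD line format (sample rates, tags, timers and gauges are ignored, and the actual TCP push to the DaemonSet is left out). `SidecarAggregator`, `upstream_tps`, and the figures of 20 Deployments per node and a 10-second flush interval are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def parse_statsd_line(line: str) -> tuple[str, float, str]:
    """Parse a minimal StatsD line of the form 'name:value|type'.

    Sample rates ('|@0.1') and tags are deliberately not handled in this sketch.
    """
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, float(value), metric_type


class SidecarAggregator:
    """Hypothetical sidecar that sums counters locally and flushes them upstream."""

    def __init__(self) -> None:
        self.counters: dict[str, float] = defaultdict(float)

    def ingest(self, line: str) -> None:
        name, value, metric_type = parse_statsd_line(line)
        if metric_type == "c":  # only counters are aggregated in this sketch
            self.counters[name] += value

    def flush(self) -> list[str]:
        """Return one aggregated line per counter and reset local state."""
        lines = [f"{name}:{total:g}|c" for name, total in sorted(self.counters.items())]
        self.counters.clear()
        return lines


def upstream_tps(deployments_per_node: int, flush_interval_s: float) -> float:
    """Flushes per second reaching the node-local DaemonSet (one per sidecar per interval)."""
    return deployments_per_node / flush_interval_s


if __name__ == "__main__":
    agg = SidecarAggregator()
    for _ in range(10_000):  # 10,000 raw samples from the service...
        agg.ingest("http.requests:1|c")
    agg.ingest("http.errors:1|c")
    print(agg.flush())  # ['http.errors:1|c', 'http.requests:10000|c']
    print(upstream_tps(20, 10.0))  # 2.0
```

However many samples arrive between flushes, each sidecar sends one batch per interval, which is why the node-level rate stays in single-digit TPS.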

matthewtse over 1 year ago

It's interesting to me how Prometheus came to dominate the metrics ecosystem over the past decade.

When I first encountered it, I had a visceral reaction against it, primarily over:

1) The PromQL query language is really not intuitive. I've been using it for years and I still stick to really simple joins to keep myself out of hot water.
2) Being forced to use rate() on counter metrics was a really big initial barrier to one of the presumably simpler things a metrics system should handle. See this popular, long blog post explaining what Prometheus counters are: https://www.robustperception.io/how-does-a-prometheus-counter-work/

But the complicated rate() and counters were a symptom of Prometheus's killer feature: it uses a pull-based rather than a push-based model. This meant you could easily redeploy subcomponents of your metrics system, getting rid of the hair-raising stop-the-world updates we used to have to do with Graphite.
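For anyone hitting the same rate() barrier: a Prometheus counter only goes up until its process restarts, so the raw value is rarely what you want to graph. rate() turns it into a per-second rate and treats any drop in value as a counter reset. Because the server stores cumulative totals, a missed scrape costs resolution rather than losing increments (unless the counter also resets in between). Below is a simplified Python sketch of that reset handling; `simple_rate` is a hypothetical helper, and it deliberately omits the extrapolation to the edges of the range window that the real PromQL rate() performs, so its numbers will not match Prometheus exactly.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_s, value) samples.

    Handles counter resets the way Prometheus does (a drop in value means the
    counter restarted from zero), but skips Prometheus's window extrapolation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted at 0, so its whole current value is new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Counter scraped every 10 s; the process restarted between the 2nd and 3rd scrape.
    samples = [(0.0, 100.0), (10.0, 140.0), (20.0, 20.0)]
    print(simple_rate(samples))  # 3.0 (40 before the reset + 20 after, over 20 s)
    print((samples[-1][1] - samples[0][1]) / 20.0)  # -4.0: the naive delta is misleading
```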

We Migrated from StatsD to Prometheus in One Month