I've skimmed some of the literature here while spending time helping people choose bucket boundaries for Prometheus-style instrumentation of things denominated in "seconds", such as processing time and freshness.

My use case is a little different from what's described here or in a lot of the literature. Some of the differences:

(1) You have to decide on bucket values up front, often hardcoded or stored in code-like places, and realistically you won't bother to update them often unless the data look unusably noisy.

(2) Your maximum number of buckets is pretty small -- probably no more than 10 or 15 histogram buckets. This is because my metrics are very high cardinality (my times get recorded alongside other dimensions that may each have 5-100 distinct values: things like server instance number, method name, client name, or response status).

(3) I think I know which percentiles I care about -- I'm particularly interested in minimizing error for, say, the p50, p95, p99, and p999 values, and don't care much about others.

(4) I think I know which values I care about knowing precisely! Sometimes people call my metrics "SLIs", and sometimes they even set an "SLO" which says, for example, that no more than 0.1% of interactions may take longer than 500ms. (Yes, those people say, we have accepted that this means 0.1% of people may have an unboundedly bad experience.) So, okay, fine: force a bucket boundary at 500ms and we'll always measure that SLO with no error.

(5) I know that the test data I use as input don't always reflect how the system will behave over time. For example, I might feed my bucket-designing algorithm yesterday's freshness data, and that might have been a day when our async data processing pipeline was never more than 10 minutes backlogged. But in the real world, every few months we get a >8 hour backlog, and it turns out we'd like to be able to accurately measure the p99 age of processed messages even when they are very old. So despite our very limited bucket budget, we probably do want some buckets at 1, 2, 4, 8, and 16 hours, even if at design time they seem useless.

I have always ended up hand-writing my own error approximation function (sketched at the bottom of this comment), which takes as input:

(1) sample data -- a representative subset of the actual times observed in my system yesterday

(2) proposed buckets -- a bundle of, say, 15 bucket boundaries

(3) the percentiles I care about

and returns, for each percentile, how far off (percentage error) the histogram-estimated value is from the actual value in my sample data.

Last time I looked at this, I tried libraries that purport to compute very good bucket boundaries, but they gave me something like 1500 buckets with very nice tiny error and no clear way to make a real-world choice about collapsing that into a much smaller set of buckets with comparatively huge, but manageable, error.

I ended up just advising people to:

* set bucket boundaries at SLO boundaries, and be sure to update them when the SLO changes

* actually look at your data and understand its shape

* minimize error for the data set you have now; logarithmic bucket sizes with extra buckets near the distribution's current median seem to work well

* minimize worst-case error if the things you're measuring grow very small or very large and you care about being able to observe that (add extra buckets)
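For the curious, here's a minimal Python sketch of that error approximation function. It assumes Prometheus-style cumulative ("le") buckets and a histogram_quantile()-style estimator (linear interpolation within the bucket containing the quantile, lower bound 0 assumed for the first bucket, and the highest finite boundary returned when the quantile falls in the implicit +Inf bucket). All names here are mine, not from any library:

    import bisect

    def percentile_error(samples, buckets, percentiles):
        """Compare exact sample percentiles against what a Prometheus-style
        histogram with the given bucket boundaries would estimate.

        samples     -- representative observed values (e.g. seconds)
        buckets     -- proposed upper boundaries, ascending; +Inf is implicit
        percentiles -- e.g. [0.50, 0.95, 0.99, 0.999]

        Returns {percentile: (actual, estimated, pct_error)}.
        """
        xs = sorted(samples)
        n = len(xs)

        # Cumulative counts per bucket, as Prometheus stores them (le="bound").
        cumulative = [bisect.bisect_right(xs, b) for b in buckets]

        results = {}
        for p in percentiles:
            actual = xs[min(int(p * n), n - 1)]  # exact sample percentile

            rank = p * n
            # First bucket whose cumulative count reaches the target rank.
            i = bisect.bisect_left(cumulative, rank)
            if i >= len(buckets):
                # Quantile falls in the implicit +Inf bucket: the estimate
                # collapses to the largest finite boundary, as in Prometheus.
                estimated = buckets[-1]
            else:
                lower = buckets[i - 1] if i > 0 else 0.0
                upper = buckets[i]
                count_below = cumulative[i - 1] if i > 0 else 0
                in_bucket = cumulative[i] - count_below
                # Linear interpolation within the bucket, mirroring
                # histogram_quantile()'s uniform-distribution assumption.
                frac = (rank - count_below) / in_bucket if in_bucket else 0.0
                estimated = lower + (upper - lower) * frac

            pct_error = abs(estimated - actual) / actual * 100 if actual else 0.0
            results[p] = (actual, estimated, pct_error)
        return results

You run it over yesterday's samples with a candidate boundary set and eyeball whether the errors at p50/p95/p99/p999 are tolerable; if not, move boundaries toward where the error is worst and rerun.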
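And a rough sketch of how that advice could be mechanized, under my assumptions above: forced boundaries at SLO values, coarse "safety" buckets far above today's data (the 1-16 hour example), and the rest of the budget spent on log-spaced buckets with extra resolution near the current median. The function name, defaults, and thinning heuristic are all illustrative, not from any library:

    import math

    def propose_buckets(samples, budget=15,
                        slo_bounds=(0.5,),            # e.g. the 500ms SLO
                        safety_bounds=(3600, 7200, 14400, 28800, 57600)):
        # Forced boundaries first: SLO bounds we must measure exactly,
        # plus coarse tail buckets for rare disaster scenarios.
        forced = sorted(set(slo_bounds) | set(safety_bounds))

        # Spend the remaining budget on log-spaced buckets across the
        # observed range, with half-steps (sqrt(2)) near the current median.
        xs = sorted(samples)
        median = xs[len(xs) // 2]
        lo, hi = max(min(xs), 1e-3), max(xs)

        log_spaced = []
        b = lo
        while b <= hi:
            log_spaced.append(b)
            b *= math.sqrt(2) if median / 2 <= b <= median * 2 else 2

        # If over budget, thin the log-spaced buckets evenly;
        # forced boundaries are never dropped.
        room = max(budget - len(forced), 0)
        if room == 0:
            log_spaced = []
        elif len(log_spaced) > room:
            stride = len(log_spaced) / room
            log_spaced = [log_spaced[int(i * stride)] for i in range(room)]

        return sorted(set(forced) | {round(b, 4) for b in log_spaced})

Feeding the output of this back into percentile_error() above closes the loop: propose, measure the error at the percentiles you care about, and adjust by hand from there.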