Conflating the Roles of Alerting and Dashboards

55 points by KyleBrandt about 9 years ago

3 comments

Terr_ about 9 years ago
Mistake 3: Putting different people in charge of alert-setting than of alert-responding.

In the best case, it increases the risk of alarm fatigue. In the pathological cases, it means the technical system becomes a cross-group battleground for shifting blame and workload.
dredmorbius about 9 years ago
Good, though I find Rob Ewaschuk's "My Philosophy on Alerting" essay (Google SRE) better:

https://docs.google.com/a/gravitant.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/preview?sle=true&pli=1#heading=h.fs3knmjt7fjy

Particularly:

Pages [alerts] should be urgent, important, actionable, and real.

They should represent either ongoing or imminent problems with your service.

Err on the side of removing noisy alerts – over-monitoring is a harder problem to solve than under-monitoring.

You should almost always be able to classify the problem into one of: availability & basic functionality; latency; correctness (completeness, freshness and durability of data); and feature-specific problems.

Symptoms are a better way to capture more problems more comprehensively and robustly with less effort.

Include cause-based information in symptom-based pages or on dashboards, but avoid alerting directly on causes.

The further up your serving stack you go, the more distinct problems you catch in a single rule. But don't go so far you can't sufficiently distinguish what's going on.

If you want a quiet oncall rotation, it's imperative to have a system for dealing with things that need timely response, but are not imminently critical.
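
As a rough illustration of the guidelines quoted above (not code from the essay), here is a minimal Python sketch of a routing policy: page only on urgent, actionable symptoms, send non-urgent symptoms to a ticket queue, and keep cause-based signals on dashboards. The `Signal` fields, the `route` function, and the example signal names are all hypothetical, chosen only to make the distinction concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PAGE = "page"            # interrupt the on-call person now
    TICKET = "ticket"        # needs a timely response, but not immediately
    DASHBOARD = "dashboard"  # context for debugging only


@dataclass(frozen=True)
class Signal:
    # All fields are illustrative assumptions, not part of any real tool.
    name: str
    is_symptom: bool   # user-visible effect rather than an internal cause
    urgent: bool       # ongoing or imminent problem with the service
    actionable: bool   # a human can actually do something about it


def route(signal: Signal) -> Route:
    """Decide where a firing signal goes under this hypothetical policy."""
    if not signal.is_symptom:
        # Causes are shown alongside symptoms, never paged on directly.
        return Route.DASHBOARD
    if not signal.actionable:
        # Nothing a responder can do, so it should not interrupt anyone.
        return Route.DASHBOARD
    if signal.urgent:
        return Route.PAGE
    return Route.TICKET


signals = [
    Signal("checkout_error_rate_high", is_symptom=True, urgent=True, actionable=True),
    Signal("disk_80_percent_full", is_symptom=False, urgent=False, actionable=True),
    Signal("report_freshness_lagging", is_symptom=True, urgent=False, actionable=True),
]

for s in signals:
    print(f"{s.name}: {route(s).value}")
```

In this sketch the error-rate symptom pages, the disk-usage cause stays on a dashboard, and the slow-moving freshness symptom becomes a ticket, which is one way to get the "timely but not imminently critical" channel the last quoted point asks for.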
mhb about 9 years ago
What's the distinction between a "contradistinction" and a distinction?