Mistake 3: Putting different people in charge of alert-setting than of alert-responding<p>In the best case, it increases the risk of alarm fatigue. In the pathological case, the technical system becomes a cross-group battleground for shifting blame and workload.
Good, though I find Rob Ewaschuk's "My Philosophy on Alerting" essay (Google SRE) better:<p><a href="https://docs.google.com/a/gravitant.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/preview?sle=true&pli=1#heading=h.fs3knmjt7fjy" rel="nofollow">https://docs.google.com/a/gravitant.com/document/d/199PqyG3U...</a><p>Particularly:<p>Pages [alerts] should be urgent, important, actionable, and real.<p>They should represent either ongoing or imminent problems with your service.<p>Err on the side of removing noisy alerts – over-monitoring is a harder problem to solve than under-monitoring.<p>You should almost always be able to classify the problem into one of: availability & basic functionality; latency; correctness (completeness, freshness and durability of data); and feature-specific problems.<p>Symptoms are a better way to capture more problems more comprehensively and robustly with less effort.<p>Include cause-based information in symptom-based pages or on dashboards, but avoid alerting directly on causes.<p>The further up your serving stack you go, the more distinct problems you catch in a single rule. But don't go so far that you can't sufficiently distinguish what's going on.<p>If you want a quiet oncall rotation, it's imperative to have a system for dealing with things that need timely response, but are not imminently critical.
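To make the symptom-vs-cause point concrete, here's a rough sketch of how I read it (my own toy example, not from the essay; the function names, thresholds, and signals are invented): page on what users actually feel, measured against an SLO, and attach the cause-side signals as context for whoever gets woken up, rather than paging on those signals directly.<p>
    # Toy illustration of symptom-based paging with cause-based context.
    # All names and thresholds are made up for the example.

    def should_page(error_rate, p99_latency_s, slo_error_rate=0.001, slo_latency_s=0.5):
        """Page on user-visible symptoms (availability, latency), not on causes."""
        return error_rate > slo_error_rate or p99_latency_s > slo_latency_s

    def cause_context(cpu_util, queue_depth, replica_count):
        """Cause-based signals: surface them on the page/dashboard, don't page on them."""
        return {"cpu_util": cpu_util, "queue_depth": queue_depth, "replica_count": replica_count}

    if __name__ == "__main__":
        # The symptom (error rate over SLO) triggers the page; cause hints ride along.
        if should_page(error_rate=0.03, p99_latency_s=0.2):
            print("PAGE: error rate over SLO",
                  cause_context(cpu_util=0.95, queue_depth=1200, replica_count=3))
<p>The point of the separation: when the CPU spike turns out to be harmless, nobody gets paged for it, but when it is the cause of a real error-rate breach, the responder sees it immediately.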