TechEcho

Your nines are not my nines (2019)

106 points by thewarpaint over 1 year ago

8 comments

sjsdaiuasgdia over 1 year ago
This is a concept I've had to explain to entirely too many teams over the years, that 0.001% of requests failing as a (mostly) random distribution of all requests is very different than a 0.001% subset of requests that will fail (nearly) every time until the underlying issue is mitigated. They look the same on a high level dashboard but they are completely different conditions in terms of how the customer will feel it, and understanding which kind of problem you have also guides the investigation and troubleshooting process.
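To make the distinction concrete, here is a minimal sketch, assuming a made-up fleet of 100,000 customers sending 100 requests each and a 0.001% failure rate (10 failures per million). None of these numbers come from the article; `simulate` and `summarize` are hypothetical helper names. Both scenarios produce the same overall failure rate, but very different per-customer experiences.

```python
import random

def simulate(num_customers=100_000, requests_per_customer=100,
             failures_per_million=10, seed=0):
    """Return per-customer failure counts for two scenarios with equal totals."""
    rng = random.Random(seed)
    total_requests = num_customers * requests_per_customer
    # Integer arithmetic: 10 per million is 0.001% of all requests.
    num_failures = total_requests * failures_per_million // 1_000_000

    # Scenario A: failing requests are picked uniformly at random.
    scattered = [0] * num_customers
    for request_index in rng.sample(range(total_requests), num_failures):
        scattered[request_index // requests_per_customer] += 1

    # Scenario B: the same number of failures lands on as few customers as possible.
    concentrated = [0] * num_customers
    remaining, customer = num_failures, 0
    while remaining > 0:
        hit = min(requests_per_customer, remaining)
        concentrated[customer] = hit
        remaining -= hit
        customer += 1
    return scattered, concentrated

def summarize(failures, requests_per_customer=100):
    """Summarize a list of per-customer failure counts."""
    total_requests = len(failures) * requests_per_customer
    return {
        "overall_failure_rate": sum(failures) / total_requests,
        "affected_customers": sum(1 for f in failures if f > 0),
        "worst_customer_failure_rate": max(failures) / requests_per_customer,
    }

scattered, concentrated = simulate()
for name, failures in (("scattered", scattered), ("concentrated", concentrated)):
    print(name, summarize(failures))
```

On a dashboard that only shows the overall failure rate the two scenarios are indistinguishable; only the per-customer breakdown shows that in the concentrated case one customer is failing on every request.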
RajT88 over 1 year ago
The way it works with cloud providers is - you can file for a refund for SLA breach. After all - those SLA's are at a service level for the customer. If you're yelling at support or engineering on the phone, you're likely getting the 9's treatment the author describes - this is the wrong forum to hold the provider accountable unless you're yelling about mitigation time (then, best of luck to you!).

Reading the fine print on the SLA's is extremely important, because they often do not say what you think they say.

https://aws.amazon.com/legal/service-level-agreements/
https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services
https://cloud.google.com/terms/sla/

I have seen refunds on the order of hundreds of thousands of dollars. It's cold comfort if the impact to you was on the order of millions of dollars, but still it is something. As you can see it's not a free-money-a-thon, it's generally a % of your spend of the services which were not available.

There typically is a defined process for submitting a refund ticket, which will result in an availability review. This documented process is not always easy to find.

The only one I could easily find is for Microsoft:

https://learn.microsoft.com/en-us/partner-center/request-credit#service-outages-service-level-agreement-issues-credit

(It's just a support topic when you're submitting a support ticket)
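As a rough illustration of "a % of your spend", here is a sketch of how a tiered service credit might be computed. The tiers below are invented for the example and are not taken from any provider's SLA; real agreements define their own thresholds, measurement windows, and exclusions, so read the actual document. `uptime_percent` and `service_credit` are hypothetical helper names, and a 30-day month is assumed.

```python
# Hypothetical tiers: (uptime threshold in %, credit as % of monthly spend
# on the affected service). Sorted so the most severe tier is checked first.
HYPOTHETICAL_TIERS = [
    (95.0, 100),
    (99.0, 25),
    (99.9, 10),
]

MINUTES_IN_MONTH = 30 * 24 * 60  # assumes a 30-day billing month

def uptime_percent(downtime_minutes, minutes_in_month=MINUTES_IN_MONTH):
    """Monthly uptime as a percentage of total minutes."""
    return 100.0 * (minutes_in_month - downtime_minutes) / minutes_in_month

def service_credit(monthly_spend, downtime_minutes):
    """Credit owed under the hypothetical tiers above, in the same currency as spend."""
    uptime = uptime_percent(downtime_minutes)
    for threshold, credit_pct in HYPOTHETICAL_TIERS:
        if uptime < threshold:
            return monthly_spend * credit_pct / 100
    return 0.0

# One hour of downtime on a service costing 50,000 per month.
print(service_credit(50_000, 60))
```

Under these made-up tiers the credit is bounded by what you pay for the affected service, which is why it can be small next to the business impact of the outage.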
Animats over 1 year ago
"You are the bug on the windscreen of the locomotive. The train has no idea you were ever there." - Rachel by the Bay.

That's how monopolies work. They need not fear their customers.

In time, this becomes Orwell's "If you want a vision of the future, imagine a boot stamping on a human face – forever." Ask anyone who's had a dispute with the Apple app store.
hughesjj over 1 year ago
Hot take:

I would love to have service providers show their (down sampled!) Alarms actually used for operational excellence publicly (from a read replica/etc)

Doing so would enforce that you actually have those in place, since they're public and now a marketing point. That said, I get the concern of trolls and competitors trying to get a "low score".
hinkley over 1 year ago
There's an old joke that goes something like, "Most of the people chasing five nines uptime achieved five eights."
thegrim33 over 1 year ago
Sure, there's the issue of what your contract says and what the guarantee is, but all these companies do already track their metrics in ways that at least attempt to detect and respond to the problems the author describes.

They track their metrics by p50 (the median, i.e. the typical performance/reliability for everyone) but also by p99, p99.9, etc., which is the performance/reliability for the extreme edge cases, such as exactly what the author is describing. They already do evaluate their systems from the perspective of how it's performing for the worst affected customers. Again, maybe the issue is the contract itself, sure, but they do already try their best to prevent a small handful of customers from getting overly affected by something.
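For readers unfamiliar with the notation, here is a small sketch of what p50, p99, and p99.9 report, using synthetic latency data (998 healthy customers plus one badly affected customer, all made up for illustration). The `percentile` helper uses the nearest-rank definition, which is one of several common conventions; monitoring systems may interpolate differently.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic data: (customer, latency in ms). 9,980 fast requests, 20 slow ones
# that all belong to a single hypothetical customer.
samples = [(f"customer-{i}", 20) for i in range(998) for _ in range(10)]
samples += [("customer-unlucky", 5000)] * 20

latencies = [ms for _, ms in samples]
fleet = {p: percentile(latencies, p) for p in (50, 99, 99.9)}
print("fleet-wide:", fleet)

# The same data grouped per customer shows who is actually hurting.
by_customer = defaultdict(list)
for customer, ms in samples:
    by_customer[customer].append(ms)
worst_customer = max(by_customer, key=lambda c: percentile(by_customer[c], 50))
print("worst customer:", worst_customer, percentile(by_customer[worst_customer], 50))
```

In this toy data the fleet-wide p50 and p99 both look healthy and only p99.9 exposes the tail, while the per-customer grouping identifies the one customer for whom every request is slow. Whether a given provider slices its metrics this way is not something the thread establishes.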
tekla over 1 year ago
I don't really get why Cloud matters here. The exact same dynamic exists for on-prem services.
ChrisArchitect over 1 year ago
(2019)