Retries – An interactive study of request retry methods

251 points by whenlambo over 1 year ago

15 comments

lclarkmichalek over 1 year ago
This still isn't what I'd call "safe". Retries are amazing at supporting clients in handling temporary issues, but horrible for helping them deal with consistently overloaded servers. While jitter & exponential backoff help with the timing, they don't reduce the overall load sent to the service.

The next step is usually local circuit breakers. The two easiest to implement are terminating the request if the error rate to the service over the last <window> is greater than x%, and terminating the request (or disabling retries) if the % of requests that are retries over the last <window> is greater than x%.

i.e. don't bother sending a request if 70% of requests have errored in the last minute, and don't bother retrying if 50% of the requests we've sent in the last minute have already been retries.

Google SRE book describes lots of other basic techniques to make retries safe.
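A minimal sketch of the two checks described in this comment, assuming a single-threaded client and a one-minute sliding window. The class name `RetryGate`, its parameters, and the injectable `clock` are hypothetical names chosen for illustration; the 70% and 50% defaults come from the comment's example.

```python
from collections import deque
import time


class RetryGate:
    """Sliding-window gate: decides whether to send a request or attempt a retry."""

    def __init__(self, window_s=60.0, max_error_rate=0.7,
                 max_retry_ratio=0.5, clock=time.monotonic):
        self.window_s = window_s
        self.max_error_rate = max_error_rate
        self.max_retry_ratio = max_retry_ratio
        self.clock = clock
        self.events = deque()  # (timestamp, was_retry, was_error)

    def _trim(self):
        # Drop events that have aged out of the window.
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def record(self, was_retry, was_error):
        """Call once per completed attempt."""
        self.events.append((self.clock(), was_retry, was_error))

    def allow_request(self):
        """False when the recent error rate is greater than max_error_rate."""
        self._trim()
        if not self.events:
            return True
        errors = sum(1 for _, _, err in self.events if err)
        return errors / len(self.events) <= self.max_error_rate

    def allow_retry(self):
        """False when retries make up more than max_retry_ratio of recent traffic."""
        self._trim()
        if not self.events:
            return True
        retries = sum(1 for _, retry, _ in self.events if retry)
        return retries / len(self.events) <= self.max_retry_ratio
```

In this sketch a closed gate reopens only once old events age out of the window. A production circuit breaker would more likely let a few probe requests through while closed, which is left out here.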
tyingq over 1 year ago
This is one of those things that sort of exposes our industry maturity versus other engineering that's been around longer. You would think by now that the various frameworks for remote calls would have standardized down to include the best practice retry patterns, with standard names, setting ranges, etc. But we mostly still roll our own for most languages/frameworks. And that's full of footguns around DNS caching, when/how to retry on certain failures (unauthorized, for example), and so on.

(Yes, there should also be the non-abstracted direct path for cases where you do want to roll your own).
sesm over 1 year ago
Summary of the article: use exponential backoff + jitter for retry intervals.

What the author didn't mention: sometimes you want to add jitter to delay the first request too, if the request happens immediately after some event from the server (like the server waking up). If you don't do this, you may crash the server, and if your exponential backoff counter is not global you can even put the server into a cyclic restart.
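A rough sketch of that idea, assuming each wait is drawn uniformly between zero and an exponentially growing ceiling (often called full jitter), with the very first request also delayed by a random amount. The function name `retry_delays` and the parameters `base_s` and `first_spread_s` are hypothetical, not taken from the article.

```python
import random


def retry_delays(max_attempts, base_s=1.0, first_spread_s=5.0, rng=random):
    """Return the wait in seconds before each attempt.

    delays[0] is the jittered wait before the very first request;
    delays[n] for n >= 1 is the jittered backoff before retry n.
    """
    # Spread the first request so clients reacting to the same server
    # event do not all arrive at the same instant.
    delays = [rng.uniform(0.0, first_spread_s)]
    for retry in range(1, max_attempts):
        ceiling = base_s * 2 ** (retry - 1)  # 1s, 2s, 4s, ...
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

A caller would sleep for `delays[n]` before attempt `n` and stop at the first success. Passing a seeded `random.Random` as `rng` makes the schedule reproducible in tests.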
self_awareness over 1 year ago
Really nice animations. I especially liked the demonstration of the effect where, once some servers "explode", any server that is restarted is automatically DoS'ed until we throw a bunch of extra temporary servers into the system. Thanks.
fadhilkurnia over 1 year ago
The animations are so cool!!!

In general the phenomenon is known as _metastable failure_, which can be triggered when there is more work to do during a failure than during a normal run.

With retries, the client does more work within the same amount of time, compared to doing nothing or doing exponential backoff.
joshka over 1 year ago
For a lot of things, retry once and only once (at the outermost layer to avoid multiplicative amplification) is more correct. At a large enough scale, failing twice is often significantly (like 90%+) correlated with the likelihood of failing a third time regardless of backoff / jitter. This means that the second retry only serves to add more load to an already failing service.
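To put numbers on the multiplicative amplification, here is a small worked calculation. It assumes the worst case, where every attempt at every layer fails and each layer retries independently; the function names are hypothetical.

```python
def worst_case_attempts(layers, retries_per_layer):
    """Upper bound on calls reaching the bottom service for one user request
    when each of `layers` layers makes 1 + retries_per_layer attempts."""
    return (retries_per_layer + 1) ** layers


def worst_case_attempts_outermost_only(retries):
    """Same bound when only the outermost layer retries."""
    return retries + 1


# Three layers, two retries each: 3 * 3 * 3 calls hit the failing service.
print(worst_case_attempts(3, 2))              # 27
# One retry, at the outermost layer only: 2 calls.
print(worst_case_attempts_outermost_only(1))  # 2
```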
christophberger over 1 year ago
A must-read (or rather: must-see) for anyone who thinks exponential backoff is overrated.
samwho over 1 year ago
Thanks for sharing!

I'm the author of this post, and happy to answer any questions :)
usrbinbash over 1 year ago
This is the client side of things. And I think this is a great resource that everyone who writes clients for anything should see.

But there is an additional piece of info everyone who writes clients needs to see: what people like me, who implement backend services, may do if clients ignore such wisdom.

Because: I'm not gonna let bad clients break my service.

What that means in practice: Clients are given a choice: They can behave, or they can

    HTTP 429 Too Many Requests
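One way a backend might enforce that choice is a per-client token bucket. The sketch below is an assumption for illustration, not the commenter's implementation: `TokenBucket`, `handle`, and their parameters are hypothetical names, and the handler returns a bare `(status, headers)` pair rather than using any real web framework.

```python
import math
import time


class TokenBucket:
    """Allows `burst` requests at once, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill for the time elapsed since the last call, up to the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_token(self):
        return max(0.0, (1.0 - self.tokens) / self.rate_per_s)


def handle(buckets, client_id, rate_per_s=5.0, burst=10, clock=time.monotonic):
    """Return (status, headers): 200 if the client has budget, else 429."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate_per_s, burst, clock)
    bucket = buckets[client_id]
    if bucket.try_acquire():
        return 200, {}
    # Retry-After tells a well-behaved client how long to wait, in whole seconds.
    wait_s = max(1, math.ceil(bucket.seconds_until_token()))
    return 429, {"Retry-After": str(wait_s)}
```

A client that honors `Retry-After` gets served again once its bucket refills, while one that keeps hammering the endpoint only collects more 429 responses.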
whenlambo over 1 year ago
Remember to limit the exponential backoff interval if you are not limiting the number of retries.
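A minimal sketch of such a limit, assuming a 1 second base delay and a 60 second ceiling. Both values and the function name `capped_backoff_s` are hypothetical, and jitter is left out to keep the arithmetic visible.

```python
def capped_backoff_s(attempt, base_s=1.0, cap_s=60.0):
    """Delay before retry number `attempt` (0-based), never exceeding cap_s."""
    # Clamp the exponent as well: with unlimited retries `attempt` grows
    # without bound, and 2 ** attempt would eventually overflow a float.
    exponent = min(attempt, 30)
    return min(cap_s, base_s * 2 ** exponent)


# Delays double until they hit the cap, then stay there.
print([capped_backoff_s(n) for n in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Without the cap, the tenth retry in this scheme would already wait over eight minutes and the twentieth about six days.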
cratermoon over 1 year ago
I worked at a company with a self-inflicted wound related to retries.

At some point in the distant (internet time) past, a sales engineer, or the equivalent, had written a sample script to demonstrate basic uses of the API. As many of you quickly guessed, customers went on a copy/paste rampage and put this sample script into production.

The script went into a tight loop on failure, naively using a simple library that did not include any back-off or retry in the request. I'm not deeply familiar with how the company dealt with this situation. I am aware there was a complex load balancing system across distributed infrastructure, but also, just a lot of horsepower.

Lesson for anyone offering an API product: don't hand out example code with a self-own, because it will become someone's production code.
davidw over 1 year ago
I have been thinking about queueing theory lately. I don't have the math abilities to do anything deep with it, but it seems like even basic applications of certain things could prove valuable in real world situations where people are just kind of winging it with resource allocation.
shoker over 1 year ago
If a picture is worth 1,000 words, then what's a well-made animation worth? These are great intuitive representations of your retry methods. Bravo!
Probiotic6081 over 1 year ago
The pale red failed retries should be more kiki-like. The way they are now, their pointedness is hard to see when they're moving.
Probiotic6081 over 1 year ago
Exponential backoff doesn't apply to successful requests, right? The simulation doesn't reflect that, I think. Peace.