Defeat your 99th percentile with speculative task

94 points by jrpelkonen about 7 years ago

8 comments

colonelxc about 7 years ago

"The percentile values we observed had quite unusual distribution" It is pretty normal to have a long tail in the last few percentiles/tenths. Fortunately, most latency monitoring tools do account for this now. Another gotcha is that 99.9 isn't the end of your tail either. Sometimes looking at the "100th percentile" request isn't useful (it's such an outlier), but you should know it exists.

(Regarding "speculative task") This is also called a "Hedged Request" in an article called "The Tail at Scale" [1].

[1] http://www-inst.eecs.berkeley.edu/~cs252/sp17/papers/TheTailAtScale.pdf
spullara about 7 years ago

I have found that in many distributed systems the latency of top-level requests follows a log-normal distribution, which sounds like what you are describing. By tuning when you launch backup requests, you can trade off overhead for reduced tail latencies. After experimenting with it at Twitter I then found a Jeff Dean paper from Google describing it as well. Haven't been able to find that paper again, though. Here is his presentation where he discusses backup requests:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44875.pdf
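A rough illustration of that tuning (my own sketch, not from the paper or the comment above): derive the backup-request delay from observed latencies, e.g. the p95, so only about 5% of requests ever spawn a backup:

    import Data.List (sort)

    -- Pick the backup-request delay as the observed p95 latency, so
    -- roughly 5% of requests trigger a backup.  Assumes a non-empty
    -- sample of latencies, all in the same time unit.
    hedgeDelay :: [Int] -> Int
    hedgeDelay samples = sorted !! idx
      where
        sorted = sort samples
        idx    = min (length sorted - 1)
                     (floor (0.95 * fromIntegral (length sorted) :: Double))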
KirinDave about 7 years ago

I've seen this approach appear without a name very often in highly concurrent Haskell, Erlang, and Elixir code. Often a "safe" race combinator appears in good libraries that handles graceful termination. So much so that you can often find local libraries that look like this Haskell code:

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (race)

    -- Race the action against a delayed second copy of itself and
    -- return whichever finishes first.  threadDelay takes
    -- microseconds, hence (* 1000) for a millisecond argument.
    speculatively :: Int -> IO a -> IO (Either a a)
    speculatively ms action =
      race action (threadDelay (ms * 1000) >> action)
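A self-contained toy usage of the combinator above (the lookup is simulated with a fixed 50 ms sleep, so the backup fires at 30 ms but the primary still wins the race):

    -- Hedge a ~50 ms lookup after 30 ms; reuses the imports above.
    main :: IO ()
    main = do
      let slowLookup = threadDelay 50000 >> pure "value"
      r <- speculatively 30 slowLookup
      putStrLn (either id id r)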
philsnow about 7 years ago

If your backend is Redis, how is starting more speculative hits to Redis going to help, since it's single-threaded?

I mean, if this is indeed helping, it seems that it's not Redis that's the long pole, but maybe the request routing mesh or something.
maho about 7 years ago

But wouldn't this solution mean that once there is a small glitch in the system, say due to higher load than normal, there are 2x the requests and the system goes down completely?

I don't mean to be negative about the solution, I am just curious.
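A common mitigation (my own sketch, not something proposed in the article or this thread): put a budget on in-flight backups, so under overload speculation shuts itself off instead of doubling traffic.

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (race)
    import Control.Concurrent.STM
    import Control.Exception (finally)

    -- Hedge only while the backup budget is positive; once it is
    -- exhausted (e.g. during a load spike), fall back to a single
    -- attempt, so speculation cannot double total traffic.
    speculativelyBounded :: TVar Int -> Int -> IO a -> IO a
    speculativelyBounded budget ms action = do
      ok <- atomically $ do
        n <- readTVar budget
        if n > 0 then writeTVar budget (n - 1) >> pure True
                 else pure False
      if not ok
        then action
        else fmap (either id id)
                  (race action (threadDelay (ms * 1000) >> action))
               `finally` atomically (modifyTVar' budget (+ 1))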
tshanmu about 7 years ago

Isn't this just masking the root cause of whatever is causing the delay in the first place?
karmakaze about 7 years ago

A week ago I might have thought this would be good for our microservices, as they were already using short timeouts with a retry but abandoning the original request. This is better. However, it has limitations and isn't the whole solution. It suffers from request amplification if the called service also depends on other services and applies the same retry policy. It is still bound by request latency. The main takeaway I've learned for bounding microservice latency is that the inflow of dependent data should be asynchronous, with request latency dependent only on the service's own data.
pspeter3 about 7 years ago
This seems like a useful practice. Do you log how often this happens and which task is the winner? It seems like without that, you may not notice actual issues.
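A sketch of what that instrumentation could look like, built on the same shape as the speculatively combinator upthread (the putStrLn calls stand in for whatever metrics counter you actually use):

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (race)

    -- Like 'speculatively', but logs which attempt won the race, so
    -- the backup's win rate can be tracked over time.
    speculativelyLogged :: Int -> IO a -> IO a
    speculativelyLogged ms action = do
      r <- race action (threadDelay (ms * 1000) >> action)
      case r of
        Left  a -> putStrLn "primary won" >> pure a
        Right a -> putStrLn "backup won"  >> pure a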