"The percentile values we observed had quite unusual distribution"
It is pretty normal to have a long tail in the last few percentiles/tenths, and fortunately most latency monitoring tools do account for this now. Another gotcha is that 99.9 isn't the end of your tail either. Sometimes the "100th percentile" request is such an outlier that looking at it isn't useful, but you should know it exists.

(Regarding "speculative task") This is also called a "hedged request" in an article called "The Tail at Scale" [1].

[1] http://www-inst.eecs.berkeley.edu/~cs252/sp17/papers/TheTailAtScale.pdf
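To make the "know your whole tail" point concrete, here is a minimal nearest-rank percentile sketch in Haskell (the function is illustrative, not from any particular library): you would look at percentile 99 and 99.9 of the same sample alongside its plain maximum.

    import Data.List (sort)

    -- Nearest-rank percentile of a latency sample, for p in (0, 100].
    percentile :: Double -> [Double] -> Double
    percentile p xs = sorted !! idx
      where
        sorted = sort xs
        n      = length xs
        idx    = max 0 (min (n - 1) (ceiling (p / 100 * fromIntegral n) - 1))

    -- percentile 99.9 latencies shows the tail; maximum latencies is the
    -- "100th percentile" outlier you should at least know exists.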
I have found that in many distributed systems the latency of top-level requests follows a log-normal distribution, which sounds like what you are describing. By tuning when you launch backup requests, you can trade off overhead for reduced tail latencies. After experimenting with this at Twitter, I found a Jeff Dean paper from Google describing it as well. I haven't been able to find that paper again, though. Here is his presentation where he discusses backup requests:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44875.pdf
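A rough sketch of that tradeoff, in the same Haskell style as the snippet below (the helper names are hypothetical): the fraction of requests slower than the hedge delay d is the extra load you pay, and a hedged request finishes at roughly min t1 (d + t2) instead of t1.

    -- Fraction of requests that would launch a backup if it is sent after
    -- waiting d; picking d near the observed p95 keeps the extra load at ~5%.
    backupOverhead :: Double -> [Double] -> Double
    backupOverhead d latencies =
      fromIntegral (length (filter (> d) latencies))
        / fromIntegral (length latencies)

    -- A request that would have taken t1 alone finishes at min t1 (d + t2)
    -- once an independent backup (taking t2) is started after delay d.
    hedgedLatency :: Double -> Double -> Double -> Double
    hedgedLatency d t1 t2 = min t1 (d + t2)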
I've seen this approach appear without a name very often in highly concurrent Haskell, Erlang, and Elixir code. Often a "safe" race combinator appears in good libraries that handles graceful termination. So much so that you can often find local libraries that look like this Haskell code (race is from Control.Concurrent.Async and cancels the losing copy; note threadDelay counts microseconds):

    import Control.Concurrent       (threadDelay)
    import Control.Concurrent.Async (race)

    -- Start a second copy of the action after the delay; whichever finishes first wins.
    speculatively :: Int -> IO a -> IO (Either a a)
    speculatively micros action = race action (threadDelay micros >> action)
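A hypothetical call site, assuming some fetchFromBackend :: String -> IO String lookup (the 30 ms hedge delay is made up):

    -- Launch a hedge after 30 ms (30000 microseconds) and take whichever
    -- copy of the lookup answers first.
    fetchHedged :: String -> IO String
    fetchHedged key = either id id <$> speculatively 30000 (fetchFromBackend key)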
If your backend is Redis, how is starting more speculative hits to Redis going to help, since it's single-threaded?

I mean, if this is indeed helping, it seems that it's not Redis that's the long pole, but maybe the request routing mesh or something.
But wouldn't this solution mean that once there is a small glitch in the system, say due to higher load than normal, there are 2x the requests and the system goes down completely?

I don't mean to be negative about the solution, I am just curious.
A week ago I might have thought this would be good for our microservices, as they were already using short timeouts with a retry but abandoning the original request. This is better. However, it has limitations and isn't the whole solution. It suffers from request amplification if the called service also depends on other services and applies the same retry policy, and it is still bound by request latency. The main takeaway I've learned for bounding microservice latency is that the inflow of dependent data should be asynchronous, so a request's latency depends only on the service's own data.
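A back-of-the-envelope sketch of the amplification concern (the numbers are illustrative): if every layer in a call chain hedges or retries a fraction h of its calls, traffic grows by a factor of (1 + h) per layer.

    -- Worst-case traffic multiplier when each of `depth` layers duplicates a
    -- fraction `h` of its outgoing calls (h = 1.0 means every call is retried).
    amplification :: Double -> Int -> Double
    amplification h depth = (1 + h) ^ depth

    -- amplification 1.0 3 == 8.0: blanket duplication three services deep
    -- already sends 8x the original requests downstream.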
This seems like a useful practice. Do you log how often this happens and which task is the winner? It seems like without that, you may not notice actual issues.
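One way to do that, sketched against the speculatively combinator above (putStrLn is just a stand-in for whatever logging or metrics your stack uses):

    import Control.Concurrent       (threadDelay)
    import Control.Concurrent.Async (race)

    -- Like `speculatively`, but records which copy won, so hedge rates and
    -- hedge wins can be watched over time instead of going unnoticed.
    speculativelyLogged :: Int -> IO a -> IO a
    speculativelyLogged micros action = do
      result <- race action (threadDelay micros >> action)
      case result of
        Left  a -> a <$ putStrLn "speculative: primary won"
        Right a -> a <$ putStrLn "speculative: hedge won"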