
What happens to latency if service time is cut in half (2022)

130 points by luu · 11 months ago

9 comments

vitus · 11 months ago

> a utilization of 0.90

> So we went from a latency of 1 second to 0.091 seconds which is an 11 times improvement.

There's your problem -- you should never allow unbounded queue growth at high utilization. Going from 80% to 90% utilization doubles your average wait times. We could similarly make this number arbitrarily large by pushing that utilization closer to 1, e.g. "We halved service time at 99.99% utilization and saw a 10000x improvement". But that's not interesting, as your users will complain that your service is unusable under heavy load far before you get to that point.

The typical "fix" is to add load shedding (e.g. based on queue length) combined with some intelligent backoff logic at the client (to reduce congestion), and call it a day. This has its own downsides, e.g. increased latency for everyone in cases of overload. Or, if your configured queue length is too large, you get bufferbloat.

(I have seen an argument for using LIFO instead of FIFO, which achieves much more predictable median performance at the expense of causing unbounded badness in the worst case. For this, your client needs to set deadlines, which it should probably be doing anyway.)
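Using the article's M/M/1 formula T = S/(1 - ρ), the quoted 11x figure, and the way the ratio can be inflated arbitrarily by raising utilization, takes only a few lines to reproduce (a sketch; `response_time` is a made-up helper, and the numbers are the article's):

```python
def response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue: T = S / (1 - rho)."""
    rho = arrival_rate * service_time   # utilization
    assert rho < 1, "queue is unstable at rho >= 1"
    return service_time / (1 - rho)

lam = 9.0                             # requests/second
t_before = response_time(0.10, lam)   # rho = 0.90 -> 1.0 s
t_after = response_time(0.05, lam)    # rho = 0.45 -> ~0.091 s
print(t_before / t_after)             # ~11x "improvement"

# Halving S at utilization rho improves T by (2 - rho) / (1 - rho),
# which blows up as rho -> 1:
for rho in (0.5, 0.9, 0.99, 0.9999):
    print(rho, (2 - rho) / (1 - rho))  # 3x, 11x, 101x, ~10001x
```

The takeaway matches the comment: the headline speedup is mostly a statement about how overloaded the queue was, not about the service-time fix itself.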
kqr · 11 months ago

Another of those somewhat counter-intuitive results is the answer to the question "how much do we need to scale up to avoid response time regression when the request rate is doubled?"

It is very easy to blurt out "well, obviously we need twice the processing power!" -- but if we scale to twice the processing power and then start accepting twice the request rate, we will actually be serving each request in half the time we originally did.

To many people that sounds weird; it sounds like we got something for nothing. If I invite twice as many people to a party and buy twice as many cookies, it's not like each guest will get twice as many cookies -- that just leaves the originally planned number of cookies for each guest.

But for response time it comes back to the first equation in TFA:

    T = 1/μ · 1/(1 - ρ)

Doubling both the arrival rate and the maximum service rate leaves ρ -- and the second factor with it -- unchanged, but still halves the 1/μ factor, resulting in half the response time.

The appropriate amount to scale by is the k that solves the equation we get when we set the old response time T at request rate λ equal to the one at request rate 2λ and processing power kμ. This is something like

    T = 1/μ · 1/(1 - λ/μ) = 1/kμ · 1/(1 - 2λ/kμ)

but rearranges to the much simpler

    k = ρ + 1

which, a colleague of mine told me, can be interpreted intuitively as "the processing power that needs to be added is exactly that which will handle an additional unit of the current utilisation on top of the current utilisation, i.e. twice as much."

This is mostly good news for people doing capacity planning in advance of events etc. If you run your systems at reasonable utilisation levels normally, you don't actually need that much additional capacity to handle peak loads.
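The k = ρ + 1 result is easy to sanity-check numerically against the same formula (a sketch with made-up numbers, μ = 10 req/s and λ = 6 req/s; `resp` is a hypothetical helper):

```python
def resp(lam, mu):
    """M/M/1 response time: T = 1/mu * 1/(1 - lam/mu)."""
    return (1 / mu) / (1 - lam / mu)

mu, lam = 10.0, 6.0
rho = lam / mu                     # 0.6
k = rho + 1                        # 1.6: capacity factor for doubled traffic

t_now = resp(lam, mu)              # 0.25 s today
t_scaled = resp(2 * lam, k * mu)   # still 0.25 s: no regression
t_doubled = resp(2 * lam, 2 * mu)  # 0.125 s: doubling capacity halves T
print(t_now, t_scaled, t_doubled)
```

So scaling by 1.6x holds response time constant under 2x traffic, while the "obvious" 2x answer overshoots and halves it.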
fwlr · 11 months ago

    In mathematical queueing theory, Little's law (also result, theorem, lemma, or formula[1][2]) is a theorem by John Little which states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system. Expressed algebraically the law is L = λW. The relationship is not influenced by the arrival process distribution, the service distribution, the service order, or practically anything else. In most queuing systems, service time is the bottleneck that creates the queue.

https://en.m.wikipedia.org/wiki/Little%27s_law

An extremely useful law to remember. You'd be surprised how much bullshit it can detect!
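One way to see the law hold regardless of distributions is to simulate a single-server FIFO queue and compare the measured L against λW (a rough sketch, not from the comment; the parameters are arbitrary):

```python
import random

random.seed(0)
lam, mu, n = 4.0, 10.0, 200_000   # arrival rate, service rate, customers

t, free_at, total_sojourn = 0.0, 0.0, 0.0
for _ in range(n):
    t += random.expovariate(lam)            # Poisson arrivals at rate lam
    start = max(t, free_at)                 # wait for the single server
    free_at = start + random.expovariate(mu)
    total_sojourn += free_at - t            # this customer's time in system

W = total_sojourn / n          # average time in system
L = total_sojourn / free_at    # time-averaged number in system (area argument)
print(W, L, lam * W)           # L ~= lam * W, per Little's law
```

With these rates theory predicts W = 1/(μ - λ) ≈ 0.167 s and L = λ/(μ - λ) ≈ 0.67, and the simulated values land close to both.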
drewhk · 11 months ago

I am a bit worried about the overuse of Little's formula, especially in these catchy educational presentations. In reality, queue sizes will dictate your consumer-observable latency profile, which in turn is dictated by the actual distribution of the service time -- it is not a constant.

If you think about it, if you have an ideal system that serves users like clockwork, every X ms with no jitter, while your arrival is also completely regular, every Y ms (Y > X), then basically a queue length of 1 is sufficient. In reality, just like we all observe in real-life queues, service time is far from constant, and outliers result in queue buildup. This is why cutting the tail of service-time latency often results in better overall latency than simply reducing average service time.

Little's formula of course also holds in the above scenario, but it deals in long-run averages and does not give you any indication of what extreme behavior is lurking under the mask of those averages.
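The effect of service-time jitter is easy to demonstrate: simulate two queues with the same mean service time, one constant and one exponentially distributed, and compare average waits (a sketch; `avg_wait` is a made-up helper and the parameters are illustrative):

```python
import random

random.seed(1)

def avg_wait(service_draw, lam=8.0, n=100_000):
    """Mean queueing delay: Poisson arrivals, one server, given service sampler."""
    t, free_at, wait = 0.0, 0.0, 0.0
    for _ in range(n):
        t += random.expovariate(lam)
        start = max(t, free_at)
        wait += start - t
        free_at = start + service_draw()
    return wait / n

s = 0.1  # same mean service time in both runs (utilization rho = 0.8)
w_const = avg_wait(lambda: s)                         # no jitter (M/D/1)
w_expo = avg_wait(lambda: random.expovariate(1 / s))  # jittery (M/M/1)
print(w_const, w_expo)
```

At the same utilization and the same average service time, the jittery server makes customers wait roughly twice as long -- the averages in Little's law hide exactly this.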
jph · 11 months ago

Queueing theory refresher for software developers:

https://github.com/joelparkerhenderson/queueing-theory

This can help put the article in context. The author writes "wanted to freshen up my queuing theory a bit".
davherrmann · 11 months ago

The math seems to be off in a few places:

1. When solving for the arrival rate λ, it should be λ = ρ/S instead of λ = S/ρ.

2. Calculating the arrival rate λ with the correct formula results in 9 requests/second instead of 10 requests/second.

3. Calculating the new utilization ρ should be 0.05 × 9 instead of 0.05 × 10.

The result of the calculated utilization ρ is correct (0.45), and so is the calculated new latency.

edit: new lines
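These corrections check out numerically with the article's own numbers (a small sketch):

```python
S = 0.1      # original service time, seconds
rho = 0.9    # original utilization

lam = rho / S            # corrected: lambda = rho / S = 9 req/s (not S / rho)
new_S = 0.05             # halved service time
new_rho = new_S * lam    # 0.05 * 9 = 0.45, as the comment says
new_T = new_S / (1 - new_rho)
print(lam, new_rho, new_T)   # 9.0, 0.45, ~0.0909 s
```

Note that the end results (ρ = 0.45, T ≈ 0.091 s) survive the error because the two mistakes happen to cancel.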
nailer · 11 months ago

What is service time?

Edit: service time is processing time. See 6:35 in https://youtu.be/03GsLxVdVzU
alhaad · 11 months ago
Why is arrival rate a function of service time?
YetAnotherNick · 11 months ago

> a utilization of 0.90

Industry-standard utilization is 50-60%, not 90%. At 90% utilization you are destined for failure, as 10% extra requests for even a few seconds would kill your service. And the target should stay at 60% whether service time increases or decreases, so halving the service time should ideally halve the resources.
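The same T = S/(1 - ρ) formula shows why that headroom matters: a 10% traffic spike is mild at 60% utilization but catastrophic at 90% (a sketch; the numbers are illustrative, not from the comment):

```python
def resp(S, rho):
    """M/M/1 response time at service time S and utilization rho."""
    return S / (1 - rho)

S = 0.05  # service time, seconds
for rho in (0.6, 0.9):
    spike = rho * 1.10                      # 10% extra traffic
    print(rho, resp(S, rho), resp(S, spike))
```

At ρ = 0.6 the spike nudges latency from 0.125 s to about 0.147 s; at ρ = 0.9 it takes ρ to 0.99 and latency from 0.5 s to 5 s, a 10x blowup on the way to instability.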