I like the author's solution (always use bounded queues) because it usually forces you to confront back-pressure up front. It doesn't matter how big your queue is: your system must be designed to work at peak throughput essentially without a queue, and thus it must handle the possibility that an event fails to be processed and must be retried or dropped. Queues <i>only</i> serve to mitigate bursts.<p>It's annoying but also understandable how often people just dump stuff into an unbounded queue and punt on making sure things work until the system is falling down. Often queues are more a tool developers and operators use to navigate unknown performance characteristics and defer scaling decisions than a requirement of the system itself.
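A minimal sketch of that pattern (the bound and names are illustrative, not from the article): the producer gets an immediate accept/reject decision instead of an ever-growing buffer, so the retry/drop policy has to exist from day one.<p><pre><code>
import queue

# Illustrative bound; the point is that *some* bound exists.
tasks = queue.Queue(maxsize=100)

def submit(task):
    try:
        tasks.put_nowait(task)   # never block the producer
        return "accepted"
    except queue.Full:
        # Back-pressure surfaces here: retry later, shed load, or push
        # the error back to the caller -- but the policy is explicit.
        return "rejected"
</code></pre>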
I'm no expert, but I don't think this is true. Couldn't tasks arrive into the queue at the same rate that they are processed, resulting in a fixed queue size at 100% utilization?<p>Put another way, in the second "No good" diagram showing one task being worked on with none in the queue, another task could arrive before the current one is finished.<p>I suspect the counterargument might be that the task arrival rate is not usually that predictable, but even so, I expect that the amount of variance predicts how true this theory is in practice.
Here's another article about the same issue, I think: <a href="https://ferd.ca/queues-don-t-fix-overload.html" rel="nofollow">https://ferd.ca/queues-don-t-fix-overload.html</a>.<p>Solution: "Step 1. Identify the bottleneck. Step 2: ask the bottleneck for permission to pile more data in"
A lot of frameworks have queues bounded only by the available memory. As the article nicely demonstrates, that implies the CPU must be idle at least 20% of the time for latency to stay reasonably bounded. With various GUI event queues that is OK, but it puzzled me why, for example, Erlang opted to have unbounded message queues. Did the Erlang designers not care about latency?
> but once you understand it, you’ll have deeper insight into the behavior not just of CPUs and database thread pools, but also grocery store checkout lines, ticket queues, highways – really just a mind-blowing collection of systems.<p>...and karaoke singer rotations. In 6 years of frequently singing karaoke at bars, I've never known a host to turn away new singers, unless it's almost time to end the show. So you could say the queue is unbounded, with predictable results for the few singers who show up early and then get frustrated with the ever-increasing time until their next turn as more singers arrive. I don't know what the best solution is.
If some logic re-orders a job queue to group similar tasks together, improving cache hit ratios, then a fairly large queue can be quite effective (especially if there are no priorities or barriers; the cost of that higher throughput, however, is a higher maximum latency).
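One way to sketch that in Python (purely illustrative; the "key" field and the process() stub are assumptions): drain a batch and run same-key jobs back to back so the data loaded for the first one stays cached for the rest.<p><pre><code>
from collections import defaultdict

def process(job):
    pass  # stand-in for the real work

def drain_grouped(pending_jobs):
    groups = defaultdict(list)
    for job in pending_jobs:
        groups[job["key"]].append(job)  # "key" is an assumed grouping field
    for jobs in groups.values():
        # The first job in a group warms the cache; the rest hit it.
        # The price is that an early-arriving job may wait for its group.
        for job in jobs:
            process(job)
</code></pre>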
Queuing theory in more detail: <a href="https://github.com/joelparkerhenderson/queueing-theory" rel="nofollow">https://github.com/joelparkerhenderson/queueing-theory</a><p>The article by Dan is referenced there too.
> If the processor is always busy, that means there’s never even an instant when a newly arriving task can be assigned immediately to the processor. That means that, whenever a task finishes processing, there must always be a task waiting in the queue; otherwise the processor would have an idle moment. And by induction, we can show that a system that’s forever at 100% utilization will exceed any finite queue size you care to pick:<p>This argument is incorrect because it proves too much. If tasks arrive predictably, say at the rate of one per minute, and the processor completes tasks at the same rate, you can have maximum throughput and a finite queue -- indeed, the queue never has to exceed two waiting tasks. The unbounded queue growth problem only arises when tasks arrive unpredictably. Since this argument proves the same conclusion without regard to predictability, it's incorrect.<p>Here's how I think it actually works: if tasks randomly come in and out at the same rate, the size of the queue over time can be modeled as an unbiased random walk with a floor at zero. Elementary techniques prove that such a random walk will hit zero in finite time, and will therefore hit zero infinitely many times. These correspond to the idle times of the processor. The only way to avoid hitting zero over and over is if tasks come in faster than they come out, which leads to tasks piling up over time.
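A quick simulation of that random-walk model (the per-tick arrival and service probabilities are made up for illustration):<p><pre><code>
import random

def simulate(p, q, ticks=100_000, seed=0):
    """Queue depth as a random walk with a floor at zero.
    p = chance a task arrives this tick, q = chance one finishes."""
    rng = random.Random(seed)
    depth, idle_ticks, max_depth = 0, 0, 0
    for _ in range(ticks):
        if rng.random() < p:
            depth += 1                     # arrival
        if depth > 0 and rng.random() < q:
            depth -= 1                     # completion
        if depth == 0:
            idle_ticks += 1                # processor has nothing to do
        max_depth = max(max_depth, depth)
    return idle_ticks, max_depth

print(simulate(0.5, 0.5))    # equal rates: still idles, depth wanders widely
print(simulate(0.55, 0.5))   # arrivals faster: depth climbs steadily
</code></pre><p>With p equal to q the processor still goes idle over and over, and the depth wanders roughly like the square root of the number of ticks; only when arrivals outpace service does it grow without bound.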
Interesting article, but what bothers me a bit is<p>> I won’t go into the M/M/1 part, as it doesn’t really end up affecting the Important Thing I’m telling you about. What does matter, however, is that ‘∞’.<p>I'm really no expert, but if I understand it correctly the `M/M` part defines the distributions of the arrival and service processes, so this definitely is important, as others have already mentioned in the comments. E.g. a D/D/1 queue, where D stands for deterministic arrivals, shouldn't suffer from this problem.<p>This doesn't change the interesting fact the article presents, but throwing in technical terms without proper explanation is imo bad style. Either don't mention it (which would be better in this case) or explain it and why it is relevant.<p>This is also a common "mistake" inexperienced people make when writing papers. They mention a fact that is somehow related but is completely irrelevant to the topic and the rest of the paper. I don't want to assume anything, but to me this often smells like showing off.
This is unworkable if you are actively scaling your system. Am I supposed to recalculate the ideal queue size with each scale-out of my data platform?<p>Instead, the right way to think about limiting queue size is load shedding when you feel back pressure.<p>Here’s an example at Cronitor: if our main sqs ingestion queue backs up, our pipeline will automatically move from streaming to micro-batching, drastically reducing the number of messages on the queue at the expense of slightly increased latency per message. At the same time, a less critical piece of our infrastructure pauses itself until the queue is healthy again, shedding one of the largest sources of db load and giving that capacity to ingestion.<p>To me the goal is to feel and respond to back pressure before blowing up and rejecting messages.
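A rough sketch of that kind of mode switch (not Cronitor's actual code; the thresholds and names are invented): watch queue depth and degrade with some hysteresis so the pipeline doesn't flap between modes.<p><pre><code>
STREAMING, MICRO_BATCH = "streaming", "micro-batch"

def choose_mode(queue_depth, current, high_water=10_000, low_water=2_000):
    if queue_depth > high_water:
        return MICRO_BATCH   # trade per-message latency for fewer, larger reads
    if queue_depth < low_water:
        return STREAMING     # healthy again: go back to low-latency mode
    return current           # in between: keep doing whatever we were doing
</code></pre>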
And what if items are processed at exactly the same rate they are added? Something is missing from this thesis. There must be an additional assumption at play beyond what is stated in the article.
Perfect time to ask, I've been toying with using the netflix concurrency limits library for a while as opposed to the manual tuning of threadpools, queue depths, etc to achieve good utilization at certain latency. Curious if others have experience with it, and their thoughts:
<a href="https://github.com/Netflix/concurrency-limits" rel="nofollow">https://github.com/Netflix/concurrency-limits</a><p>FWIW envoy also has an adaptive concurrency experimental plugin that seems similar that I'd also love to hear about any real world experience with:
<a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/adaptive_concurrency_filter" rel="nofollow">https://www.envoyproxy.io/docs/envoy/latest/configuration/ht...</a>
What the naysayers are missing is the definition of load; read it again: <i>"[C]apacity is the number of tasks per unit time that can be processed."</i><p>But tasks are variable. One task requires 5 minutes of processing, another requires 35. They arrive randomly. This is also explicitly given, in a caption under one of the diagrams: <i>"A queueing system with a single queue and a single processor. Tasks (yellow circles) arrive at random intervals and take different amounts of time to process."</i><p>People calling the article wrong may be thinking of capacity as "amount of work per unit time"; i.e. when the processor is at full capacity, it's doing one second's worth of work every second.<p>If we define capacity this way, the problem goes away: the processor just becomes a leaky bucket. That is to say, if we know exactly how long each task will take, then we can measure the queue size in terms of total seconds of work in the queue. Then, as long as no more than one second's worth of work is added to the queue per second, it will not grow without bound, just like a leaky bucket that isn't refilled faster than its leak rate.<p>When capacity is given as a maximum <i>number</i> of tasks per second, there has to be some underlying justification for that, like there is some fixed part to servicing a job, such as setup time and cleanup time, that doesn't go away even if the job takes next to zero seconds, such that jobs effectively have a built-in minimum duration. If it takes one second to set up a job and one second to clean up, then the maximum capacity is half a job per second, or 1800 jobs per hour, and so on. Of course the queue starts to backlog when you approach capacity, because the jobs also require nonzero real work on top of the administrative time.<p>If jobs have no minimum fixed cost attached, then the maximum job rate is unbounded: the shorter the jobs being queued, the more of them can be done per unit time; one million one-microsecond jobs can be done in a second, or a billion one-nanosecond jobs, and so on.
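A small sketch of that leaky-bucket framing, assuming per-job durations are known or at least estimable (the class name and the bound are illustrative):<p><pre><code>
class WorkBudgetQueue:
    """Bound the queue by seconds of queued work, not by number of tasks."""

    def __init__(self, max_queued_seconds):
        self.max_queued_seconds = max_queued_seconds
        self.queued_seconds = 0.0
        self.jobs = []

    def offer(self, job, estimated_seconds):
        if self.queued_seconds + estimated_seconds > self.max_queued_seconds:
            return False   # more than one bucket's worth of work: reject
        self.jobs.append((job, estimated_seconds))
        self.queued_seconds += estimated_seconds
        return True

    def take(self):
        job, seconds = self.jobs.pop(0)
        self.queued_seconds -= seconds
        return job
</code></pre>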
If you are interested in modelling microservices, service busses, and pubsub topologies, you can do some rough models of queues in python with this: queueing-tool.readthedocs.io/
What are the underlying assumptions we can break here?<p>For example, what if tasks were not monolithic? As the queue size grows, increase the caching timeout and don't hit that DB, or decrease timeout, or something like that.<p>Or, what if the input task variance was bounded and we just initialized the queue with 10 tasks? This way, the addition of tasks would never dip below 1 and would never exceed 20 (for example).
For an alternative take on this, read up on the LMAX Disruptor. Of particular note is section 2.5 "The problem of queues."<p><a href="https://lmax-exchange.github.io/disruptor/disruptor.html" rel="nofollow">https://lmax-exchange.github.io/disruptor/disruptor.html</a><p>It has a completely opposite pattern: the more you load it the faster it goes.
This is a good article.<p>Ted Dziuba's monitoring theory covers unbounded queues pretty well: <a href="http://widgetsandshit.com/teddziuba/2011/03/monitoring-theory.html" rel="nofollow">http://widgetsandshit.com/teddziuba/2011/03/monitoring-theor...</a>
Adding just a little spare capacity makes the need for queue capacity collapse.<p>But if rates are not well specified, neither is what "a little" means, and rejecting additions to the queue is the only answer.
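For a sense of scale, the standard M/M/1 occupancy result makes the "little capacity" point concrete: the average number of tasks in the system is rho / (1 - rho), where rho is utilization.<p><pre><code>
# Average tasks in an M/M/1 system at a given utilization.
for rho in (0.99, 0.95, 0.90, 0.80, 0.50):
    print(f"{rho:.0%} utilized: ~{rho / (1 - rho):.0f} tasks in system on average")
# 99% -> ~99, 95% -> ~19, 90% -> ~9, 80% -> ~4, 50% -> ~1
</code></pre><p>Going from 99% to 90% utilization cuts the average backlog by roughly an order of magnitude, which is exactly the collapse described above.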
This explains why every JIRA backlog I've seen always grows until it becomes unwieldy.<p>I wonder if anyone here has experience with a fixed-size backlog for work tickets.
The only thing to understand - and accept - is that you will always pick the wrong queue. So will everyone else. Always.<p>.<p>I appreciate that this truism has nothing to do with the article.
I just finished implementing a simple backpressure algorithm. Once the queue is more than a certain percentage full, it rejects new items. These rejections get turned into HTTP 503s.
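Roughly that shape in Python (a sketch, not the parent's actual implementation; the bound, threshold, and status codes are illustrative):<p><pre><code>
import queue

MAX_SIZE = 1_000
REJECT_ABOVE = int(0.8 * MAX_SIZE)   # start shedding at 80% full
work = queue.Queue(maxsize=MAX_SIZE)

def enqueue(item):
    if work.qsize() >= REJECT_ABOVE:
        return 503   # Service Unavailable: tell callers to back off and retry
    work.put_nowait(item)
    return 202       # Accepted
</code></pre>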
Messaging middleware such as queues is mostly redundant. Simplistically -<p>Client --<p>If you cache a new submission (message) locally before submitting, you just keep re-submitting until the server returns an ACK for that submission and your local cache is empty.<p>Scale clients out as needed.<p>Server --<p>If a message is received in good order, return an ACK.<p>If a message is a duplicate, discard the message and return an ACK.<p>If a message can only be received once, discard it if it already exists on the server, and return an ACK.<p>If a message is invalid, return a FAIL.<p>Scale hardware out or up, as needed (ref. "capacity", item 2 in the linked blog post above). Scale queues out by adding service endpoints on the server. Async/await makes the client experience painless. You save your employer $$ because there are no additional procurement contracts, no licensing fees, no additional server-side infrastructure to run queues, and no consultancy fees to set up and operate RabbitMQ/Tibco/IBM MQ/Amazon SQS/Azure Queue Storage or whatever other queueing solution the org uses.<p>Message passing includes concepts such as durability, security policies, message filtering, delivery policies, routing policies, batching, and so on. The above can support all of that and, if your scenario calls for it, none of it.<p>The financial argument reduces to dev time vs. procurement, deployment and operational costs of whatever 3rd party product is used, as well as integration, of course.<p>* I note and welcome the downvotes. However I'd love it more if you'd present a coherent counter argument with your vote.
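A hedged sketch of the client/server scheme described above: an at-least-once retry loop on the client plus a server that deduplicates by message id. The transport call and the validation/storage stubs are placeholders, not any real API.<p><pre><code>
import time

# --- client side ------------------------------------------------------------
def submit_with_retry(message_id, payload, send_to_server, max_attempts=10):
    """Keep the message cached locally and re-send until the server ACKs."""
    delay = 0.1
    for _ in range(max_attempts):
        if send_to_server(message_id, payload) == "ACK":
            return True          # safe to clear the local cache now
        time.sleep(delay)        # FAIL or timeout: back off and try again
        delay = min(delay * 2, 5.0)
    return False                 # still cached locally; retry later

# --- server side ------------------------------------------------------------
seen_ids = set()

def valid(payload):
    return payload is not None   # stand-in validation

def store(payload):
    pass                         # stand-in for real processing

def handle(message_id, payload):
    if message_id in seen_ids:
        return "ACK"             # duplicate: discard, but still ACK it
    if not valid(payload):
        return "FAIL"
    store(payload)
    seen_ids.add(message_id)
    return "ACK"
</code></pre>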