Background job queues and priorities may be the wrong path

114 points by a12b · over 1 year ago

19 comments

AlotOfReading · over 1 year ago

The article is a bit unclear because it's lacking the proper vocabulary. Priorities and deadlines (what the article calls "SLOs") are both valid ways to approach scheduling problems with different tradeoffs.

The fixed priority systems the article talks about trade off optimal "capacity" utilization for understandable failure dynamics in the overcapacity case. When you're over capacity, the messages that don't go through are the messages with the lowest priority. It's a nice property and simple to implement.

What the article proposes is better known as deadline scheduling. That's also fine and widely used, but it has more complicated failure dynamics in the overcapacity case. If your problem domain doesn't have an inherent "priority" linked to the deadlines, that may be acceptable, but in other cases it may not be.

Neither is inherently better, and there are other approaches with yet different tradeoffs.
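A minimal sketch of the two policies being contrasted, with made-up job fields and numbers (nothing here comes from the article):

```python
# Two ways to pick the next job from a backlog. Fields are illustrative.
jobs = [
    {"name": "email",   "priority": 2, "deadline": 300},
    {"name": "billing", "priority": 0, "deadline": 600},
    {"name": "search",  "priority": 1, "deadline": 60},
]

def next_by_fixed_priority(backlog):
    # Lowest priority number wins; under overload, low-priority work simply waits.
    return min(backlog, key=lambda j: j["priority"])

def next_by_deadline(backlog):
    # Earliest-deadline-first: the job closest to missing its deadline runs next.
    return min(backlog, key=lambda j: j["deadline"])

print(next_by_fixed_priority(jobs)["name"])  # billing
print(next_by_deadline(jobs)["name"])        # search
```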
andrelaszlo · over 1 year ago

Another important aspect that seems to be overlooked in a lot of these discussions (specifically for Rails, but probably relevant in other contexts as well) is that if your jobs are very heterogeneous in terms of their degree of parallelizability, then it will be very difficult to do effective provisioning and capacity planning.

Let's say you have two types of jobs: one that is highly parallelizable (e.g. a request to a third-party API), and one that doesn't parallelize well (video transcoding using 100% of available CPU). Then you'd want a lot of threads assigned to each CPU for the first type, but very few for the second type. If they're on the same queue, you can have a few threads and be blocked by job type one most of the time, or have a lot of threads and get tons of context switching when the second type of job dominates. Either way your throughput will be very bad.

My current idea is to partition job types first by their degree of parallelizability (as in Amdahl's law), then by priority if necessary.

Has anyone tried this?
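One way to read that partitioning idea, as a sketch (pool names and sizes are assumptions, not a tested recipe):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# I/O-bound jobs (e.g. third-party API calls) go to a wide thread pool;
# CPU-bound jobs (e.g. transcoding) go to a narrow process pool so they
# don't thrash each other with context switches.
io_pool = ThreadPoolExecutor(max_workers=64)    # highly parallelizable work
cpu_pool = ProcessPoolExecutor(max_workers=4)   # roughly one worker per core

def enqueue(job_fn, parallelizable, *args):
    pool = io_pool if parallelizable else cpu_pool
    return pool.submit(job_fn, *args)
```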
jph · over 1 year ago

There are better ways to manage and measure queues, IMHO, by using basic queuing theory such as 'ρ' for utilization, 'ε' for error rate, 'Aθ' for activity step time, and Little's Law.

My queuing theory notes and notation are here:

https://github.com/joelparkerhenderson/queueing-theory
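For context, a tiny worked example of two of those quantities (the numbers are invented):

```python
# Utilization rho = lambda / (c * mu), and Little's Law L = lambda * W:
# jobs in the system = arrival rate x average time a job spends in the system.
arrival_rate = 50.0      # lambda, jobs/second
service_rate = 60.0      # mu, jobs/second per worker
workers = 1              # c

utilization = arrival_rate / (workers * service_rate)   # rho ~= 0.83
avg_time_in_system = 2.5                                 # W, seconds (measured)
jobs_in_system = arrival_rate * avg_time_in_system       # L = 125 by Little's Law

print(f"rho = {utilization:.2f}, L = {jobs_in_system:.0f} jobs")
```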
timeagain · over 1 year ago

In a similar vein, something about queuing that has annoyed me as a developer for multiple large FANG corporations is poor thinking about queue metrics. The TLDR is that metrics provided by the queue itself are rarely helpful for knowing if your service is healthy, and when it is not healthy they are not very useful for determining why.

Most queue processing services that I have seen have an alarm on (a) oldest message age, and (b) number of messages in the queue.

In every team I joined I have quickly added a custom metric (c) that subtracts the time a message was *initially* added to the queue from the time of successful processing. This metric tends to uncover lots of nasty edge cases regarding retries, priority starving, and P99 behavior that are hidden by (a) and (b).

Having 100,000 messages in the queue is only an issue if they are not being processed at (at least) 100,000/s. Having a 6-hour-old message in the queue is concerning, but maybe it is an extreme outlier, so alarming is unnecessary. But you can bet your bottom dollar that if your average processing latency spikes by 10x, you want to know about it.

The other thing that is nice about an end-to-end latency metric is that (a) and (b) both tend to look great all the way up to the point of failure/back pressure and then they blow up excitingly. (c), on the other hand, will pick up on things like a slight increase in application latency, allowing you to diagnose beforehand whether your previously over-provisioned queue is becoming at-capacity or under-provisioned.
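A minimal sketch of metric (c); `emit_metric` stands in for whatever metrics client is in use, and the field name is made up:

```python
import time

def emit_metric(name, value):
    print(f"{name}={value:.3f}s")   # placeholder for StatsD/Prometheus/CloudWatch, etc.

def handle(message):
    process(message)
    # "first_enqueued_at" must be stamped once, at the original enqueue,
    # and carried through every retry for the metric to mean anything.
    emit_metric("queue.end_to_end_latency", time.time() - message["first_enqueued_at"])

def process(message):
    ...  # application logic
```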
paulsutter · over 1 year ago

If you want to write any complicated concurrent code, the simplest and best way is one long polling loop state machine. You might use a thread to call a blocking API, but the majority of the logic should be in a polling loop.

I used to love chained callbacks when I was 16, and later I thought threads were the greatest, and I've written a bunch of device drivers that operate at different IPLs.

But 20 years ago a cofounder made me realize that a long polling loop is easier and faster to write, and much easier to understand than threads. That insight has made countless projects simpler and easier, and I recommend considering it. You may be surprised, as I was.
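A bare-bones sketch of the shape being described (the `readable`/`read_request` helpers are hypothetical, not a real API):

```python
import time

state = {"connections": [], "pending_jobs": []}

def poll_once(state):
    # All coordination lives in one loop: check every source of work,
    # advance whatever is ready, never block on any single thing.
    for conn in state["connections"]:
        if conn.readable():                       # hypothetical non-blocking check
            state["pending_jobs"].append(conn.read_request())
    if state["pending_jobs"]:
        state["pending_jobs"].pop(0).run()

while True:
    poll_once(state)
    time.sleep(0.01)   # or select/epoll with a timeout instead of sleeping
```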
corytheboyd · over 1 year ago

I have yet to try it IRL, but I've always wondered if having a single hyper-scaled worker pool against a single queue would be best suited for a typical web app's Do Things Async layer. Add a job timeout so that a single rogue process can't saturate workers… okay, and now add a long_jobs queue for the things that need to run longer… and one more for these other things that need to run longer than that… and… shit.
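The job-timeout piece could look something like this sketch (run each job in a subprocess and kill it when it overruns; the 60-second budget is arbitrary):

```python
import multiprocessing as mp

def run_with_timeout(job_fn, args=(), timeout_s=60):
    p = mp.Process(target=job_fn, args=args)
    p.start()
    p.join(timeout_s)
    if p.is_alive():
        p.terminate()      # hard kill, so the job must be safe to retry
        p.join()
        return False       # timed out
    return p.exitcode == 0
```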
sargun · over 1 year ago

Deadline-based scheduling actually lets you do super clever things like "time-shifting" -- you have 10 jobs that each take 10 seconds, and the deadline is 300 seconds out -- you can fit them in between other jobs. In addition, if you know the requirements of the job in terms of computational needs, you can determine really efficient colocation patterns.

IMHO, the problem is that it's really hard for people to think in terms of "I want my job that takes 10 CPU seconds to be done in 300 wall clock seconds". What batch processing frameworks do, in turn, is estimate these things and figure out where to place work. You can also do stuff like deny requests if there isn't capacity (because you know all the scheduled work for the next quanta).
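A rough sketch of the "deny if it doesn't fit" idea, with a deliberately simplified capacity check (all fields and rates are made up):

```python
def can_admit(new_job, accepted_jobs, cpu_capacity_per_sec, now):
    # Accept only if the CPU-seconds already committed before this job's
    # deadline, plus the new job, fit into the capacity available by then.
    budget = (new_job["deadline"] - now) * cpu_capacity_per_sec
    committed = sum(j["cpu_seconds"] for j in accepted_jobs
                    if j["deadline"] <= new_job["deadline"])
    return committed + new_job["cpu_seconds"] <= budget

accepted = [{"cpu_seconds": 10, "deadline": 300} for _ in range(10)]
print(can_admit({"cpu_seconds": 10, "deadline": 300}, accepted,
                cpu_capacity_per_sec=1.0, now=0))   # True: 110s of work fits in 300s
```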
paulgb · over 1 year ago

If the author is here: your CSS is breaking the code example. If you remove "white-space: wrap" from the rule `code[class*="language-"], pre[class*="language-"]`, it seems to do the trick (though I'm not sure if it breaks things on other pages).
hamandcheese · over 1 year ago

I largely agree, however I do believe that it is necessary to reserve capacity for lower-latency jobs if the variance in job durations is large.

For example, suppose you have a burst of 1-hour-latency jobs, each of which processes in 10 minutes. It will not take many of these to consume all available workers.

If that burst is followed by a single high-priority, 10s-latency job, welp, that job's latency objective will not be met, since the soonest a worker will free up to take this work is 10 minutes.

So I think the ideal worker pool design does include some amount of reserved capacity for low-latency work.

A general-purpose worker can of course grab low-latency work if it's idle! But the reverse is not true - an idle low-latency worker should not be picking up any long-running job.
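That asymmetry is easy to express as a sketch (queue names are illustrative):

```python
import queue

low_latency_q = queue.Queue()
long_running_q = queue.Queue()

def low_latency_worker():
    while True:
        job = low_latency_q.get()        # reserved: never touches long-running work
        job()

def general_worker():
    while True:
        try:
            job = low_latency_q.get_nowait()   # prefer urgent work when available
        except queue.Empty:
            job = long_running_q.get()
        job()
```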
Groxx · over 1 year ago

> *# If workers are quiet, job1 will be run first in 10 minutes*

> *# If workers are busy, job2 will be run first in 11 minutes*

> *# If workers are too busy, both jobs will exceed their max latency.*

So... priorities for tasks in a background queue.

I agree explicit latency tolerance is often a great way to do this - it lets you know what you can relax and reschedule, and if it's far enough in the future you can predict load / scale preemptively. Plus it works without having to deal with defining who gets what priority integer (or when to subdivide floats). But it degrades to the same behavior as priorities.
a12b · over 1 year ago

Author here.

Thanks to all commenters for sharing their experiences and constructive opinions. It shows that this post is incomplete and far from being perfect. So, I just wrote a post-scriptum to improve it a bit for future readers.

https://alexis.bernard.io/blog/2023-10-15-background-job-queues-and-priorities-may-be-the-wrong-path.html#post-scriptum
adrr · over 1 year ago

But what if I run background jobs to protect resources that can't easily scale wide, like DB writes or calls to SaaS services that are API-throttled?
ralferoo · over 1 year ago

Just quickly skimmed this, but it seems the conclusion is wrong:

*A job needs two attributes to define when it should be started: run_at and max_latency. That means the job worker only needs to order them by run_at + max_latency, and takes the first. It seems both flexible and simple.*

Just considering two jobs, (run_at=10, max_latency=15) and (run_at=11, max_latency=13), it's clear that following that approach, the first task would be unnecessarily blocked by the second, or you'd run jobs earlier than run_at specified.
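Working the counterexample through as a sketch (the eligibility filter at the end is one possible fix, not something the article proposes):

```python
jobs = [
    {"name": "A", "run_at": 10, "max_latency": 15},   # deadline 25
    {"name": "B", "run_at": 11, "max_latency": 13},   # deadline 24
]

# Ordering purely by run_at + max_latency puts B first...
print([j["name"] for j in sorted(jobs, key=lambda j: j["run_at"] + j["max_latency"])])  # ['B', 'A']

# ...but at t=10 B is not runnable yet, so a worker either waits (blocking A)
# or starts B before its run_at. Filtering to eligible jobs first avoids that:
now = 10
eligible = [j for j in jobs if j["run_at"] <= now]
print(min(eligible, key=lambda j: j["run_at"] + j["max_latency"])["name"])  # 'A'
```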
markhahn · over 1 year ago

Well duh: you use prioritised queues precisely when there is a capacity limit. In lots of cases, the facility is specifically mandated to achieve as close to full capacity as possible.

Which is not to agree with the claim that latency queues and priorities can't achieve latency goals. Your hard requirements establish a minimum viable capacity, and you fill in the bubbles with softer work. Priorities let you distinguish between hard and soft, and offer fairness among soft.
dudeinjapan · over 1 year ago

What is missing from this picture is idleness. For example, suppose I have an SLO 10 sec job A and an SLO 5 min job B. If I only get a few Bs sporadically, I may want to define queue X=A only, and queue Y=A,B to use the idle compute to process more As. In the wild, this is a delicate balancing act.
itake · over 1 year ago

This does not work if your upstream server can only handle X concurrency per second (think ML GPU) and you need to time out the job before processing it.
jounker · over 1 year ago
Article telling you to avoid queues and priorities advises you to implement a priority queue.
ndriscoll · over 1 year ago

It's useful to have queues based on job type because it allows you to stream messages onto the queue as they come in, and then batch-pull the work off for processing (many processes are more efficient--possibly vastly so--when run as a batch).
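A small sketch of the stream-in / batch-out pattern (queue and batch size are placeholders):

```python
import queue

def pull_batch(q, max_batch=100):
    batch = [q.get()]                     # block until at least one message arrives
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())  # drain whatever else is already queued
        except queue.Empty:
            break
    return batch

def worker(q, process_batch):
    while True:
        process_batch(pull_batch(q))      # e.g. one bulk DB write instead of N single writes
```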
hbrundage · over 1 year ago

It's interesting to see how the Rails world still thinks in terms of the number of processes listening to a queue, instead of thinking in cloud-native, elastic, serverless terms.

There's always an autoscaling delay, but Rails itself (and the community) doesn't seem to fit into the serverless paradigm well, which is why these questions around how to design your queues come up.

I think a lot of Lambda developers or Cloud Run developers would instead say "well, my max instances is set to 500, I am pretty sure I'm going to break something else before I hit that", you know? Especially when using the cloud's nice integrations between their queues and their event-driven serverless products, it's super easy to get exactly as much compute as you need to keep your latency really low.