SQL Maxis: Why We Ditched RabbitMQ and Replaced It with a Postgres Queue

628 points by ctc24 about 2 years ago

68 comments

mark242 about 2 years ago
In summary -- their RabbitMQ consumer library and config is broken in that their consumers are fetching additional messages when they shouldn't. I've never seen this in years of dealing with RabbitMQ. This caused a cascading failure in that consumers were unable to grab messages, rightfully, when only one of the messages was manually ack'ed. Fixing this one fetch issue with their consumer would have fixed the entire problem. Switching to pg probably caused them to rewrite their message fetching code, which probably fixed the underlying issue.

It ultimately doesn't matter because of the low volume they're dealing with, but gang, "just slap a queue on it" gets you the same results as "just slap a cache on it" if you don't understand the tool you're working with. If they knew that some jobs would take hours and some jobs would take seconds, why would you not immediately spin up four queues? Two for the short jobs (one acting as a DLQ), and two for the long jobs (again, one acting as a DLQ). Your DLQ queues have a low TTL, and on expiration those messages get placed back onto the tail of the original queues. Any failure by your consumer, and that message gets dropped onto the DLQ, and your overall throughput is determined by the number * velocity of your consumers, not by your queue architecture.

This pg queue will last a very long time for them. Great! They're willing to give up the easy fanout architecture for simplicity, which again, at their volume, sure, that's a valid trade. At higher volumes, they should go back to the drawing board.
nemothekid about 2 years ago
> To make all of this run smoothly, we enqueue and dequeue thousands of jobs every day.

If your needs aren't that expensive, and you don't anticipate growing a ton, then it's probably a smart technical decision to minimize your operational stack. Assuming 10k jobs a day, that's roughly 7 jobs per minute. Even the most unoptimized database should be able to handle this.
simonw about 2 years ago
The best thing about using PostgreSQL for a queue is that you can benefit from transactions: only queue a job if the related data is 100% guaranteed to have been written to the database, in such a way that it's not possible for the queue entry not to be written.

Brandur wrote a great piece about a related pattern here: https://brandur.org/job-drain

He recommends using a transactional "staging" queue in your database which is then written out to your actual queue by a separate process.
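For illustration, a minimal sketch of that transactional guarantee (the invoices/jobs tables and columns here are hypothetical, not from the article): the job row commits atomically with the business row, so the queue entry can never reference unwritten data.

    BEGIN;

    -- Business write and its follow-up job commit together.
    INSERT INTO invoices (id, customer_id, total)
    VALUES (42, 7, 99.00);

    INSERT INTO jobs (kind, payload, state)
    VALUES ('send_invoice_email', '{"invoice_id": 42}', 'pending');

    COMMIT;  -- either both rows exist or neither does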
sa46 about 2 years ago
Here are a couple of tips if you want to use postgres queues:

- You probably want FOR NO KEY UPDATE instead of FOR UPDATE so you don't block inserts into tables that have a foreign key relationship with the job table. [1]

- If you need to process messages in order, you don't want SKIP LOCKED. Also, make sure you have an ORDER BY clause.

My main use-case for queues is syncing resources in our database to QuickBooks. The overall structure looks like:

    BEGIN; -- start a transaction

    SELECT job.job_id, rm.data
    FROM qbo.transmit_job job
    JOIN resource_mutation rm USING (tenant_id, resource_mutation_id)
    WHERE job.state = 'pending'
    ORDER BY job.create_time
    LIMIT 1
    FOR NO KEY UPDATE OF job NOWAIT;

    -- External API call to QuickBooks.
    -- If successful:
    UPDATE qbo.transmit_job
    SET state = 'transmitted'
    WHERE job_id = $1;

    COMMIT;

This code will serialize access to the transmit_job table. A more clever approach would be to serialize access by tenant_id. I haven't figured out how to do that yet (probably lock on a tenant ID first, then lock on the job ID).

Somewhat annoyingly, Postgres will log an error if another worker holds the row lock (since we're not using SKIP LOCKED). It won't block because of NOWAIT.

CrunchyData also has a good overview of Postgres queues: [2]

[1]: https://www.migops.com/blog/2021/10/05/select-for-update-and-its-behavior-with-foreign-keys-in-postgresql/

[2]: https://blog.crunchydata.com/blog/message-queuing-using-native-postgresql
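A hedged sketch of the per-tenant serialization the author is reaching for: take a transaction-scoped advisory lock on the tenant before selecting the job row. This is one possible approach, not the author's; it assumes tenant_id is an integer key that fits pg_try_advisory_xact_lock.

    BEGIN;

    -- Step 1: pick a tenant with pending work whose advisory lock is
    -- free; locked tenants are skipped rather than waited on.
    SELECT t.tenant_id
    FROM (SELECT DISTINCT tenant_id
          FROM qbo.transmit_job
          WHERE state = 'pending') t
    WHERE pg_try_advisory_xact_lock(t.tenant_id)
    LIMIT 1;

    -- Step 2: fetch that tenant's oldest pending job. The advisory
    -- lock already serializes workers per tenant, so NOWAIT is not
    -- strictly needed here.
    SELECT job.job_id, rm.data
    FROM qbo.transmit_job job
    JOIN resource_mutation rm USING (tenant_id, resource_mutation_id)
    WHERE job.tenant_id = $1 AND job.state = 'pending'
    ORDER BY job.create_time
    LIMIT 1;

    -- ... external call, then UPDATE ... SET state = 'transmitted' ...
    COMMIT;  -- the advisory lock is released automatically at commit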
eckesicle about 2 years ago
Postgres is probably the best solution for every type of data store for 95-99% of projects. The operational complexity of maintaining other attached resources far exceeds the benefit they realise over just using Postgres.

You don't need a queue, a database, a blob store, and a cache. You just need Postgres for all of these use cases. Once your project scales past what Postgres can handle along one of these dimensions, replace it (but most of the time this will never happen).

It also does wonders for your uptime and SLO.
code-e about 2 years ago
As the maintainer of a RabbitMQ client library (not the golang one mentioned in the article), the bit about dealing with reconnections really rang true. Something about the AMQP protocol seems to make library authors just... avoid dealing with it, forcing the work onto users, or wrapper libraries. It's a real frustration across languages: golang, python, JS, etc. Retry/reconnect is built in to HTTP libraries and database drivers. Why don't more authors consider this a core component of a RabbitMQ client?
borplk about 2 years ago
In many scenarios a DB/SQL-backed queue is far superior to the fancy queue solutions such as RabbitMQ, because it gives you instantaneous granular control over 'your queue' (since it is the result set of your query to reserve the next job).

Historically people like to point out the common locking issues etc. with SQL, but in modern databases you have a good number of tools to deal with that ("select for update nowait").

If you think about it, a queue is just a performance optimisation (it helps you get the 'next' item in a cheap way, that's it).

So you can get away with "just a db" for a long time and just query the DB to get the next job (with some 'reservations' to avoid duplicate processing).

At some point you may overload the DB if you have too many workers asking the DB for the next job. At that point you can add a queue to relieve that pressure.

This way you can keep a super dynamic process by periodically selecting the 'next 50 things to do' and injecting those job IDs into the queue.

This gives you the best of both worlds, because you can maintain granular control of the process by not having large queues (you drip-feed from DB to queue in small batches) and the DB is not overly burdened.
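For illustration, a hedged sketch of that drip-feed reservation against a hypothetical jobs table. SKIP LOCKED is shown here; the NOWAIT variant mentioned above errors on contention instead of skipping.

    -- Claim the next 50 unreserved jobs in one round trip; concurrent
    -- workers silently skip rows another worker has already locked.
    UPDATE jobs
    SET reserved_by = $1, reserved_at = now()
    WHERE id IN (
        SELECT id
        FROM jobs
        WHERE reserved_by IS NULL
        ORDER BY created_at
        LIMIT 50
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id;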
jrib about 2 years ago
> One of our team members has gotten into the habit of pointing out that "you can do this in Postgres" whenever we do some kind of system design or talk about implementing a new feature. So much so that it's kind of become a meme at the company.

love it
autospeaker22 about 2 years ago
We do just about everything with one or more Postgres databases. We have workers that query the db for tasks, do the work, and update the db. Portals that are the read-only view of the work being performed. It's pretty amazing how far we've gotten with just Postgres and no real tuning on our end. There have been a couple of scenarios where query time was excessive, and we solved them by learning a bit more about how Postgres worked and how to redefine our data model. It seems to be the swiss army knife that allows you to excel at most general cases; if you need to do something very specific, well, at that point you probably need a different type of database.
sass_muffin about 2 years ago
I find it funny how sometimes there are two sides to the same coin, and articles like these rarely talk about engineering tradeoffs. Just one solution good, other solution bad. I think it is a mistake for a technical discussion not to talk in terms of tradeoffs.

Obviously it makes sense not to use complex tech when simple tech works, especially at companies with lower traffic volume. That is just practical engineering.

The inverse, however, can also be true. At super high volumes you run into issues really quickly. I just got off a 3-hour site-wide outage caused by the database being unable to keep up with unprecedented queue load; the db system basically ground to a halt. The proposed solution is actually to move off a dedicated db queue to SQS.

This is a system that has run well for about 10 years. Granted, there was unprecedented queue volume, but sometimes a scaling ceiling is hit, and it is hit faster than you might expect from all these comments saying to always use a db, even with all the proper indexing and optimizations.
smallerfish about 2 years ago
We've inadvertently "load tested" our distributed locking / queue impl on postgres in production, and so I know that it can handle hundreds of thousands of "what should I run / try to take lock on task" queries per minute, with a schema designed to avoid bloat/vacuuming, tuned indices, and reasonably beefy hardware.
concerned_ about 2 years ago
RabbitMQ may have been overkill for the need, but it's also clear that there was an implementation bug which was missed.

Db queues are simple to implement, so given the volume it's one way to approach working around an mq client issue.

Personally, and I mean personally, I have found messaging platforms to be full of complexity, fluff, and non-standard "standards"; it's just a lot of baggage and, in the case of messaging, a lot of bugs.

I have seen Kafka deployed and ripped out a year later, and countless bugs in client implementations due to developer misunderstanding, poor documentation, and unnecessary complexity.

For this reason, I refer to event driven systems as "expert systems" to be avoided. But in your life "there will be queues".
chime about 2 years ago
If you don't want to roll your own, look into https://github.com/timgit/pg-boss
andrewstuart about 2 years ago
I wrote a message queue in Python called StarQueue.

It's meant to be a simpler reimagining of Amazon SQS.

It has an HTTP API and behaves mostly like SQS.

I wrote it to support Postgres, Microsoft's SQL Server, and also MySQL, because they all support SKIP LOCKED.

At some point I turned it into a hosted service and only maintained the Postgres implementation, though the MySQL and SQL Server code is still in there.

It's not an active project, but the code is at https://github.com/starqueue/starqueue/

After that I wanted to write the world's fastest message queue, so I implemented an HTTP message queue in Rust. It maxed out the disk at about 50,000 messages a second, I vaguely recall, so I switched to purely memory only, and on the biggest EC2 instance I could run it on it did about 7 million messages a second. That was just a crappy prototype, so I never released the code.

After that I wanted to make the simplest possible message queue, and I discovered that Linux atomic moves are the basis of a perfectly acceptable message queue that is simply file system based. I didn't put it into a message queue, but close enough to be the same: I wrote an SMTP buffer called Arnie. It's only about 100 lines of Python. https://github.com/bootrino/arniesmtpbufferserver
jasonlotito about 2 years ago
So, this article contains a serious issue.

What does the prefetch value for RabbitMQ mean?

> The value defines the max number of unacknowledged deliveries that are permitted on a channel.

From the article:

> Turns out each RabbitMQ consumer was prefetching the next message (job) when it picked up the current one.

That's a prefetch count of 2.

The first message is unacknowledged, and if you have a prefetch count of 1, you'll only get 1 message, because you've set the maximum number of unacknowledged messages to 1.

So, I'm curious what the actual issue is. I'm sure someone checked things, and I'm sure they saw something, but this isn't right.

tl;dr: a prefetch count of 1 only gets one message; it doesn't get one message and then a second.

Note: I didn't test this, so there could be some weird issue, or the documentation is wrong, but I've never seen this as an issue in all the years I've used RabbitMQ.
dapearce about 2 years ago
Love to see it. We (CoreDB) recently released PGMQ, a message queue extension for Postgres: https://github.com/CoreDB-io/coredb/tree/main/extensions/pgmq
CobaltHorizon about 2 years ago
This is interesting because I’ve seen a queue that was implemented in Postgres that had performance problems before: the job which wrote new work to the queue table would have DB contention with the queue marking the rows as processed. I wonder if they have the same problem but the scale is such that it doesn’t matter or if they’re marking the rows as processed in a way that doesn’t interfere with rows being added.
sevenf0ur about 2 years ago
Sounds like a poorly written AMQP client of which there are many. Either you go bare bones and write wrappers to implement basic functionality or find a fully fleshed out opinionated client. If you can get away with using PostgreSQL go for it.
gorjusborg about 2 years ago
I'm all for simplifying stacks by removing stuff that isn't needed.

I've also used in-database queuing, and it worked well enough for some use cases.

However, most importantly: calling yourself a maxi multiple times is cringey and you should stop immediately :)
TexanFeller about 2 years ago
Using a DB as an event queue opens up many options not easily possible with traditional queues. You can dedupe your events by upserting. You can easily implement dynamic priority adjustment to adjust processing order. Dedupe and priority adjustment feels like an operational superpower.
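For illustration, hedged sketches of both tricks against a hypothetical jobs table with a unique dedupe_key column (none of this is from the comment itself):

    -- Dedupe by upsert: re-enqueueing the same logical event refreshes
    -- the existing row instead of creating a duplicate.
    INSERT INTO jobs (dedupe_key, payload, priority)
    VALUES ('sync:customer:42', '{"customer_id": 42}', 5)
    ON CONFLICT (dedupe_key) DO UPDATE SET payload = EXCLUDED.payload;

    -- Dynamic priority adjustment: bump still-queued work to the front
    -- of the line (workers would ORDER BY priority when dequeuing).
    UPDATE jobs
    SET priority = 1
    WHERE state = 'pending' AND dedupe_key LIKE 'sync:customer:%';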
pnathan about 2 years ago
I've had a very good experience with pg queuing. I didn't even know `skip locked` was a pg clause. That would have... made the experience even better!

I am afraid I've moved to a default three-way architecture:

- backend autoscaling stateless server

- postgres database for small data

- blobstore for large data

It's not that other systems are bad. It's just that those 3 components get you off the ground flying, and if you're struggling to scale past that, you're already doing enormous volume or have some really interesting data patterns (geospatial or timeseries, perhaps).
bstempi about 2 years ago
I've done something like this and opted to use advisory locks instead of row locks, thinking that I'd increase performance by avoiding an actual lock.

I'm curious to hear what the team thinks the pros/cons of a row vs advisory lock are, and if there really are any performance implications. I'm also curious what they do with job/task records once they're complete (e.g., do they leave them in that table? Is there some table where they get archived? Do they just get deleted?)
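For context, a hedged sketch of the advisory-lock variant (hypothetical schema): the claim lives in server memory rather than in the row, so no row lock or open transaction is held while the job runs.

    -- Claim a pending job by taking a session-level advisory lock on
    -- its id (assumed bigint). Caveat: Postgres may evaluate the lock
    -- function on rows it later discards, so real implementations
    -- order the quals more carefully than this.
    SELECT id, payload
    FROM jobs
    WHERE state = 'pending' AND pg_try_advisory_lock(id)
    LIMIT 1;

    -- ... run the job, no open transaction required ...

    UPDATE jobs SET state = 'done' WHERE id = $1;
    SELECT pg_advisory_unlock($1);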
u89012 about 2 years ago
Would be nice if a little more detail were added, in order to give anyone looking to do the same more heads-up to watch out for potential trouble spots. I take it the workers are polling to fetch the next job, which requires a row lock, which in turn requires a transaction, yeah? How tight is this loop? What's the sleep time per thread/goroutine? At what point does Postgres go, sorry, not doing that? Or is there an alternative to polling, and if so, what? :)
rorymalcolm about 2 years ago
Were Prequel using RabbitMQ to stay cloud platform agnostic when spinning up new environments? Always wondered how companies that offer managed services on the customer's cloud like this manage infrastructure in this regard. Do you maintain an environment on each cloud platform with a relatively standard configuration, or do you have a central cluster hosted in one cloud provider which the other deployments phone home to?
anecdotal1 about 2 years ago
Postgres job queue in Elixir: Oban

"a million jobs a minute"

https://getoban.pro/articles/one-million-jobs-a-minute-with-oban
stereosteve about 2 years ago
Another good library for this is Graphile Worker.

Uses both listen/notify and advisory locks, so it is using all the right features. And you can enqueue a job from sql and plpgsql triggers. Nice!

Worker is in Node.js.

https://github.com/graphile/worker
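For illustration, a hedged sketch of the listen/notify half of that pattern in plain Postgres (channel, table, and function names are hypothetical; this shows the generic mechanism, not Graphile Worker's actual internals):

    -- Ping listening workers whenever a job is inserted, so they wake
    -- immediately instead of polling on a timer.
    CREATE FUNCTION notify_new_job() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('new_job', NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER jobs_notify
    AFTER INSERT ON jobs
    FOR EACH ROW EXECUTE FUNCTION notify_new_job();

    -- Each worker subscribes once per connection, then blocks on the
    -- socket until a notification arrives:
    LISTEN new_job;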
FooBarWidget about 2 years ago
I've also used PostgreSQL as a queue, but I worry about operational implications. Ideally you want clients to dequeue an item, but put it back in the queue (rollback transaction) if they crash while processing the item. But processing is a long-running task, which means that you need to keep the database connection open while processing. Which means that your number of database connections must scale along with the number of queue workers. And I've understood that scaling database connections can be problematic.

Another problem is that INSERT followed by SELECT FOR UPDATE followed by UPDATE and DELETE results in a lot of garbage pages that need to be vacuumed. And managing vacuuming well is *also* an annoying issue...
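One common workaround for the open-transaction problem (a hedged sketch against a hypothetical jobs table, not from the comment): claim jobs with a short lease that commits immediately, so a crashed worker's job simply becomes claimable again once the lease expires.

    -- Lease-based claim: no transaction or connection stays open while
    -- the job runs; the row stays 'pending' but is skipped by other
    -- workers until the lease lapses.
    UPDATE jobs
    SET leased_until = now() + interval '10 minutes'
    WHERE id = (
        SELECT id
        FROM jobs
        WHERE state = 'pending'
          AND (leased_until IS NULL OR leased_until < now())
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id, payload;

    -- On success the worker marks the job done; on a crash it does
    -- nothing and the lease expiry re-exposes the row.
    UPDATE jobs SET state = 'done' WHERE id = $1;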
twawaaay about 2 years ago
As much as I detest MongoDB's immaturity in many respects, I found a lot of features that actually make life easier when you design pretty large scale applications (mine was typically doing 2GB/s of data out of the database; I like to think it is pretty large).

One feature I like is the change event stream which you can subscribe to. It is pretty fast and reliable, and for good reason: the same mechanism is used to replicate MongoDB nodes.

I found you can use it as a handy notification / queueing mechanism (more like Kafka topics than RabbitMQ). I would not recommend it as any kind of interface between components, but within an application, for its internal workings, I think it is a pretty viable option.
yarg about 2 years ago
> We maintain things like queue ordering by adding an ORDER BY clause in the query that consumers use to read from it (groundbreaking, we know).

The dude's being a bit too self-deprecating with that (sarcastic quip).

But there's something valuable buried there: if it is possible to efficiently solve a problem without introducing novel mechanisms and concepts, it is highly desirable to do so.

Don't reinvent the wheel unless you need to.

> You could set the prefetch count to 1, which meant every worker will prefetch at most 1 message. Or you could set it to 0, which meant they will each prefetch an infinite number of messages.

Oh, what the actual fuck?

I'm really hoping he's right about holding it wrong, because otherwise ???
animex about 2 years ago
Interestingly, we've always started with an SQL custom queue and thought one day we'll "upgrade to RabbitMQ".
semiquaver about 2 years ago
When your workload is trivially tiny, most any technology can be made to work.
stolsvik about 2 years ago
I will argue that what they want is a "work queue", not a message queue.

I've written about a somewhat similar problem in the context of the Mats3 library I've made: https://mats3.io/patterns/work-queues/

And yes, the point is then to use a database to hold the work, dispatching from that. In the Mats3 context I describe, the reason is to pace the dispatching to a queue (!), but for them, it should be to just run the work from that table. Also, the introspection/monitoring argument presented should be relevant for them.

That a message queue library fetches more messages than the one it is working on is totally normal: ActiveMQ per default uses a prefetch of 1000 for queues, and Short.MAX_VALUE-1 (!) for topics. The broker will backfill you when you've depleted half of that. This is obviously to gain speed, so that once you've finished with one message, you already have another available, not needing to go back to the broker with both network and processing latencies: https://activemq.apache.org/what-is-the-prefetch-limit-for

In summary, I feel that the use case they have, "thousands of jobs per day", which is extremely little for a message queue, where many of these jobs are hours-long, is... well... not an optimal use case for an MQ. It is the wrong tool for the job, and just adds complexity.
mads_quist about 2 years ago
While some argue that RabbitMQ was misconfigured, I agree with the argument that reducing tech stack complexity is beneficial if a technology does not offer advantages that cannot be achieved with the basic tech stack.

When programming my side hustle I also had the requirement of a SIMPLE queue. I didn't want to introduce AWS SQS or RabbitMQ, so I wrote a few C# classes which were dequeueing from a MongoDB collection. It works pretty well. It basically leverages the atomic MongoDB operation "findAndModify", so you can ensure that dequeueing will find only messages in status "enqueued" and in the same operation sets the status to "processing", so you can ensure that only one reader processes the message. (https://www.mongodb.com/docs/manual/reference/method/db.collection.findAndModify)

I created a small NuGet package, which you can find here: https://allquiet.app/open-source/mongo-queueing
akamaozu about 2 years ago
re: Prefetch and Reconnection Issues

- clear case of a misconfigured instance (prefetch) and a bug somewhere in the stack (reconnection issue)

- prefetch behavior sounds like they receive a message, ack it, then process it

- i wouldn't recommend ack before processing, because you become responsible for tracking and verifying whether the worker ran to completion or not

- work then ack is the way. the other way around ignores key job processing benefits, like rabbitmq automagically requeueing messages when a worker crashes, and failure-related logic like deadletter queues

- the trick i've started leaning on with rabbitmq is giving each worker their own instance queue (i call it their mailbox)

- when a worker starts a job, it writes the job id, start time, and the worker's mailbox to a db. any system can now look up the "running job" in the database, know how long it has been running, and can even talk to the worker using its mailbox to inquire if it is running and if that job state in the db is accurate

- happy the writer and team found what works for them. ultimately, what you understand best will serve you better, so they made a good choice to lean on their strengths (postgres)
yawboakye about 2 years ago
from what i gather, it looks like they saw the 'q' in rabbitmq and thought 'ha, a queue we could use,' totally ignoring the 'm' part. the blog post doesn't say much but it's obvious they were not sending 'messages'. a message is a specific thing: it contains information that is meaningful to the recipient[0]. or perhaps rabbitmq has allowed itself to be drawn into all sorts of use cases (as a result of competition with kafka)? a message should be very small, immediately ack-ed or rejected (i.e. before any processing begins). that's why rabbitmq assumes it can run entirely in memory, because messages are not expected to stay in queues for long (i.e. during processing by recipients).

[0]: https://en.wikipedia.org/wiki/Information_theory
DevKoala about 2 years ago
I've been doing this for a long time. Back then I thought I was just being lazy, not wanting to maintain another component for a low volume of events, but over time I saw the elegance of reducing the number of components in the architecture. Today, a quarter of a billion dollar/yr business runs on top of that queue, which just works.
avinassh about 2 years ago
> And we guarantee that jobs won't be picked up by more than one worker through simple read/write row-level locks. The new system is actually kind of absurdly simple when you look at it. And that's a good thing. It's also behaved flawlessly so far.

Wouldn't this lead to contention issues when a lot of workers are involved?
omneity about 2 years ago
Postgres is super cool and comes with batteries for almost any situation you can throw at it. Low throughput scenarios are a great match. In high throughput cases, you might find yourself not needing all the extra guarantees that Postgres gives you, and at the same time you might need other capabilities that Postgres was not designed to handle, or at least not without a performance hit.

Like everything else in life, it's always a tradeoff. Know your workload and the tradeoffs your tools are making, and make sure to mix and match appropriately.

In the case of Prequel, it seems they possibly have a low throughput situation at hand, i.e. in the case of periodic syncs the time spent queuing the instruction <<< the time needed to execute it. Postgres is great in this case.
sontek about 2 years ago
Another article I saw on HN awhile back where someone did this: https://webapp.io/blog/postgres-is-the-answer/

I think it's a reasonable option, and the webapp.io people scaled it out pretty high. At Zapier we utilize RabbitMQ heavily, and I cannot imagine scaling the number of tasks we handle each day on postgres.

This article is mostly lessons learned and misconfigurations of RabbitMQ, though. Which is probably a good reason to simplify if you don't need its power. No reason to have to learn how to configure it if postgres is good enough.
crooked-v about 2 years ago
A lot of companies would be better off if they had just used a single big database instance with some read replicas instead of all the distributed cloud blahblahblah that 99.9% of even tech companies will never need.
endisneigh about 2 years ago
Thousands a day? Really? Even if it were hundreds of thousands a day it would make more sense to use a managed Pub Sub service and save yourself the hassle (assuming modest throughput).
jabl about 2 years ago
Seems like a slam-dunk example of choosing boring technology. https://boringtechnology.club/
exabrial about 2 years ago
We use ActiveMQ (classic) because of the incredible console. Combine that with hawt.io and you get some extra functionality not included in the normal console.

I'm always surprised, even with the older version of ActiveMQ, what kind of throughput you can get on modest hardware. A 1gb kvm with 1 cpu easily pushes 5000 msgs/second across a couple hundred topics and queues. Quite impressive, and more than we need for our use case. ActiveMQ Artemis is supposed to scale even farther out.
jwmoz about 2 years ago
For small stuff, RabbitMQ and Celery are hideously heavy to use. I had issues with Celery: for scheduled bg tasks, you could not execute further async requests inside the task, which was mind-blowingly useless. This was years ago.

Nowadays for small stuff I just create a simple script that uses asyncio; I can async request stuff, run_forever, etc., and handle it all as a docker service.
VincentEvans about 2 years ago
One thing worth pointing out: the approach described in TFA changes a PUSH architecture to PULL.

So now you have to deal with deciding how tight your polling loop is, and with reads that happen regardless of whether you have messages waiting to be processed or not, expending both CPU and requests, which may matter if you are billed accordingly.

Not in any way knocking it, just pointing out some trade-offs.
rbut about 2 years ago
Also had a similar experience using RabbitMQ with Django+Celery. Extremely complicated, and workers/queues would just stop for no reason.

Moved to Python-RQ [1] + Redis and it's been rock solid for years now. Redis is also great for locking, to ensure only one instance of a job/task can run at a time.

[1]: https://python-rq.org/
coding123 about 2 years ago
We have a mix of Agenda jobs and RabbitMQ. I know there are more complex use-cases, like fan out, but in reality the rabbit stack keeps disconnecting silently in the stack we're using (js). Someone has to go in and restart pods (k8s).

All the stuff on Agenda works perfectly all the time (which is basically using mongo's find and update).
kevsim about 2 years ago
I think Segment did something similar a while back. Instead of using Kafka for the queue of events coming in, they built a queue on MySQL [0]

[0]: https://segment.com/blog/introducing-centrifuge/
SergeAx about 2 years ago
> One of our team members has gotten into the habit of pointing out that "you can do this in Postgres"

Actually, using Postgres stored procedures they can do anything in Postgres. I am quite sure they can rewrite their entire product using only stored procedures. Doesn't mean they really want to do that, of course.
municha about 2 years ago
Take a look at Apache Pulsar, which is a messaging and streaming platform:

https://streamnative.io/blog/comparison-of-messaging-platforms-apache-pulsar-vs-rabbitmq-vs-nats-jetstream
tantalor about 2 years ago
What does "maxi/maxis" mean in this context?

A Google search for [sql maxis] just returns this article.
habibur about 2 years ago
Something I didn't get: the SQL server won't call back clients [or the application server] informing them that new data is available. So how do you poll it? A query in a periodic loop? Some other way? How well does that scale, and how much load does that create on the server?
SergeAx about 2 years ago
But... that means that their workers would constantly poll the Postgres server instead of using the push mechanism of AMQP, doesn't it?

Also, the problem described seems like a logical error. A worker shouldn't ack the job before finishing it.
say_it_as_it_is about 2 years ago
I love postgresql. It's a great database. However, this blog post is by people who are not quite experienced enough with message processing systems to understand that the problem wasn't RabbitMQ but how they used it.
user3939382 about 2 years ago
Another middle ground is AWS Batch. If you don't need complicated rules based on the outcome of the run, etc., it's simpler, especially if you're already used to doing ECS tasks.
gamedna about 2 years ago
Side note: the number of times this article redundantly mentioned "Ditched RabbitMQ And Replaced It With A Postgres Queue" made me kinda sick.
0xbadcafebee about 2 years ago
It's important not to gloss over what your actual use-case is. Don't just pick some tech because "it seems simpler". Who gives a crap about simplicity if it doesn't meet your needs? List your exact needs and how each solution is going to meet them, and then pick the simplest solution that meets your needs.

If you ever get into a case where "we don't think we're using it right", then you didn't understand it when you implemented it. That is a much bigger problem to understand and prevent in the future than the problem of picking a tool.
macspoofing about 2 years ago
How do you handle stale 'processing' jobs (i.e. jobs that were picked up by a consumer but never finished, maybe because the consumer died)?
rawoke083600 about 2 years ago
By holding on to a msg for a long time and then acking, you're basically trying to 'store state' in your message pipeline?
phlakaton about 2 years ago
I don't doubt that a custom backfill management system written on RabbitMQ could be rewritten in certain cases on Postgres for equivalent if not better results.

My question would be why you're in the business of writing a job manager in the first place.
sidcool about 2 years ago
What are the workers that the post mentions? Are these cron jobs?
est about 2 years ago
in my experience, implementing a queue in pg is more troublesome than mysql.

In MySQL you can directly call

    UPDATE queue SET working_id = xxxx WHERE working_id = '' LIMIT 1

to atomically pop one task from the queue.
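For comparison, a hedged sketch of the usual Postgres counterpart: UPDATE has no LIMIT there, so the row is picked in a subquery (this assumes a primary key id, which the example above doesn't show; SKIP LOCKED keeps concurrent poppers from colliding):

    -- Atomically pop one task: claim a single free row and return it.
    UPDATE queue
    SET working_id = 'xxxx'
    WHERE id = (
        SELECT id
        FROM queue
        WHERE working_id = ''
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id;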
dmtroyer about 2 years ago
> took half a day to implement + test

anyone else have trouble completely disregarding the whole of the article when they see things like this?
aphsalina about 2 years ago
What is Maxis?
northisup about 2 years ago
Reticulating Splines?
j3th9n about 2 years ago
Sooner or later they will have to deal with deadlocks.
tonymet about 2 years ago
Whenever I see RDBMS queues I think: why would you implement a queue or stack in a b-tree?

Always go back to fundamentals. The rdbms is giving you replication, queries, locking, but at what cost?
haarts about 2 years ago
I didn't even know Postgres had a queue last year. I used it just for fun and it is GREAT. People using Kafka are kidding themselves.