System design hack: Postgres is a great pub/sub and job server

852 points by colinchartier · over 5 years ago

33 comments

nostrademons · over 5 years ago
Another neat hack is to use Postgres as a quick & dirty replacement for Hadoop/MapReduce if you have a job that has big (100T+) input data but small (~1G) output data. A lot of common tasks fall into this category: generating aggregate statistics from large log files, searching Common Crawl for relevant webpages, identifying abusive users or transactions, etc.

The architecture is to stick a list of your input shards in a Postgres table, have a state flag that goes PENDING -> WORKING -> FINISHED -> (ERROR?), and then spin up a bunch of worker processes as EC2 spot instances that check for the next PENDING task, mark it as WORKING, pull it, process it, mark it as FINISHED, and repeat. They write their output back to the DB in a transaction; there's an assumption that aggregation can happen in-process and then get merged in a relatively cheap transaction. If the worker fails or gets pre-empted, it retries (or marks as ERROR) any shards it was previously working on.

Postgres basically functions as the MapReduce Master & Reducer, the worker functions as the Mapper and Combiner, and there's no need for a shuffle phase because output <<< input. Almost all the actual complexity in MapReduce/Hadoop is in the shuffle, so if you don't need that, the remaining stuff takes < 1 hour to implement and can be done without any frameworks.
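A minimal sketch of the shard table and atomic claim query this architecture implies (table and column names here are invented for illustration, not taken from the comment):

    CREATE TABLE input_shards (
        id         bigserial PRIMARY KEY,
        shard_path text NOT NULL,                    -- e.g. an S3 key for one input shard
        state      text NOT NULL DEFAULT 'PENDING',  -- PENDING -> WORKING -> FINISHED / ERROR
        worker_id  text,
        updated_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each spot-instance worker atomically claims the next pending shard;
    -- SKIP LOCKED keeps two workers from grabbing the same row.
    UPDATE input_shards
    SET state = 'WORKING', worker_id = 'worker-17', updated_at = now()
    WHERE id = (SELECT id
                FROM input_shards
                WHERE state = 'PENDING'
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED)
    RETURNING id, shard_path;

On failure or pre-emption the worker (or a reaper process) flips its WORKING rows back to PENDING, or to ERROR after too many retries.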
davidw · over 5 years ago
> It's rarely a mistake to start with Postgres and then switch out the most performance-critical parts of your system when the time comes.

This is pretty good advice in general.
pjungwir · over 5 years ago
I've done this before with good results.

I was pleased to see they are using `SELECT FOR UPDATE SKIP LOCKED`. That is what this 2ndQuadrant article recommends, which I think is required reading if you want to implement this yourself:

https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/

It goes into more detail about wrong ways to implement a queue and what the downsides are for its preferred approach.
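Roughly the distinction that article draws, as a hand-written sketch (the jobs table here is hypothetical, not from the article):

    -- Without SKIP LOCKED, a second worker blocks here until the first
    -- worker's transaction commits or rolls back:
    SELECT * FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1 FOR UPDATE;

    -- With SKIP LOCKED, a second worker skips the row the first one has locked
    -- and claims the next available job instead:
    SELECT * FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED;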
jordic · over 5 years ago
We use this kind of tooling a lot. Say you need to check 20k URLs and you want to rate-limit them: add them to a Pg table (with state and result fields), and a single-threaded worker just takes a row (marks it as pending) and later updates it. With the SELECT FOR UPDATE and SKIP LOCKED tricks you can scale it horizontally to however many workers you need.

I've also seen it used for software that sends mass mail (in our case around 100k/day); its state is a Postgres queue.

We also use Pg for transactional mail. We insert it into a table (and a separate process sends the queued mails). The nice part is that the mail joins the DB transaction for free (all or nothing).
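That last point is the transactional-outbox shape; a minimal sketch, with invented table and column names:

    BEGIN;
    -- The business change and the email announcing it commit or roll back together.
    UPDATE accounts SET plan = 'pro' WHERE id = 42;
    INSERT INTO mail_outbox (recipient, subject, body, state)
    VALUES ('user@example.com', 'Your upgrade is live', '...', 'PENDING');
    COMMIT;
    -- A separate sender process polls mail_outbox for PENDING rows and marks them SENT.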
evv · over 5 years ago
Excellent design hack. If anybody in the Node/TypeScript ecosystem is looking for this capability in a neat and supported library, it looks like the graphile folks have you covered:

https://github.com/graphile/worker
zitterbewegung · over 5 years ago
Postgres is an acceptable relational database / NoSQL database / pub/sub / job server / blockchain.
zrail · over 5 years ago
If you're working with Ruby, I have had good experiences with Que[1], which implements a pattern similar to the OP using advisory locks.

[1]: https://github.com/que-rb/que
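Que's actual claiming query is more involved, but the advisory-lock primitive it builds on is just this (job id 123 is a made-up example):

    -- Try to claim job 123 without blocking; only one session gets 'true' at a time.
    SELECT pg_try_advisory_lock(123);

    -- ...run the job...

    -- Release the claim when done (session-level advisory locks are also released
    -- automatically if the connection drops).
    SELECT pg_advisory_unlock(123);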
soumyadeb · over 5 years ago
Great post!! At Rudder (open-source Segment), we used Postgres as our streaming layer in a similar fashion. It has been super stable in production, very easy to set up, and we easily get > 10K events/sec insertion performance.

The code is open-sourced here in case anyone wants to reuse it:

https://github.com/rudderlabs/rudder-server/blob/master/jobsdb/jobsdb.go

We had to build additional logic to clean up old jobs (similar to level merges in comparable queuing systems).
fizwhiz · over 5 years ago
Isn't hijacking a DB as a "distributed" message queue a pretty well-trodden path? Enterprises have been doing this for decades.
silasdavis · over 5 years ago
The postgres module for Node is quite unreliable when holding open a connection to listen on a channel. This helped: https://github.com/andywer/pg-listen/blob/master/README.md

But we still see issues.
cle · over 5 years ago
At the scale I operate at, I wouldn't consider this a viable option. What's the backpressure like on NOTIFY/LISTEN? (Docs mention a maximum backlog of 8GB on the message queue; is that configurable? Monitorable?) Tons of constant churn on a table means we have to worry about vacuuming, right? Now I have to monitor that too to make sure it's keeping up. Not to mention all the usual operational issues with running relational databases.

No thanks, I'll stick with GCP Pub/Sub or AWS SQS, which are explicitly designed for this use case, and for which I have to set up no infrastructure.
stephenr · over 5 years ago
I'm not sure I quite follow this statement:

> In the list above, I skipped things similar to pub/sub servers called "job queues" - they only let one "subscriber" watch for new "events" at a time, and keep a queue of unprocessed events:

If your job queue only allows one single worker (even per named queue), I'd argue it's a shit job queue.
fyp · over 5 years ago
Can someone share their experience with scaling Pg's NOTIFY and LISTEN?

The use case I have in mind has a lot of logically separate queues. Is it better for each queue to have its own channel (so subscribers can listen to only the queue they need), or to have all queues notify a global channel (and have subscribers filter for the messages relevant to them)? I am mainly confused about whether I need a dedicated DB connection per LISTEN query, and also how many channels is too many.
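For reference, the two shapes being compared look roughly like this (channel names and payload format are made up):

    -- Option A: one channel per logical queue; each subscriber listens only to what it needs.
    NOTIFY queue_emails, '12345';                -- payload is just the job id
    LISTEN queue_emails;

    -- Option B: one global channel; the payload names the logical queue and
    -- subscribers filter client-side.
    NOTIFY job_events, 'queue=emails,id=12345';
    LISTEN job_events;

A single session can LISTEN on any number of channels, so the cost is in the long-lived connection holding the LISTENs open rather than in the channel count.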
londons_explore · over 5 years ago
Hacks like this work at first, but long-running transactions and Postgres don't do well together.

After a few weeks of running on a multi-TB table, you'll find the dead tuples massively outnumber the live tuples, and database performance is dropping faster than the vacuum can keep up. Vacuum is inherently single-threaded, while your processes making dead tuples as part of queries are multi-threaded, so it's obviously the vacuum that fails first if most queries are long-running and touch most rows. Your statistics will get old because they're also generated by the vacuum process, making everything yet slower.

Even if you can live with gradually dropping performance, eventually the whole thing will fail when your txids wrap around and the whole database goes read-only.
directionless · over 5 years ago
Postgres generally has a fairly low maximum connection count. If you're running your own servers you can adjust this, but in the cloud you may not be able to: for example, Google Cloud SQL maxes out at 1000, Heroku at 500.

At that point people usually start looking at connection pooling tools. Depending on how much work you need from the DB, connection pools can be a win. Anyone know how connection pooling works with listeners?
z3t4 · over 5 years ago
With today's hardware, I'd argue you most likely do not need a pub/sub service. Considering the extra work needed in a distributed system, you could save a lot of time by keeping it simple.
luord · over 5 years ago
Postgres is now pretty much the ultimate nearly-all-purpose backend, it seems. At this point, I won't need to use anything else.

And I'm more than perfectly fine with that.
zyngaro · over 5 years ago
Keep in mind that a NOTIFY performed when nobody is listening is lost. The workers then need to catch up.
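The usual mitigation is to treat the table as the source of truth and the notification as a wake-up hint; a sketch of a worker's startup sequence (hypothetical table and channel names):

    -- Subscribe first, so nothing slips between the catch-up sweep and the LISTEN.
    LISTEN job_created;

    -- Then sweep for anything that arrived while this worker was offline.
    SELECT id FROM jobs WHERE state = 'PENDING' ORDER BY id;

    -- After that, block waiting for notifications and treat each one as a hint to poll again.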
anonu · over 5 years ago
How do you performance-tune PostgreSQL on AWS and still keep it running at a reasonable cost?
Iv · over 5 years ago
I'll soon have to build a pub/sub for an application that's close to a multiplayer video game.

Most advice I have seen says that I'll probably want to code it myself, but I was wondering about the latency of this solution? I'll likely have a SQL store, and that would be a good argument to use Postgres...
2bitencryption · over 5 years ago
I'm curious if the same holds true if you drop in SQLite/MS SQL Server/MySQL.

I.e. is this good advice because Postgres in particular is a great implementation of SQL, or because SQL in general is good enough to solve this problem, or a mix of the two?
tapirl · over 5 years ago
Should the "ci_job_status" in this line be "ci_jobs" instead?

    INSERT INTO ci_job_status(repositor ...

BTW, it looks like Go has become so popular that many tutorials are using Go for examples. ;D
namanyayg · over 5 years ago
I've simply been using MySQL/Maria everywhere and have been meaning to switch, but I'm not sure what makes Postgres this much better.

Can this hack not be achieved with a MariaDB table too?
aarbor989 · over 5 years ago
I did something very similar with MySQL, since that was the DB we already had set up for data (not my choice). Basically any database that has atomic operations can do this. It's definitely way more convenient and cheaper to use your existing infra for pub/sub and then only scale out to other services once performance becomes subpar. Although if you already have a messaging service up and running, it's probably better to use that.
klagermkii · over 5 years ago
Thanks, seeing that atomic fetch in action is very useful.
webscalist · over 5 years ago
If you use Postgres on AWS, you'll easily exhaust IOPS with this, and then all hell breaks loose.
DevKoala · over 5 years ago
I agree. I have prototyped this in the past, but our current pub/sub was not painful enough for us to go full steam ahead with PG.

However, my design was more bare-bones: I was picking jobs by chaining CTEs that did the status update as they returned the first element of the queue.
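That CTE-chaining shape typically looks something like this (a generic sketch with invented names, not the commenter's actual query):

    WITH next_job AS (
        SELECT id
        FROM jobs
        WHERE state = 'queued'
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    UPDATE jobs
    SET state = 'running', started_at = now()
    FROM next_job
    WHERE jobs.id = next_job.id
    RETURNING jobs.id, jobs.payload;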
rco8786 · over 5 years ago
Interesting enough, but I'm not sure why this is called a hack. It's a fully supported feature of Pg.
xmly · over 5 years ago
Kinesis is based on DynamoDB...

Key-value stores could do a lot of things, theoretically.
rantwasp · over 5 years ago
http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html
fouc · over 5 years ago
Another neat hack would be to use Postgres as a graph DB.
foou · over 5 years ago
How do you manage the bloat?
ubu7737 · over 5 years ago
I used the "FOR UPDATE SKIP LOCKED LIMIT 1" trick to implement a job server in PG a few years ago for the first time.

It's a great solution.