Redis is brilliant for simple job queues, but it doesn’t have the structures for more advanced features. Scheduled jobs can be approximated with sorted sets, and persistent jobs are possible by shifting jobs into backup queues, but it is all a bit fragile. Streams, available since Redis 5, handle a lot more use cases fluently, but you still can’t get scheduled jobs in the same queue.<p>After replicating most of Sidekiq’s Pro and Enterprise behavior using the older data structures, I attempted to migrate to streams. What I discovered is that all the features I really wanted were available in SQL (specifically PostgreSQL). I’m not the first person to discover this, but it was such a refreshing change.<p>That led me to develop a Postgres-based job processor in Elixir: <a href="https://github.com/sorentwo/oban" rel="nofollow">https://github.com/sorentwo/oban</a><p>All the goodies that were only possible by gluing Redis structures together with Lua scripts were much more straightforward in an RDBMS. Who knows, maybe the recent port of Disque to a Redis module will change things.
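For the curious, the sorted-set scheduling pattern looks roughly like this minimal redis-py sketch; the key and payload names are invented for illustration:<p><pre><code>import time
import redis

r = redis.Redis()

# Schedule a job by using its due timestamp as the sorted-set score.
r.zadd('scheduled_jobs', {'send_email:42': time.time() + 60})

# A polling worker moves due jobs onto the work queue. The fragility:
# the read and the remove are separate steps, so without a Lua script
# or transaction two workers can claim the same job.
for payload in r.zrangebyscore('scheduled_jobs', 0, time.time()):
    r.zrem('scheduled_jobs', payload)
    r.lpush('work_queue', payload)</code></pre>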
Reposting from a while back in case it solves a problem for someone.<p>I use Postgres SKIP LOCKED as a queue. Postgres gives me everything I want. I can also do priority queueing and sorting.<p>All the other queueing mechanisms I investigated were dramatically more complex and heavyweight than Postgres SKIP LOCKED.<p>Here is a complete implementation - nothing needed but Postgres, Python and the psycopg2 driver:<p><pre><code>import psycopg2
import psycopg2.extras
import random

db_params = {
    'database': 'jobs',
    'user': 'jobsuser',
    'password': 'superSecret',
    'host': '127.0.0.1',
    'port': '5432',
}

conn = psycopg2.connect(**db_params)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def do_some_work(job_data):
    # Stand-in for real work: fail half the time to exercise the retry path.
    if random.choice([True, False]):
        print('do_some_work FAILED')
        raise Exception
    else:
        print('do_some_work SUCCESS')

def process_job():
    # Atomically claim and delete one queue item. SKIP LOCKED makes
    # concurrent workers pass over rows already locked by another worker.
    sql = """DELETE FROM message_queue
    WHERE id = (
        SELECT id
        FROM message_queue
        WHERE status = 'new'
        ORDER BY created ASC
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING *;
    """
    cur.execute(sql)
    queue_item = cur.fetchone()
    if queue_item is None:
        print('no queue items to process')
        conn.commit()
        return
    print('message_queue says to process job id: ', queue_item['target_id'])
    sql = """SELECT * FROM jobs WHERE id = %s AND status = 'new_waiting' AND attempts <= 3 FOR UPDATE;"""
    cur.execute(sql, (queue_item['target_id'],))
    job_data = cur.fetchone()
    if job_data:
        try:
            do_some_work(job_data)
            sql = """UPDATE jobs SET status = 'complete' WHERE id = %s;"""
            cur.execute(sql, (queue_item['target_id'],))
        except Exception:
            sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id = %s;"""
            # if we want the job to run again, insert a new item into the message queue with this job id
            cur.execute(sql, (queue_item['target_id'],))
    else:
        print('no job found, did not get job id: ', queue_item['target_id'])
    conn.commit()

process_job()
cur.close()
conn.close()</code></pre>
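For reference, the snippet above assumes two tables shaped roughly like the following. The schema is inferred from the column names the code touches; the types and defaults are my assumptions, not from the original post:<p><pre><code># Hypothetical schema matching the columns the snippet reads and writes.
cur.execute("""
    CREATE TABLE jobs (
        id       serial  PRIMARY KEY,
        status   text    NOT NULL DEFAULT 'new_waiting',
        attempts integer NOT NULL DEFAULT 0
    );
    CREATE TABLE message_queue (
        id        serial      PRIMARY KEY,
        target_id integer     NOT NULL REFERENCES jobs (id),
        status    text        NOT NULL DEFAULT 'new',
        created   timestamptz NOT NULL DEFAULT now()
    );
""")
conn.commit()</code></pre>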
Side niggle - I used to notice a lot of Django projects would use complex job queues for absurdly low workloads. Beginners would get recommendations to use RabbitMQ and Redis for sites that probably were only going to see a few hundred concurrent users at most.<p>Seriously - don't add complex dependencies to your stack unless you need them. The database makes a great task queue and the filesystem makes a great cache. You really might not need anything more.
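To make the filesystem-as-cache point concrete, here is a minimal sketch; the cache directory and hashing scheme are arbitrary choices for illustration:<p><pre><code>import hashlib
import os

CACHE_DIR = '/tmp/app_cache'  # arbitrary location for this sketch
os.makedirs(CACHE_DIR, exist_ok=True)

def cache_path(key):
    # Hash the key so any string becomes a safe filename.
    return os.path.join(CACHE_DIR, hashlib.sha256(key.encode()).hexdigest())

def cache_get(key):
    try:
        with open(cache_path(key)) as f:
            return f.read()
    except FileNotFoundError:
        return None

def cache_set(key, value):
    with open(cache_path(key), 'w') as f:
        f.write(value)</code></pre>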
I currently use RQ. Here is the logic that led me to choose it, then wish I had just used Postgres.<p>I need a queue to handle long-running jobs. I looked around the Python ecosystem (it’s a Flask app) and found RQ. So now I add code to run RQ. Then I add Redis to act as the queue. Then I realized I needed to track jobs in the queue, so I put them into the DB to track their state. Now I effectively have a circular dependency between my app, Redis, and my Postgres DB. If Redis goes down, I’m not really sure what happens. If the DB goes down, I’m not really sure what’s going on in Redis. This added undue complexity to my small app. Since I’m trying to keep it simple, I recently found that you can use Postgres as a pub/sub queue, which would have completely solved my needs while making the app much easier to reason about. Postgres has plenty of room to grow and buys you time to figure out a more durable solution.
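For anyone curious, the Postgres pub/sub mechanism is LISTEN/NOTIFY. A minimal psycopg2 sketch of the listening side, with an invented channel name (a publisher would run something like NOTIFY new_job, '42'):<p><pre><code>import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect(database='jobs', user='jobsuser')
# Notifications are only delivered between transactions, so use autocommit.
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("LISTEN new_job;")  # channel name is arbitrary

while True:
    # Block until the connection's socket is readable, then drain events.
    if select.select([conn], [], [], 5) == ([], [], []):
        print('timeout, still waiting...')
    else:
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print('got job notification, payload:', notify.payload)</code></pre>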
I got burned by this with RQ.<p>Say you have an object, and the object loads some config from a settings module, which in turn fetches from the environment. No matter how many times I restarted RQ, the config wouldn’t change. Because the code had changed, the old config caused the job to crash and keep retrying.<p>It wasn’t until I got frustrated and went into Redis to pop the job that I saw all the old settings were in there. In other words, RQ serializes the whole object together with all its properties.<p>RQ isn’t that good IMHO. You will have to add monitoring, health checks, node pings, a scheduler, retries, and middleware like Celery has; eventually it grows into a home-grown job queue system that makes it harder to onboard new devs.<p>Just use Celery. Celery isn’t that bloated. It has many UIs that support it, multiple backends, and it’s very flexible. Celery Beat scheduling is great as well.
I used this in a project at a previous job -- I have to say, while the API is simple and useful enough for small projects, it raises some issues with how it's designed.<p>Instead of relying on trusted exposed endpoints and just invoking them by URL, it does a bytecode dump of task functions and stores those in Redis before restoring them from bytecode at execution time.<p>This has a few drawbacks:<p>- Payloads in the queue are potentially a fair bit larger for complex jobs
- Serialization for stuff that has decorators (especially stateful ones, like `lru_cache`) is not really possible, even with `dill` instead of `pickle`
- While not trivially exploitable, this exposes a different set of security risks compared to the alternative<p>I don't want to say this is a bad piece of software; it's super easy to set up and way more lightweight than Celery, for example, but it's not my tool of choice having worked with the alternatives.
May I ask anyone posting stats on "millions per day" to please indicate how many nodes/CPUs the entire system uses. For example, "8.6 million per day" is only about 100 per second (there are 86,400 seconds in a day). If that takes 100 CPUs, folks are underestimating what a single modern node's CPU and network card are capable of.
Not sure if the lower-level API of RQ supports this, but I tend to prefer message-oriented jobs that don't couple the web app to the task handler the way the example does.<p>I don't want to import "count_words_at_url" just to make it a "job", because that couples my web app runtime to whatever the job module needs to import, even though the web app runtime doesn't care how the job is handled.<p>I want to send a message "count-words" with the URL in the body of the message and let a worker pick that up off the queue and handle it however it decides, without the web app needing any knowledge of the implementation. The web app and worker app can have completely different runtime environments that evolve/scale independently. A rough sketch of that style is below.
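Here is a minimal sketch of that message-oriented style over a bare Redis list; the queue name, task name, and handler are invented for illustration:<p><pre><code>import json
import redis

r = redis.Redis()

# Web app side: publish a message; no import of the handler needed.
r.lpush('tasks', json.dumps({'task': 'count-words',
                             'url': 'https://example.com'}))

# Worker side: dispatch by task name, however this runtime likes.
def count_words(url):
    print('counting words at', url)

HANDLERS = {'count-words': count_words}

_, raw = r.brpop('tasks')  # blocks until a message arrives
msg = json.loads(raw)
HANDLERS[msg['task']](msg['url'])</code></pre>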
I am a big fan of RQ along with Huey (<a href="https://github.com/coleifer/huey" rel="nofollow">https://github.com/coleifer/huey</a>)
We tried this at WakaTime and it didn't scale well. We also tried <a href="https://github.com/closeio/tasktiger" rel="nofollow">https://github.com/closeio/tasktiger</a> but the only ones that work for us are based on RabbitMQ:<p>Celery<4.0<p><a href="https://github.com/Bogdanp/dramatiq" rel="nofollow">https://github.com/Bogdanp/dramatiq</a>
I've not much to say other than that this is a great little library! I used RQ for a small project recently and found it to be pretty easy to use. As my project grew bigger I also found it contained extra features I didn't realise I'd need when I started.
Most job systems like Celery or Sidekiq are language-specific. If you are looking for background jobs for any language, check out Faktory.<p>More advanced features like queue throttling and complex job workflows are available.<p><a href="https://github.com/contribsys/faktory/wiki" rel="nofollow">https://github.com/contribsys/faktory/wiki</a>
Shameless plug: Our team at Close has been inspired by RQ when we created TaskTiger – <a href="https://github.com/closeio/tasktiger" rel="nofollow">https://github.com/closeio/tasktiger</a>. We’ve been running it in production for a few years now, processing ~4M tasks a day and it’s been a wonderful tool to work with. Curious to hear what y’all think!
RQ is horrible. We used it at K Health and migrated to Celery, as configuration, optimisation, and monitoring were all really hard with RQ. It takes ten seconds to get started but days to get to production. Not a good trade-off!
I've used rq in prod at a couple places - nice little library.<p>I really like the design of beanstalkd and I used that at one place, but using rq + redis was one less thing to deploy and/or fail.
Having used Resque for production workloads, I wouldn't want to use Redis as a job queue. It works fine for small workloads, but it doesn't have many resiliency capabilities and doesn't scale well (cluster mode drops support for some set operations, so you probably can't use that). Replication is async and it's largely single-threaded, so you get a SPOF bottleneck in the middle of a distributed work system.
Different to RQ – but related – I recently released Lightbus [1], which is aimed at providing simple inter-process messaging for Python. Like RQ it is Redis-backed (streams), but it differs in that it is a communication bus rather than a queue for background jobs.<p>[1]: <a href="https://lightbus.org" rel="nofollow">https://lightbus.org</a>
We were unsatisfied with RQ (I forget which parts exactly, but I seem to recall there was lots of code that strictly wasn't needed) and wrote <a href="http://github.com/valohai/minique" rel="nofollow">http://github.com/valohai/minique</a> instead.<p>For less simple use cases, just use Celery.
I’ve used RQ in production applications as recently as last week. It’s pretty basic, so there are upsides and downsides, but so far it simply works! I may opt for Celery down the line, but for the current state of my project, RQ helped me ship quickly and iterate.
We use <a href="https://github.com/josiahcarlson/rpqueue" rel="nofollow">https://github.com/josiahcarlson/rpqueue</a> and I absolutely love it
Just use Celery. I know several teams who made the, in hindsight, very poor choice of using RQ. By the time you realize it’s a bad decision, it’s very hard to get out of.<p>Unless your use case is absurdly simple, and will always and forever be absurdly simple, Celery will fit better. Otherwise you find yourself adding more and more things on top of RQ until you have a shoddy version of Celery. You can also pick and choose any broker with Celery, which is fantastic for when you realize you need RabbitMQ.