In many projects Celery is overkill. A common scenario I've seen:<p><pre><code> 1. We have a problem, let's use Celery.
 2. Now we have one more problem.
</code></pre>
I found <a href="http://python-rq.org/" rel="nofollow">http://python-rq.org/</a> much handier, and it covers most cases. It uses Redis as the queue broker. Flask and Django integrations are available: <a href="https://github.com/mattupstate/flask-rq/" rel="nofollow">https://github.com/mattupstate/flask-rq/</a> <a href="https://github.com/ui/django-rq" rel="nofollow">https://github.com/ui/django-rq</a>
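For illustration, a minimal rq sketch (the function and connection details are assumptions, not from the linked docs):<p><pre><code>from redis import Redis
from rq import Queue

from myapp.tasks import count_words   # any plain function works, no decorator needed

q = Queue(connection=Redis())          # Redis is the broker
job = q.enqueue(count_words, "some text to process")
print(job.id)                          # check job.result later, or watch an rq worker process it
</code></pre>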
Good, basic practices to follow. Here are a few more:<p>- If you're using AMQP/RabbitMQ as your result backend it will create a lot of dead queues to store results in. This can easily overwhelm your RabbitMQ server if you don't clear them out frequently. I believe newer releases of Celery do this daily, but it's worth keeping in mind if your RMQ instance falls over in prod.<p>- Use chaining to build up "sequential" tasks that need doing, instead of calling one after another from the same task (or worse, doing a big mouthful of work in one task), as Celery can prioritise many small tasks better than one "master" task synchronously calling several tasks in a row (see the sketch below).<p>- Try to keep a consistent module import pattern for Celery tasks, or explicitly name them, as Celery does a lot of magic in the background so that task spawning is seamless to the developer. This is very important: you should never mix relative and absolute importing when you are dealing with tasks. "from foo import mytask" may be picked up differently than "import foo" followed by "foo.mytask", resulting in some tasks not being picked up by Celery(!)<p>- The OP's advice to never pass database objects is true; but go one step further and don't pass complex objects at all if you can avoid it. I vaguely remember some of the urllib/httplib exceptions in Python not being serializable, causing very cryptic errors if you didn't capture the exception and sanitise it or re-raise your own.<p>- Use proper configuration management to set up and configure Celery plus whatever messaging broker/backend you use. There's nothing more frustrating than spending your time trying to replicate somebody's half-assed Celery/Rabbit configuration that they didn't nail down and test properly in a clean-room environment.
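To illustrate the chaining and explicit-naming points, a minimal sketch (the app, broker URL and task bodies here are assumptions):<p><pre><code>from celery import Celery, chain

app = Celery("proj", broker="redis://localhost:6379/0")   # assumed broker URL

@app.task(name="proj.tasks.fetch")    # explicit name, immune to relative-import surprises
def fetch(url):
    return "payload for " + url

@app.task(name="proj.tasks.parse")
def parse(payload):
    return len(payload)

# Each step is scheduled as its own task instead of one big "master" task:
chain(fetch.s("http://example.com"), parse.s())()
</code></pre>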
I've worked 4+ years with Celery on 3 different projects and found it incredibly difficult to manage, both from the sysadmin and the coder point of view.<p>With that experience, we wrote a task queue using Redis & gevent that puts visibility & tooling first: <a href="http://github.com/pricingassistant/mrq" rel="nofollow">http://github.com/pricingassistant/mrq</a><p>Would love to have some feedback on that!
I disagree with the characterization in #1 (although I can't speak to the Celery particulars). I feel like if you have a job that is critical to your business process, the job should be persisted to your database and created within the same database transaction as whatever is kicking off the job.<p>Consider how background jobs are typically managed with RabbitMQ, Redis, etc. They are usually created in an "after commit" hook from whatever gets persisted to your relational database. In this scenario, there is a gap between the database transaction being committed and the job being sent to and persisted by RabbitMQ or Redis; during this gap the only record of that task is being held in a process's memory.<p>If this process gets killed suddenly during this gap, that background job will be lost forever. It sounds unlikely, but if RabbitMQ or Redis is down and the process has to sit and retry, waiting for them to come back online, the gap can be sizable.
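A rough sketch of that transactional approach (Django-style; the Order/PendingJob models and send_receipt task are assumptions, and the "outbox" table needs a periodic sweeper):<p><pre><code>from django.db import transaction

def place_order(request):
    with transaction.atomic():
        order = Order.objects.create(user=request.user)          # assumed model
        job = PendingJob.objects.create(                          # assumed "outbox" table
            kind="send_receipt", payload={"order_id": order.pk})
    # The job row is now durable. Even if enqueueing fails right here
    # (broker down, process killed), a sweeper can find un-enqueued
    # PendingJob rows later and retry them.
    send_receipt.delay(job.pk)
</code></pre>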
I would add:<p>1. Use task-specific logging if you have a bunch of tasks: <a href="http://blog.mapado.com/task-specific-logging-in-celery/" rel="nofollow">http://blog.mapado.com/task-specific-logging-in-celery/</a><p>2. Use statsd counters to keep track of basic statistics (counts + timers) for each task<p>3. Use supervisor + monit to restart workers after a lack of activity (I have seen workers go idle a few times and have never been able to track down why, but this is an easy fix)
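For point 1, Celery ships a per-task logger helper; a minimal sketch (the task body is an assumption):<p><pre><code>from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)   # records carry the task name and id when run inside a task

@app.task
def import_feed(feed_url):
    logger.info("importing %s", feed_url)
    # ... actual work ...
</code></pre>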
Excellent resource. I remember wrestling with learning Celery and how to do some simple things; I loved finding Flower to monitor everything.<p>I will say, though, that Celery is probably overkill for a lot of the tasks people think to use it for. In my case it was mandated to support scaling for a startup that never launched, partly because they kept looking at new technologies for problems they didn't have yet.
Points 1 and 2 are only valid because the Celery database backend implementation uses generic SQLAlchemy. Chances are, if you are using a relational database, it's PostgreSQL, and it does have an asynchronous notification system (LISTEN/NOTIFY) that lets you specify which channel to listen/notify on.<p>With the psycopg2 module you can use this mechanism together with select(), so your worker thread(s) don't have to poll at all. They even have an example in the documentation.<p><a href="http://www.postgresql.org/docs/9.3/interactive/sql-notify.html" rel="nofollow">http://www.postgresql.org/docs/9.3/interactive/sql-notify.ht...</a><p><a href="http://initd.org/psycopg/docs/advanced.html#async-notify" rel="nofollow">http://initd.org/psycopg/docs/advanced.html#async-notify</a>
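The psycopg2 pattern looks roughly like this (the connection string and channel name below are assumptions):<p><pre><code>import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=tasks")   # assumed DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN new_task;")            # assumed channel name

while True:
    # Block on the socket instead of polling the table
    if select.select([conn], [], [], 60) == ([], [], []):
        continue                           # timeout, nothing yet
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print("got notification:", notify.channel, notify.payload)
</code></pre>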
Once you scale your worker pool up beyond a couple of machines you need some sort of config management with Celery. We use SaltStack to manage a large pool of celery workers and it does a pretty good job.
This is not a Celery-specific tip, but as Celery also likes to "tweak" your logging configuration you can use <a href="https://pypi.python.org/pypi/logging_tree" rel="nofollow">https://pypi.python.org/pypi/logging_tree</a> to see what's going on under the hood.
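Usage is a one-liner; call it after Celery has finished setting itself up to dump the whole logger hierarchy:<p><pre><code>import logging_tree
logging_tree.printout()   # prints every logger with its handlers and levels
</code></pre>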
I've been looking at Python task queues recently. Does anyone have experience with how Celery and rq stack up?<p>Rq is a lot smaller, more than 10x smaller by line count. So if it works just as well, I'd go with the simpler implementation.
Passing objects to Celery and not querying for fresh objects is not always a bad practice. If you have millions of rows in your database, querying for them is going to slow you way down. In essence, the reason you shouldn't use your database as the Celery backend is the same reason you might not want to query it for fresh objects. It depends on your use case, of course. Passing plain values/strings should be strongly considered too, since serializing and passing whole objects when you only need a single value is not good either.
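In practice that means enqueueing only the primitives the task needs; a minimal sketch (the task and helper names are assumptions):<p><pre><code>@app.task
def send_notification(email, message):
    # plain strings serialize cheaply and reliably across any broker
    deliver_email(email, message)          # assumed helper

# Pass just the values the task needs, not the whole User object:
send_notification.delay(user.email, "Your export is ready")
</code></pre>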
If you combine Celery with supervisord it's important to check the official config file[1]. At least two settings there are really important - `stopwaitsecs=600` and `killasgroup=true`. If you don't use them you might end up with a bunch of orphaned child Celery processes and your tasks might be executed more than once.<p>[1] <a href="https://github.com/celery/celery/blob/ee46d0b78d8ffc068d5b80e9568a5a050c61d1a8/extra/supervisord/celeryd.conf#L18" rel="nofollow">https://github.com/celery/celery/blob/ee46d0b78d8ffc068d5b80...</a>
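The relevant part of such a config looks roughly like this (the program name and command are assumptions; see the linked celeryd.conf for the full version):<p><pre><code>[program:celery]
command=celery worker -A proj --loglevel=INFO
; give running tasks time to finish before supervisord gives up and kills them
stopwaitsecs=600
; kill the whole process group, not just the parent, to avoid orphaned children
killasgroup=true
</code></pre>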
Am I the only person who was genuinely disappointed that this wasn’t about the vegetable?<p>It’s a sadly under-rated ingredient! The flavor is subtle but unmistakable.
Wondering about something: if you need to run a long task (5s to 10s, or even longer) in the background for an AJAX request, what should you do:<p>- use gevent + gunicorn, or Tornado, to keep a socket open while the worker is processing the task?<p>- use polling? (less efficient)<p>- use websockets? (but then the implementation is perhaps a bit more complex)<p>Can you do this simply using Flask?
As one of the authors of taskflow I'd like to give a little shout-out for it (since it can do similar things to Celery, hopefully more elegantly and easily).<p>Pypi: <a href="https://pypi.python.org/pypi/taskflow" rel="nofollow">https://pypi.python.org/pypi/taskflow</a><p>Comments, feedback and questions welcome :-)
I've heard so much about Celery but still have no clue when it would be used. Could someone give some specific examples of when you have used it? I don't really even know what a distributed task is.
I'd also add:
Be wary of context-dependent actions (e.g. render_template, user.set_password, sign_url, base_url), as you aren't in the application/request context inside a Celery task.
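With Flask, for example, you typically have to push an application context yourself inside the task; a minimal sketch (the app/celery wiring and template name are assumptions):<p><pre><code>from flask import render_template

@celery.task
def send_welcome_email(user_id):
    # render_template and friends need an active application context,
    # which a worker process does not have by default
    with app.app_context():
        body = render_template("welcome.html", user_id=user_id)
        # ... hand `body` off to your mailer ...
</code></pre>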
Has anybody been able to make a priority queue (with a single worker) in Celery?<p>E.g., execute other tasks only if there are no pending important tasks.
> when you have a proper AMQP like RabbitMQ<p>AMQP = Advanced Message Queuing Protocol, so it's wrong to say that a message broker is "an AMQP". Also, give Redis a try - it's much easier to set up and uses fewer resources.<p>We should probably talk about the elephant in the room when addressing newbies: the Celery daemon needs to be restarted each time new tasks are added or existing ones are modified. I got past that with the ugly hack of having only one generic task[1], but people new to Celery need to know what they're getting into.<p>[1]: <a href="https://github.com/stefantalpalaru/generic_celery_task" rel="nofollow">https://github.com/stefantalpalaru/generic_celery_task</a>
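The gist of the single-generic-task workaround looks something like this (a sketch of the idea, not the linked project's actual code):<p><pre><code>import importlib

@app.task
def run_function(dotted_path, *args, **kwargs):
    # Resolve the callable at execution time, so no per-function task has to
    # be registered with the worker up front. New code still has to reach the
    # worker's machine, but adding callables doesn't require new task names.
    module_path, func_name = dotted_path.rsplit(".", 1)
    func = getattr(importlib.import_module(module_path), func_name)
    return func(*args, **kwargs)

# Usage: run_function.delay("myapp.emails.send_welcome", user_id=42)
</code></pre>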