Looks interesting. But shouldn't a library like celery work outside the context of a web framework? I don't see a reason to call this a distributed task queue 'for Django' specifically, except for the dependencies on Django's ORM and settings definitions. Swapping out Django's ORM for SQLAlchemy (or the DB-API) would make this project much more useful.

See pp (http://www.parallelpython.com/) for something similar without the Django dependency. More parallel-processing goodies at http://wiki.python.org/moin/ParallelProcessing.
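For a sense of what pp looks like in practice, here's a rough sketch of fanning work out across CPUs. crawl_page is just a made-up placeholder for whatever you'd actually parallelize:

    # Rough sketch of running a function across CPUs with Parallel Python (pp).
    # crawl_page is a hypothetical stand-in for the real work you'd queue up.
    import pp

    def crawl_page(url):
        # placeholder work; a real version would fetch and parse the page
        return len(url)

    job_server = pp.Server()  # autodetects the number of CPUs by default
    urls = ["http://example.com/%d" % i for i in range(10)]

    # submit() returns a callable; calling it blocks until the result is ready
    jobs = [job_server.submit(crawl_page, (url,)) for url in urls]
    results = [job() for job in jobs]
    print(results)

It's single-machine parallelism out of the box (pp can also talk to remote ppserver processes), so it solves a slightly different problem than a persistent task queue, but it has no web-framework baggage at all.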
Having just hacked together an ugly threaded task queue for scraping and multi-stage data processing in Django, I find this a breath of fresh air. I need to work my way out of the self-inflicted mess I've created.

Does anyone have experience with this library or anything similar?
beanstalkd (http://xph.us/software/beanstalkd/) also has similarities to this, and for non-Django / simpler needs it may be better. It's basically memcached repurposed into a queue server.

A "task" would be equivalent to a script which only looks for jobs in a certain bucket (or "tube" as they're called). You can run as many clients on as many machines as you like. Obviously, since it is memory-based, you'll lose the queue in the event of a system crash.

That being said, as a rabid Django user, this is definitely going into my bookmarks!
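To make that concrete, here's a rough sketch of a worker loop using the beanstalkc client library. It assumes beanstalkd is running locally on its default port (11300); the "crawl" tube name and the handle_job function are made up for illustration:

    # Sketch of a beanstalkd worker using the beanstalkc client.
    # Assumes a local beanstalkd on the default port; tube name is made up.
    import beanstalkc

    def handle_job(body):
        print("processing: %s" % body)

    conn = beanstalkc.Connection(host='localhost', port=11300)
    conn.watch('crawl')      # only reserve jobs from this tube
    conn.ignore('default')   # stop watching the default tube

    while True:
        job = conn.reserve()  # blocks until a job is available
        try:
            handle_job(job.body)
            job.delete()      # remove the job once it's done
        except Exception:
            job.bury()        # set it aside for later inspection

Producers just connect, conn.use('crawl'), and conn.put() whatever job body they like; run as many copies of the worker script as you have machines.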
I've been reading through the documentation on the celery GitHub page, but I haven't been able to figure out the appropriate task breakdown. That is, I'm trying to do some crawling and ingestion, and I'm wondering whether I should be pushing a dozen small tasks onto the queue every second, or pushing larger tasks (possibly with subtasks broken out, like the docs suggest) every minute or hour.

This sounds like a dumb question to my own ears, but I just don't have the familiarity to know the proper use case. I essentially want continuous crawling and ingestion, with the potential to spread the load across multiple servers one day.

(Presumably the ingestors would be populating local databases, with a query getting farmed out to each server+database, but I haven't figured that part out either... umm, sounds like a task I could put into the queue as well. Are these things really nails?)

I'd be grateful if anyone could point me to some examples or provide a bit of context.
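One common way to split the difference is many small per-URL tasks fanned out by one coarse dispatcher task, so each unit of work stays short and retryable while the enqueueing itself stays cheap. A rough sketch below uses the decorator API from later Celery releases (the version discussed in this thread used class-based Task subclasses, but the granularity trade-off is the same); the broker URL and function names are assumptions:

    # Sketch only: later-Celery decorator API, made-up broker URL and names.
    from celery import Celery

    app = Celery('crawler', broker='amqp://')  # assumed local AMQP broker

    @app.task
    def crawl_page(url):
        # fetch and ingest a single page; one URL per task keeps each
        # task short, easy to retry, and easy to spread across workers
        pass

    @app.task
    def dispatch(urls):
        # a coarser "dispatcher" task that fans out many small per-URL tasks
        for url in urls:
            crawl_page.delay(url)

Whether dispatch() runs every second or every hour then becomes a scheduling detail rather than a redesign.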
I have often wondered: why not use a MySQL table as a "queue" (or more tables if needed)? Basically, you get great performance (MySQL is really fast), you get broad language support (a LOT of languages can add tasks via simple SQL), and you get things like easy backups and replication.
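The usual pattern is to claim a row inside a transaction so two workers can't grab the same task. A rough sketch with MySQLdb and an InnoDB table (table and column names made up):

    # Sketch of a MySQL (InnoDB) table used as a queue. Names are made up;
    # the key idea is SELECT ... FOR UPDATE inside a transaction so that
    # concurrent workers never claim the same row.
    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='queue', passwd='secret', db='queue')
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            id INT AUTO_INCREMENT PRIMARY KEY,
            payload TEXT NOT NULL,
            status ENUM('pending', 'running', 'done') DEFAULT 'pending'
        ) ENGINE=InnoDB
    """)

    def claim_task():
        cur.execute("START TRANSACTION")
        cur.execute("SELECT id, payload FROM tasks "
                    "WHERE status = 'pending' ORDER BY id LIMIT 1 FOR UPDATE")
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return None
        task_id, payload = row
        cur.execute("UPDATE tasks SET status = 'running' WHERE id = %s", (task_id,))
        conn.commit()
        return task_id, payload

Workers then just poll claim_task() in a loop and mark rows 'done' when finished.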