To easily distribute jobs over a cluster.

Running right now: I have a database of 100,000,000 molecules, and I need to compute the force field for all of them. So I put all of the IDs in the Celery queue and start workers across 6 nodes. Each job pulls its row from the database, computes the force field, and stores the result.

Lots of jobs like this:
- I used to use it to tune hyperparameters of shallow models
- Used it to batch-convert a bunch of compressed files from one format to another
- Use it with Celery beat to schedule scraping of URLs
We have written a distributed crawler using Celery/RabbitMQ.

We analyze social signals in real time, and the system processes more than 10 million requests every day; we're working on scaling it to 50-100 million requests per day.

Beyond that, we use it for NLP tasks combined with machine learning: topic modeling, NER, etc.