Worth mentioning that the cluster tools in IPython implement basically the same system as he ends up with, plus lots of other functionality, and they use ZMQ as the backend as well. If you've got a small-to-medium-sized task (I'd say fewer than 500 nodes is a good number) to run on your cluster, ipcluster is pretty great. Even when I have a proper queuing system like SLURM or PBS in place, I still often use ipcluster on top.<p><a href="http://ipython.org/ipython-doc/dev/parallel/" rel="nofollow">http://ipython.org/ipython-doc/dev/parallel/</a>
ZeroMQ is amazing. A few years ago I built a prototype for a client that was basically fail2ban, but scalable. It monitored nginx logs and broadcast some information, and workers did the rest. Most of the data was kept in memory and passed around, and all communication went over ZeroMQ. It was done this way so that we could split the heavy-load components off the server and into workers, letting the server simply do its job: tail nginx logs and act on ban requests from the workers. It worked amazingly well; sadly I never finished it or deployed it to a production environment, but in initial tests it outperformed fail2ban by a lot.
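For anyone curious, the shape of it was roughly this. A toy sketch with pyzmq over inproc, not the real code: the "403 means ban" rule, the socket names, and the STOP/DONE signalling are all made up for illustration.

```python
import threading
import time
import zmq

ctx = zmq.Context.instance()

def worker():
    # Workers PULL raw log lines, apply the ban rule, and PUSH
    # ban requests back to the server.
    lines = ctx.socket(zmq.PULL)
    lines.connect("inproc://loglines")
    bans = ctx.socket(zmq.PUSH)
    bans.connect("inproc://bans")
    while True:
        line = lines.recv_string()
        if line == "STOP":
            break
        ip, status = line.split()
        if status == "403":          # toy ban rule, not the real heuristics
            bans.send_string(ip)
    bans.send_string("DONE")
    lines.close()
    bans.close()

# The "server" side: fan log lines out to workers, collect ban requests.
fanout = ctx.socket(zmq.PUSH)
fanout.bind("inproc://loglines")
collector = ctx.socket(zmq.PULL)
collector.bind("inproc://bans")

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
time.sleep(0.5)  # toy synchronization: let both workers connect first

for line in ["1.2.3.4 403", "5.6.7.8 200", "1.2.3.4 403"]:
    fanout.send_string(line)         # in the real system: tailed nginx lines
for _ in threads:
    fanout.send_string("STOP")

banned, done = [], 0
while done < len(threads):
    msg = collector.recv_string()
    if msg == "DONE":
        done += 1
    else:
        banned.append(msg)

for t in threads:
    t.join()
fanout.close()
collector.close()
ctx.term()
print(sorted(banned))                # only the 403 offender gets reported
```

The nice part is that adding capacity is just starting more worker processes and pointing them at the same endpoints; the server never changes.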
I had fun reading - thx. I found the solution of having workers request work interesting. My first reflex to the round-robin argument was "use a better load balancer", but the requesting worker makes that completely unnecessary! I wonder what downsides such architectures might have?
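For what it's worth, the pattern is easy to play with even without ZMQ. A sketch with stdlib queues (names and timings made up): each worker asks for the next task only when it is free, so a slow worker naturally takes a smaller share of the load, whereas round robin would blindly give each worker half.

```python
import queue
import threading
import time

tasks = queue.Queue()
results = queue.Queue()

def worker(name, delay):
    # Pull the next task only when free: this IS the load balancing.
    done = 0
    while True:
        task = tasks.get()
        if task is None:          # poison pill: no more work
            break
        time.sleep(delay)         # simulate a fast vs. a slow worker
        done += 1
    results.put((name, done))

for i in range(20):
    tasks.put(i)
tasks.put(None)                   # one pill per worker
tasks.put(None)

fast = threading.Thread(target=worker, args=("fast", 0.001))
slow = threading.Thread(target=worker, args=("slow", 0.02))
fast.start()
slow.start()
fast.join()
slow.join()

counts = dict(results.get() for _ in range(2))
print(counts)                     # the fast worker grabs most of the tasks
```

One downside this hints at: every task now costs an extra request round-trip, which matters once tasks are tiny compared to the network latency.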
Nice article. I hope to see a benchmark of it on a less embarrassingly parallel problem over low-latency network hardware, maybe compared against vanilla MPI or a similar parallel framework. Then we'd get to see what its real overhead is.
Great writeup. Perhaps slightly less clean, but you could cut the number of messages in half by having the "available" message report the previous result as well (empty in the initial case).<p>Is the worker idle between sending its reply and receiving new work? In other words, does ZMQ enable overlapping, say always keeping two outstanding requests per worker?
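From what I understand the answer is yes: a DEALER socket drops REQ's strict send/recv lockstep, so a worker can keep two requests in flight. A toy pyzmq sketch of the idea (inproc, names made up; the empty frame mimics the envelope REQ would add, which the REP peer expects):

```python
import threading
import zmq

ctx = zmq.Context.instance()

# Master hands out work over REP; bind before connect for inproc.
rep = ctx.socket(zmq.REP)
rep.bind("inproc://jobs")
dealer = ctx.socket(zmq.DEALER)
dealer.connect("inproc://jobs")

def master():
    # Answer each "ready" with the next job, six jobs in total.
    for _ in range(6):
        rep.recv()
        rep.send(b"job")

t = threading.Thread(target=master)
t.start()

# Worker side: keep TWO requests outstanding, where a REQ socket
# would force send, recv, send, recv...
for _ in range(2):
    dealer.send_multipart([b"", b"ready"])

replies = 0
while replies < 6:
    _empty, job = dealer.recv_multipart()
    replies += 1
    if replies <= 4:              # top the pipeline back up to two in flight
        dealer.send_multipart([b"", b"ready"])

t.join()
dealer.close()
rep.close()
ctx.term()
print(replies)                    # prints 6
```

So while one request is being serviced, the reply to the other can already be sitting in the worker's queue, hiding the round-trip latency.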