Ask HN: Which message passing system for distributed computing?

5 points by h0h0 over 13 years ago
We have been assigned the task of designing (and implementing) a framework for distributed optimization in the context of inverse modelling.

The target computing environments are both GPU clusters (InfiniBand) and other distributed systems (high latency & prone to error).

As for the optimization algorithms, it will support both derivative-free methods (e.g. VXQR1 or GA) and some variation of gradient descent.

As we would like the framework to be fault-tolerant, MPI is not an option (?).

What message system would be appropriate? I was thinking of 0mq, but I am getting mixed reactions from the experts.


asharp over 13 years ago
Distributed optimisation is interesting.

Remember that you can tolerate node failure, as long as you have a copy of your state stored somewhere that you can restart from when required. The overall result isn't particularly affected by that.

0mq is probably what you want if you're implementing this the way I'm expecting you to, i.e. some variation on: take the current state, multicast it to all of the machines doing the optimisation, take the results and unicast them back to one machine that determines the best state to seed the next round, then repeat.

0mq over InfiniBand will, IIRC, use the native InfiniBand multicast groups, which is very efficient.

Keep snapshots of each iteration, and all should be good.

But like all things, it depends entirely on the specifics of what you're trying to do. Send me an email; you seem to have an interesting problem.
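The round-based scheme described above (broadcast the current state, gather results at one coordinator, keep the best, snapshot, repeat) can be sketched in Python. This is a minimal single-process sketch, not 0mq code: threads and in-process `queue.Queue` objects stand in for the multicast and unicast sockets, and the names `objective`, `worker`, `save_snapshot`, `load_snapshot` and `optimise`, the quadratic test objective and the Gaussian-perturbation search step are all hypothetical placeholders (not VXQR1, a GA or gradient descent). Each round's state is written atomically to a JSON file, so calling `optimise` again with the same path resumes after the last completed round; a worker that stays silent past `timeout` is simply skipped for that round.

```python
import json
import os
import queue
import random
import tempfile
import threading

# Sketch only: threads + queue.Queue stand in for the 0mq multicast/unicast sockets.


def objective(x):
    """Hypothetical misfit to minimise; a stand-in for the real inverse-modelling cost."""
    return sum((xi - 3.0) ** 2 for xi in x)


def worker(worker_id, inbox, outbox, seed):
    """Receive the broadcast state, try one perturbed candidate, send the result back."""
    rng = random.Random(seed)
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown signal from the coordinator
            break
        round_no, state = msg
        candidate = [xi + rng.gauss(0.0, 0.5) for xi in state]
        outbox.put((round_no, worker_id, candidate, objective(candidate)))


def save_snapshot(path, round_no, state):
    """Write to a temp file and rename, so a crash never leaves a half-written snapshot."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"round": round_no, "state": state}, f)
    os.replace(tmp, path)


def load_snapshot(path, default_state):
    """Return (next_round, state), or (0, default_state) if no snapshot exists yet."""
    if not os.path.exists(path):
        return 0, default_state
    with open(path) as f:
        snap = json.load(f)
    return snap["round"], snap["state"]


def optimise(snapshot_path, n_workers=4, rounds=50, dim=2, timeout=5.0):
    start_round, state = load_snapshot(snapshot_path, [0.0] * dim)
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], results, 1000 * start_round + i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    try:
        for r in range(start_round, rounds):
            # "Multicast": every worker receives the same current state.
            for box in inboxes:
                box.put((r, state))
            # "Unicast back": the coordinator gathers the replies for this round.
            replies = []
            for _ in range(n_workers):
                try:
                    rep = results.get(timeout=timeout)
                except queue.Empty:
                    break  # treat silent workers as failed for this round
                if rep[0] == r:  # drop late replies from earlier rounds
                    replies.append(rep)
            # Seed the next round with the best state seen so far.
            if replies:
                best = min(replies, key=lambda reply: reply[3])
                if best[3] < objective(state):
                    state = best[2]
            # Snapshot every iteration so a restart resumes from here.
            save_snapshot(snapshot_path, r + 1, state)
    finally:
        for box in inboxes:
            box.put(None)
        for t in threads:
            t.join()
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
    print(optimise(path, rounds=30))
```

In a real deployment the queues would be replaced by network sockets (for example a publish/subscribe channel for the broadcast and a point-to-point channel for the replies), and the snapshot would live on storage that survives the coordinator's own failure.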