A useful read for the aspiring parallel programmer :)

I got interested in the topic at university and spent a lot of time during my final-year thesis un-learning all my sequential logic.

One of the biggest traps I fell into was the "setup, do work, tear down" philosophy. That's the usual carry-over of sequential logic into parallelism: set up your parallel threads with data, fire them off, then hang about waiting to combine the results and tear down any constructs. In my tests I found it was *better* (both in terms of speed and sanity) to give each parallel segment of a program (be it a thread, a process, or a computing node) as much autonomy as you can. The wrapper around all of that, the stuff that splits your logic up, has to be damn clever, because portioning work on the fly is complex, but what it actually does should be simple: receive some logic, chunk it, farm it out. Done.
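To make that concrete, here is a minimal sketch in Python of the "chunk it, farm it out" shape I mean. The partitioner and the worker body are just illustrative stand-ins, not anything from the article: the point is that each worker is fully autonomous and the wrapper only portions and dispatches.

    # Hypothetical example: autonomous workers plus a thin dispatching wrapper.
    from multiprocessing import Pool

    def autonomous_worker(chunk):
        # Everything the worker needs happens inside this function:
        # no shared setup phase, no shared teardown to coordinate.
        return sum(x * x for x in chunk)

    def chunked(data, n_chunks):
        # Naive partitioner: the "wrapper" that portions the work.
        size = max(1, len(data) // n_chunks)
        return [data[i:i + size] for i in range(0, len(data), size)]

    if __name__ == "__main__":
        data = list(range(1_000_000))
        with Pool(processes=4) as pool:
            partial_results = pool.map(autonomous_worker, chunked(data, 4))
        print(sum(partial_results))

The wrapper here does nothing beyond "receive, chunk, farm out, collect"; all the real logic lives in the worker.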
A little anecdote: a friend of mine who's working on his Ph.D. at CMU sped up his physics group's simulations by an order of magnitude simply by running the same code multiple times on multiple machines. The problem was that the choice of initial random seeds could nontrivially affect the convergence time, so launching several seeds at once and taking the first run to converge was a big win. There are examples of this in general programming too: look at quicksort, where picking the right pivot can matter a great deal.
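A rough sketch of that seed-racing idea, under my own assumptions about what the simulations looked like (the `simulate` function below is purely a placeholder for a seed-sensitive iterative computation, not the actual physics code):

    # Hypothetical example: race several random seeds, take the first to converge.
    import random
    from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

    def simulate(seed):
        # Stand-in for an iterative computation whose convergence time
        # depends heavily on the initial seed.
        rng = random.Random(seed)
        value = rng.random()
        iterations = 0
        while value > 1e-6:
            value *= rng.random()
            iterations += 1
        return seed, iterations

    if __name__ == "__main__":
        seeds = [1, 7, 42, 1234]
        with ProcessPoolExecutor(max_workers=len(seeds)) as pool:
            futures = [pool.submit(simulate, s) for s in seeds]
            # Take whichever run finishes first; in practice you'd also
            # cancel or ignore the slower runs.
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            seed, iters = next(iter(done)).result()
            print(f"seed {seed} finished first after {iters} iterations")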