Around 2006 I was doing GPGPU regex and string matching and decided to google around for parallel string matching algorithms. I spent a good chunk of time scrolling through an implementation of Vishkin's algorithm and wondering why it all felt so familiar; it was only when I scrolled back to the top of the file that I realized it was familiar because <i>I wrote it</i>.<p>A fine example of why you should comment your code: the person who reads it 10 years from now will be a stranger, even if that person is you.
There is also a free draft book from CMU on parallel algorithm design:
<a href="http://www.parallel-algorithms-book.com/" rel="nofollow">http://www.parallel-algorithms-book.com/</a>
The animations link appears to be broken. After some Wayback Machine snooping, it looks like it hasn't been up since about 2000[0]. You can access the legacy Java applet animations here[1].<p>[0] <a href="https://web.archive.org/web/20000611153404/http://web.scandal.cs.cmu.edu/cgi-bin/demo" rel="nofollow">https://web.archive.org/web/20000611153404/http://web.scanda...</a><p>[1] <a href="http://www.cs.cmu.edu/~scandal/applets/" rel="nofollow">http://www.cs.cmu.edu/~scandal/applets/</a>
These algorithms are written in NESL, which is perhaps the cleanest example of the oft-mentioned theory that parallel programming becomes simple in a functional setting. It really is remarkable how flexible it is. Unfortunately, I'm not sure NESL ever managed to run particularly fast in practice.
As someone who has done a lot of parallel programming, I really can't recommend Java 8 streams enough. They have a bit of overhead, but they are incredibly easy to use and extremely expressive.
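For anyone who hasn't tried them, here is a minimal sketch of what that looks like. The class and method names (`ParallelStreamsSketch`, `sumOfSquares`, `upperCaseAll`) are made up for illustration; only the `java.util.stream` calls are real API. Whether the parallel version actually beats a plain loop depends on the workload, so treat this as a shape to start from rather than a benchmark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class, not taken from any library.
class ParallelStreamsSketch {

    // Parallel map + reduce: square each integer in [1, n] and sum the results.
    // .parallel() lets the runtime split the range across the common fork/join pool.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    // Parallel map over a list. The source list is ordered, so
    // collect(toList()) keeps the elements in their original order.
    static List<String> upperCaseAll(List<String> words) {
        return words.parallelStream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
        System.out.println(upperCaseAll(Arrays.asList("scan", "reduce", "map")));
    }
}
```

Dropping the `.parallel()` call (or using `stream()` instead of `parallelStream()`) gives the sequential version of the same pipeline, which makes it easy to compare the two and see whether the overhead is worth it for a given input size.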