It seems to me that the article doesn't contain anything to support the claim in its title or its own conclusion. It just recycles some elementary topics from a freshman-year computer science course.
I use a large Platform compute cluster often. We use individual processes (not threads) to crunch big data, and in general that approach works well.

Each process has its own memory space and its own little bit of the workload to complete. One process can crash or throw an exception and the others keep on going. Having no shared streams or shared data containers to worry about (mutexes, locks, etc.) is just wonderful.

We call it poor man's parallelism, and some guys who have done a lot of threading make light of it. It's so simple (compared to threads) that it seems like a naive approach. But it performs so well that it's hard to argue with the results.
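For what it's worth, here is a minimal sketch of that style using Python's multiprocessing module (the module choice and the crunch function are my own illustration, not necessarily what the parent runs on their cluster): each worker is a separate OS process with its own memory, so there are no mutexes or shared containers to manage.

    # Sketch of process-per-chunk parallelism: no shared memory, no locks.
    from multiprocessing import Pool

    def crunch(chunk):
        # Stand-in work: each process handles only its own slice of the data.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]
        with Pool(processes=8) as pool:
            results = pool.map(crunch, chunks)  # one chunk per worker process
        print(sum(results))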
I think there's a big difference between designing an application around your own threads and using an existing thread-based API or system, as the article describes. If you write your own application-specific threading you can pick and choose from any concurrent design pattern you want. Using a pre-existing multithreaded system typically forces the programmer into specific policies for interacting with it.
Another useless paper. The deep call stack problem exists in any kind of parallelism. And using multiple processes can be safer, but the complexity is actually the same.
I dunno, I constrain my multithreaded processes to a single CPU sometimes and it does *seem* like multiple threads and multiple CPUs get me performance benefits. Maybe IBM knows something I don't.
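If it helps, here's one way to run that experiment on Linux, sketched in Python: os.sched_setaffinity pins the whole process (and all of its threads) to one CPU, and you compare timings against an unpinned run. The toy workload here is GIL-bound, so in CPython it won't show a multi-core speedup by itself; the point is just the pinning mechanism you'd wrap around a real multithreaded workload.

    # Pin this process (and all its threads) to CPU 0, then time a workload.
    # os.sched_setaffinity is Linux-specific.
    import os
    import time
    from concurrent.futures import ThreadPoolExecutor

    def workload(n):
        # Stand-in task; in practice this would be the real multithreaded work.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        os.sched_setaffinity(0, {0})              # restrict the process to CPU 0
        print("allowed CPUs:", os.sched_getaffinity(0))

        start = time.time()
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(workload, [2_000_000] * 4))
        print(f"elapsed when pinned to one CPU: {time.time() - start:.2f}s")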