I think there are two ways forward: larger L1 caches or splitting memory between cores. Both have their problems.<p>To solve the problem this article describes, though, you can use my internet platform: <a href="https://github.com/tinspin/rupy" rel="nofollow">https://github.com/tinspin/rupy</a><p>Java has excellent concurrency and non-blocking implementations.<p>Since it also has the only class-loader that is worth using, it's the only programming language you can use for server-side, joint parallel, future-proof systems today.