1. Try process-level I/O, such as pipes, sockets, and the like. Have Linux deal with the concurrency problem, not you. (Note: the Bash & background job works in so many cases it ain't funny.) Also try fork/join parallelism models like OpenMP. These are all far easier than dipping down to a lower level.<p>2. Try a mutex.<p>3. If that doesn't work, try adding a condition variable.<p>4. If that still doesn't work, try an atomic in default sequentially consistent mode or equivalent (e.g. Java volatile, InterlockedAdd, and the like). Warning: atomics are very subtle. Definitely have a review with an expert if you are here. (Steps 2-4 are sketched in code below.)<p>5. If that still doesn't work, consider lock-free paradigms. That is, combinations of atomics and memory barriers.<p>6. If that still doesn't work, publish a paper on your problem lol.<p>---------<p>#1 is my most important piece of advice. A few years ago I was doing a render in some old version of Blender, 2.6 or so. Blender's parallelism wasn't very good and only utilized 25% of my computer.<p>So I ran 4 instances of headless Blender. Bam, 100% utilization. Done.<p>Don't overthink parallelism. It's stupid easy sometimes, as easy as an & on the end of your shell command.
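To make steps 2-4 concrete, here is a minimal Java sketch (the class and method names are mine, purely illustrative) of one shared counter guarded three ways: a mutex, a mutex plus condition variable, and a sequentially consistent atomic:

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // One shared counter, guarded three ways (escalation steps 2-4).
    class Counters {
        // Step 2: a mutex. Every access goes through the lock.
        private final ReentrantLock lock = new ReentrantLock();
        private long count = 0;

        void incrementLocked() {
            lock.lock();
            try {
                count++;
            } finally {
                lock.unlock();
            }
        }

        // Step 3: a condition variable, for when a thread must wait
        // until some predicate holds (here: count > 0).
        private final Condition nonZero = lock.newCondition();

        void awaitNonZero() throws InterruptedException {
            lock.lock();
            try {
                while (count == 0) {   // re-check in a loop: spurious wakeups happen
                    nonZero.await();
                }
            } finally {
                lock.unlock();
            }
        }

        void incrementAndSignal() {
            lock.lock();
            try {
                count++;
                nonZero.signal();
            } finally {
                lock.unlock();
            }
        }

        // Step 4: a sequentially consistent atomic -- no lock at all,
        // but only safe because increment is a single, simple operation.
        private final AtomicLong atomicCount = new AtomicLong();

        void incrementAtomic() {
            atomicCount.incrementAndGet();
        }
    }

The point of the escalation: use the simplest tool that works, and only move down a level when profiling says you must.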
Related:<p><i>“Is Parallel Programming Hard, and, If So, What Can You Do About It?” v2 Is Out</i> - <a href="https://news.ycombinator.com/item?id=26537298" rel="nofollow">https://news.ycombinator.com/item?id=26537298</a> - March 2021 (75 comments)<p><i>Is parallel programming hard, and, if so, what can you do about it?</i> - <a href="https://news.ycombinator.com/item?id=22030928" rel="nofollow">https://news.ycombinator.com/item?id=22030928</a> - Jan 2020 (85 comments)<p><i>Is Parallel Programming Hard, and, If So, What Can You Do About It? [pdf]</i> - <a href="https://news.ycombinator.com/item?id=9315152" rel="nofollow">https://news.ycombinator.com/item?id=9315152</a> - April 2015 (31 comments)<p><i>Is Parallel Programming Hard, And, If So, What Can You Do About It?</i> - <a href="https://news.ycombinator.com/item?id=7381877" rel="nofollow">https://news.ycombinator.com/item?id=7381877</a> - March 2014 (26 comments)<p><i>Is Parallel Programming Hard, And, If So, What Can You Do About It?</i> - <a href="https://news.ycombinator.com/item?id=2784515" rel="nofollow">https://news.ycombinator.com/item?id=2784515</a> - July 2011 (39 comments)
Multi-threaded programming has been of particular interest to me for decades (since my early years programming for OS/2). Whenever I write code, I look for ways to do things in parallel.<p>My new data management system is highly parallel. I am always finding tasks that take minutes to complete and getting them down to just seconds (when running on multi-core CPUs) by getting multiple threads working together on the same problem.<p>Just yesterday, I found a task that was taking over 12 minutes to finish (inserting 125 million key/value pairs into a data store) and was able to get it to do the same task in just 37 seconds (running on my 16-core/32-thread CPU) by spinning off multiple threads.
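I don't know the poster's actual code, but the general pattern looks something like this scaled-down Java sketch, where each worker thread inserts its own strided slice of the keys so the threads rarely contend:

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical, scaled-down sketch of the pattern: split the
    // insertions across all cores, each thread taking its own slice.
    public class ParallelInsert {
        public static void main(String[] args) throws InterruptedException {
            final int n = 1_000_000; // the poster's real job was 125 million
            final int threads = Runtime.getRuntime().availableProcessors();
            ConcurrentHashMap<Integer, Integer> store = new ConcurrentHashMap<>(n);

            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                final int id = t;
                workers[t] = new Thread(() -> {
                    // Strided slices: worker id takes keys id, id+threads, ...
                    for (int k = id; k < n; k += threads) {
                        store.put(k, k * 31);
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();
            System.out.println("inserted: " + store.size());
        }
    }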
Use an actor-model language -- by far the sanest way. Message passing is intuitive to human experience.<p>1. Elixir (Erlang)<p>2. Scala/Akka<p>3. Pony
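As a rough illustration of the idea (in plain Java rather than Elixir, Akka, or Pony, with hypothetical names), an actor is essentially one thread, one mailbox, and private state that only that thread ever touches:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // A minimal actor: one thread, one mailbox, private state that
    // only the actor's own thread ever touches -- so no locks.
    class CounterActor {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        private long count = 0;

        CounterActor() {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String msg = mailbox.take(); // block until a message arrives
                        switch (msg) {
                            case "inc" -> count++;
                            case "print" -> System.out.println(count);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        void send(String msg) { mailbox.add(msg); }

        public static void main(String[] args) throws InterruptedException {
            CounterActor a = new CounterActor();
            for (int i = 0; i < 5; i++) a.send("inc");
            a.send("print"); // prints 5
            Thread.sleep(100); // let the daemon thread drain the mailbox
        }
    }

Real actor runtimes add supervision, location transparency, and cheap fibers on top, but the mental model is that simple.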
Section 2.3.3 is definitely worth reading carefully.<p>I've both increased efficiency and removed bugs by rewriting a system whose authors thought the only way to make it faster was to add more threads; optimising the algorithms and data layout yielded much bigger gains.
It's interesting that while Moore's law saturated many years ago, there is still no parallel programming style that hits a sweet spot between productivity and performance for multicore CPUs (and thus gets adopted for mainstream development).<p>It's not clear whether this means no such "optimum" exists or simply that it is not something anybody cares about.<p>People have focused a lot on GPUs, but that's not easy either.
I use OS threads + non-blocking I/O, with the java.util.concurrent package for shared data, in Java. The performance is incredible.<p>If I wanted to get a little more performance per watt I would probably rewrite it in C with arrays of atomic variables.<p>But you need a VM with GC to be able to be productive during the day and sleep at night, so probably not...
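A minimal sketch of that general pattern (not the poster's code; the event names and counts are made up): a fixed pool of OS threads sharing data through java.util.concurrent types, with no manual locking anywhere:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.LongAdder;

    // A fixed pool of OS threads sharing data through
    // java.util.concurrent types -- no synchronized blocks needed.
    public class SharedCounts {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());

            String[] events = {"read", "write", "read", "read"};
            for (int i = 0; i < 1_000_000; i++) {
                final String event = events[i % events.length];
                pool.submit(() ->
                    // computeIfAbsent + LongAdder scales far better under
                    // contention than a synchronized HashMap would.
                    counts.computeIfAbsent(event, k -> new LongAdder()).increment());
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            counts.forEach((k, v) -> System.out.println(k + ": " + v.sum()));
        }
    }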
It isn't hard, but you need to change your programming standpoint. To write parallel code you need to think more about data alignment, dependencies, and flow. It's quite different from typical object/behaviour-oriented programming.
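One way to picture that data-first mindset, as a small Java sketch: instead of every thread updating one shared accumulator (a dependency between all of them), give each thread its own slice of the input and its own output slot, then combine at the end:

    import java.util.Arrays;

    // Data-first thinking: rather than every thread updating one shared
    // accumulator (a dependency), each thread gets its own slice of the
    // input and its own output slot; results are combined at the end.
    public class PartialSums {
        public static void main(String[] args) throws InterruptedException {
            double[] data = new double[10_000_000];
            Arrays.fill(data, 1.0);

            int threads = Runtime.getRuntime().availableProcessors();
            double[] partial = new double[threads];
            Thread[] workers = new Thread[threads];

            int chunk = data.length / threads;
            for (int t = 0; t < threads; t++) {
                final int id = t;
                final int from = id * chunk;
                final int to = (id == threads - 1) ? data.length : from + chunk;
                workers[t] = new Thread(() -> {
                    double sum = 0; // thread-local: no contention while summing
                    for (int i = from; i < to; i++) sum += data[i];
                    partial[id] = sum; // one write per thread, to a distinct index
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();

            System.out.println(Arrays.stream(partial).sum()); // 10000000.0
        }
    }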
I use ZIO (<a href="http://zio.dev" rel="nofollow">http://zio.dev</a>) for Scala, which makes parallel programming trivial.<p>It wraps different styles of asynchronicity (e.g. callbacks, futures, fibers) into one coherent model. And it has excellent resource management, so you can be sure that when you fork a task it will always clean up after itself.<p>I have yet to see anything that comes close whilst still being practical, i.e. you can still leverage the very large ecosystem of Java libraries.
My solution is to only solve problems that are embarrassingly parallel, like graphics (one pixel = one thread) or physics simulations (one object = one thread), and escape the pain of synchronisation.
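For example, a minimal Java sketch of "one pixel = one unit of work" (shade() is a hypothetical stand-in for a real per-pixel function): every index is written by exactly one thread, so there is nothing to synchronise:

    import java.util.stream.IntStream;

    // "One pixel = one unit of work": every pixel is computed
    // independently, so no locks or condition variables are needed.
    public class Render {
        public static void main(String[] args) {
            int w = 1920, h = 1080;
            int[] pixels = new int[w * h];

            IntStream.range(0, pixels.length).parallel().forEach(i -> {
                int x = i % w, y = i / w;
                pixels[i] = shade(x, y); // each index written by exactly one thread
            });
        }

        // Hypothetical stand-in for the real per-pixel function.
        static int shade(int x, int y) {
            return (x ^ y) & 0xFF;
        }
    }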