Uh, you can't just drop in 'random_r' in place of 'random'. If you do, you'll get <i>the same sequence of random numbers in every thread</i>, making the whole multithreading exercise useless!<p>You have to seed the generator differently in each thread before calling 'random_r'.
It should be kept in mind that standard memory (e.g. DDR3) doesn't actually support concurrent accesses, and on x86 one or two cores can easily saturate the full bandwidth. So any shared data accessed by multiple threads is effectively serialised, and that part performs no better than the single-threaded case; the biggest speedup comes when the threads spend most of their time running out of their own caches and not sharing any data.
I have some experience in this domain and I agree with the author: threads are not a performance cure-all. Many naïve implementations will actually hurt performance[1]. It takes careful thought, experimentation, measurement, and quite a bit of elbow grease to get the expected speedup.<p>The more general lesson is this: Just because you have very good reasons to believe a change will improve performance, that doesn't mean it will actually improve performance! Benchmark. Profile. Fix the bottleneck. Repeat. That's the only reliable way to make something faster.<p>1. <a href="http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-pthreads/" rel="nofollow">http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-...</a>
<i>"It's little more than a tight loop around a pseudo-random number generator."</i> One locked against concurrent access.<p>Yes, put a lock and context switch in a tight inner loop, and your performance will suck.
The author writes that this is a contrived case but I was doing Monte Carlo simulations 10 years ago, i.e. I wanted <i>exactly</i> this case.<p>In the end, I just let them run on one thread and could never work out how they could be 10x slower with threading - until today. Thanks.
This exact problem had me worried when I heard about OpenBSD's arc4random, and how they've been promoting a pervasive use of it. I haven't taken the time to look at how they solve the contention problem while still keeping the generator's state unpredictable under heavy use.
The author's multithreaded random_r version has the benefit of performance, but has the problem of, well, brokenness. Without locking, the internal state passed to random_r() will be corrupted by concurrent access, and the results won't be meaningful.
What about using OpenMP?<p>My recollection is that I wrote a similar program (statistical bootstrapping, essentially a loop around random()) and using OpenMP on a 4-core machine definitely produced a speedup (not 4x, but close).<p>Does OpenMP somehow sidestep this issue of shared access? (Or is my memory wrong?)
I don't really see the point of using multithreading for speedup if you have fewer than 10 cores. That example program might run faster with OpenCL.