Uh, you can't just drop in 'random_r' in place of 'random'. If you do, you'll get <i>the same sequence of random numbers in every thread</i>, making the whole multithreading exercise useless!<p>You have to seed the generator differently in each thread before calling 'random_r'.
It should be kept in mind that standard memory (e.g. DDR3) doesn't actually support concurrent accesses, and on x86 one or two cores can easily saturate the full bandwidth. So any shared data accessed by multiple threads is effectively serialised, and that part performs no better than the single-threaded case; the biggest speedup comes when the threads spend most of their time running out of their own caches and not sharing any data.
I have some experience in this domain and I agree with the author: threads are not a performance cure-all. Many naïve implementations will actually hurt performance[1]. It takes careful thought, experimentation, measurement, and quite a bit of elbow grease to get the expected speedup.<p>The more general lesson is this: Just because you have very good reasons to believe a change will improve performance, that doesn't mean it will actually improve performance! Benchmark. Profile. Fix the bottleneck. Repeat. That's the only reliable way to make something faster.<p>1. <a href="http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-pthreads/" rel="nofollow">http://geoff.greer.fm/2012/09/07/the-silver-searcher-adding-...</a>
<i>"It's little more than a tight loop around a pseudo-random number generator."</i> One locked against concurrent access.<p>Yes, put a lock and context switch in a tight inner loop, and your performance will suck.
The author writes that this is a contrived case but I was doing Monte Carlo simulations 10 years ago, i.e. I wanted <i>exactly</i> this case.<p>In the end, I just let them run on one thread and could never work out how they could be 10x slower with threading - until today. Thanks.
This exact problem had me worried when I heard about OpenBSD's arc4random, and how they've been promoting a pervasive use of it. I haven't taken the time to look at how they solve the contention problem while still keeping the generator's state unpredictable under heavy use.
The author's multithreaded random_r version has the benefit of performance, but has the problem of, well, brokenness. Without locking, the internal state passed to random_r() will be corrupted by concurrent access, and the results won't be meaningful.
What about using OpenMP?<p>My recollection is that I wrote a similar program (statistical bootstrapping, essentially a loop around random()) and using OpenMP on a 4-core machine definitely produced a speedup (not 4x, but close).<p>Does OpenMP somehow sidestep this issue of shared access? (Or is my memory wrong?)
I don't really see the point of using multithreading for speedup if you have fewer than 10 cores. That example program might run faster with OpenCL.