TechEcho

replicant | about 11 years ago

The first two examples are not dividing the work between the threads. Instead, each thread repeats the full work, which is not just poor OpenMP use but wrong use. I would also have used the collapse clause and experimented a bit with the schedule. Finally, iterating over the first index in the inner loop is a bad idea, and not only when working with OpenMP.
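The three points above can be sketched in C with OpenMP. This is a hypothetical illustration, not the code from the post: the function names and the row-major flat-array layout of the n x n matrix are assumptions. It contrasts a `parallel` region that repeats the work in every thread, a `parallel for` with `collapse(2)`, `schedule(static)` and a reduction that divides it, and a loop order whose inner loop walks the first index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: sum an n x n matrix stored row-major in a flat array.
   Build with -fopenmp; without it the pragmas are ignored and the code runs serially. */

/* Wrong: "parallel" without "for" makes every thread execute the whole loop nest,
   so the work is repeated rather than divided. */
double sum_repeated(const double *a, size_t n) {
    double total = 0.0;
#pragma omp parallel
    {
        double local = 0.0;
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                local += a[i * n + j];
#pragma omp atomic
        total += local;
    }
    return total; /* equals (number of threads) * (true sum) */
}

/* Better: "for" divides the iterations, collapse(2) merges both loops into one
   n*n iteration space, and reduction gives each thread a private partial sum. */
double sum_shared(const double *a, size_t n) {
    assert(a != NULL || n == 0);
    double total = 0.0;
#pragma omp parallel for collapse(2) schedule(static) reduction(+:total)
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) /* inner loop walks the last index: contiguous reads */
            total += a[i * n + j];
    return total;
}

/* Poor locality: the inner loop walks the first index, so consecutive reads are
   n doubles apart and, for large n, each one can land on a different cache line. */
double sum_strided(const double *a, size_t n) {
    double total = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += a[i * n + j];
    return total;
}
```

`sum_shared` and `sum_strided` compute the same sum (up to floating-point rounding order); the difference only shows up in run time as n grows. Swapping `schedule(static)` for `schedule(dynamic)` or `schedule(guided)` is an easy experiment, though for a uniform loop like this one static scheduling is usually a reasonable default.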

deletes | about 11 years ago

Comment on the blog that got deleted:

I did a similar test in C and got very similar results. When N is around 4000, the thrashing version starts to differ substantially, and a 3x difference can already be seen when N is 1000.

Quoting the post: "This means if your program is running on two threads over different parts of the matrix, every single iteration requires a request to RAM."

I'm skeptical of this part; I tried to replicate this behavior but was unsuccessful. Even though the cores share the L3 cache, I doubt that a thread will overwrite the entire cache on every iteration.
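Threads that only read different parts of a matrix do not invalidate each other's cache lines; they mostly compete for shared cache capacity. A related effect that does force lines to bounce between cores is false sharing, where threads write to different variables that happen to sit in the same cache line. This is a swapped-in illustration, not the scenario from the post: a minimal sketch assuming 64-byte cache lines, at most 64 threads, and hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still compiles without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

#define MAX_THREADS 64 /* assumption: upper bound on threads used */
#define CACHE_LINE 64  /* assumption: 64-byte cache lines, common on x86-64 */

typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)]; /* keeps the next slot on another line */
} padded_double;

static int thread_count(void) {
    int t = omp_get_max_threads();
    return t < MAX_THREADS ? t : MAX_THREADS;
}

/* Per-thread partial sums packed next to each other: eight doubles fit in one
   64-byte line, so threads writing only their own slot still contend for the
   same line (false sharing). */
double sum_packed(const double *a, size_t n) {
    double partial[MAX_THREADS] = {0};
    int nt = thread_count();
    assert(nt >= 1);
#pragma omp parallel num_threads(nt)
    {
        int t = omp_get_thread_num();
#pragma omp for schedule(static)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                partial[t] += a[i * n + j];
    }
    double total = 0.0;
    for (int t = 0; t < nt; t++)
        total += partial[t];
    return total;
}

/* Same computation, but each thread's slot occupies its own cache line. */
double sum_padded(const double *a, size_t n) {
    padded_double partial[MAX_THREADS] = {{0}};
    int nt = thread_count();
    assert(nt >= 1);
#pragma omp parallel num_threads(nt)
    {
        int t = omp_get_thread_num();
#pragma omp for schedule(static)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                partial[t].value += a[i * n + j];
    }
    double total = 0.0;
    for (int t = 0; t < nt; t++)
        total += partial[t].value;
    return total;
}
```

Both functions return the same result. Built with `-O2 -fopenmp` and timed on a large matrix, the padded version would be expected to run faster on a multi-core machine, though the size of the gap depends on the hardware and on whether the compiler keeps the accumulator in a register. In practice, `reduction(+:total)` avoids the problem by giving each thread a private accumulator.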

pron | about 11 years ago

For all those interested in this subject, I'd like to recommend Nitsan Wakart's blog, http://psy-lob-saw.blogspot.com/, which is dedicated to mechanical sympathy as it relates to concurrency and the memory system.

The Cache and Multithreading

3 comments