"The obstacle we shall have to overcome, if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers."<p>Most of the time, this is not a trade off worth making. I can't think of a researcher that would willingly trade replicability for speed. I can't think of a mathematician who would base a proof on the idea that a number is probably prime. I can't think of a bank customer who would be fine with the idea that the balance displayed on the ATM is pretty close to where it should be. I can't think of an airline passenger who would be totally fine with the flight computer usually being pretty good.<p>It would be a fine trade off for games, however. And I'm sure there is room for some fudging in complex simulations that have plenty of randomness already.<p>But given the choice between an answer that is correct, and an answer that is probably correct, I will take the correct answer. Even if I have to wait a little.
For those making off-the-cuff judgments of how crazy this idea is: In 1990 or so, Dave Ungar told me he was going to make his crazy Self language work at practical speed by using the crazy idea of running the compiler on every method call at runtime. Then he and his crazy students based the Hotspot Java compiler on that crazy idea, which is now the industry-standard way of implementing dynamic languages. So now I tend to pay close attention to Dave's crazy ideas...
<i>"Just as we learned to embrace languages without static type checking, and with the ability to shoot ourselves in the foot, we will need to embrace a style of programming without any synchronization whatsoever."</i><p>This is dangerous misinformation that is also being propagated by some managers of the "exascale" programs that seem to have lost sight of the underlying science. Some synchronization is algorithmically necessary for pretty much any useful application. The key is to find methods in which synchronization is distributed, with short critical paths (usually logarithmic in the problem size, with good constants).
"every (non-hand-held) computer’s CPU chip will contain 1,000 fairly homogeneous cores."<p>There are two problems with these visions one is memory and the other is the interconnect. 1000 cores, even at a modest clock rate, can easily demand 1 Terabyte of memory accesses per second. But memory has the same economies as 'cores' in that it's more cost effective when it is in fewer chips. But the chip is limited in how fast it can send signals over its pins to neighboring chips (see Intel's work on Light bridge).<p>So you end up with what are currently exotic chip on chip types of deals, or little Stonehenge like motherboards where this smoking hot chip is surrounded by a field of RAM shooting lasers at it.<p>The problem with <i>that</i> vision is that to date, the 'gains' we've been seeing have been when the chips got better but the assembly and manufacturing processes stayed more or less the same.<p>So when processors got better the existing manufacturing processes were just re-used.<p>That doesn't mean that at some point in the future we might have 1000 core machines, it just means that other stuff will change first (like packaging) before we get them. And if you are familiar with the previous 'everything will be VLIW (very large instruction world)' prediction you will recognize that a lack of those changes sometimes derail the progress. (in the VLIW case there have been huge compiler issues)<p>The interconnect issue is that 1000 cores can not only consume terabytes of memory bandwidth they can generate 10s of gigabytes of data to and from the compute center. That data, if it is being ferried to the network on non-volatile storage needs channels that run at those rates. Given that the number of 10GbE ports on 'common computers' is still quite small, another barrier to this vision coming to pass is that these machines will be starved for bandwidth to get to fresh data to work on, or to put out data they have digested or transformed.
<a href="http://en.wikipedia.org/wiki/Connection_Machine" rel="nofollow">http://en.wikipedia.org/wiki/Connection_Machine</a><p>Money quote: "The CM-1, depending on the configuration, had as many as 65,536 processors"<p>I would suggest that when someone wants to get excited about exascale computing, they review the Connection Machine literature. Manycore is <i>not</i> a radically new concept.
At our startup we are creating our own manycore processor, SiliconSqueak, and a VM along the lines of David Ungar's work. Writing non-deterministic software is fun; you just need to radically change your perspective on how to program. For Lisp and Smalltalk programmers this change in outlook is easy to make. We welcome coders who want to learn about it.
Haven't looked at the project yet, but some thoughts based on OP:<p>"Even lock-free algorithms will not be parallel enough. They rely on instructions that require communication and synchronization between cores’ caches."<p>Azul's Vega 3 with 864 cores/640GB mem (2008) with the Azul JVM apparently works fine using lock-free java.util.concurrent.* classes, and would appear to be a counterpoint to the very premise of the OP.<p>It is also probably more likely that we will see a drastic rethink of memory managers and cooperation between h/w and s/w designers (kernel/compiler level). Right now, everything is sitting on top of malloc() and fencing instructions. Which is more painful: writing non-deterministic algorithms, or biting the bullet and updating h/w, kernels, and compilers? See Doug Lea's talk at ScalaDays 2011 ([1] @66:30)<p>And this is not to mention anything about the FP and STM approaches to the same issue.<p>[1]: <a href="https://wiki.scala-lang.org/display/SW/ScalaDays+2011+Resources#ScalaDays2011Resources-KeynoteDougLea-SupportingtheManyFlavorsofParallelProgramming" rel="nofollow">https://wiki.scala-lang.org/display/SW/ScalaDays+2011+Resour...</a>
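For reference, the lock-free style in question is just ordinary java.util.concurrent code; a toy sketch (the thread and iteration counts are arbitrary), and note the final total is still exact even though there are no locks:

    import java.util.concurrent.atomic.LongAdder;

    public class LockFreeCount {
        public static void main(String[] args) throws InterruptedException {
            LongAdder hits = new LongAdder();          // lock-free, internally striped counter
            Thread[] workers = new Thread[64];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(() -> {
                    for (int j = 0; j < 1_000_000; j++) hits.increment();  // CAS underneath, no locks
                });
                workers[i].start();
            }
            for (Thread t : workers) t.join();
            System.out.println(hits.sum());            // deterministic total: 64,000,000
        }
    }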
A talk by David Ungar on this very subject is available at CMU-SV Talks on Computing Systems website: <a href="http://www.cmu.edu/silicon-valley/news-events/seminars/2011/ungar-talk.html" rel="nofollow">http://www.cmu.edu/silicon-valley/news-events/seminars/2011/...</a>
Something of a side issue, but when was there a trade-off between static checking and performance? Fortran and C have pretty much always been the fastest languages around, haven't they? Is he referring to assembler?
Well, no real progress has been made in parallel programming in the decades of research (apart from the embarrassingly parallelizable), so we're probably going to have to give up <i>something</i> in our concept of the problem.
But I really like determinism. If the proposal works out, future computer geeks will have a very different cognitive style.<p>Another approach might be to recast every problem as finding a solution in a search-space - and then have as many cores as you like trying out solutions. Ideally, a search-space enables some hill-climbing (i.e. if you hit on a good solution, there's a greater than average probability that other good solutions will be nearby), and for this, it is very helpful to know the result of previous searches and thus sequential computation is ideal. But, if the hills aren't that great as predictors, and if you do feed-in other results as they become available, the many-cores would easily overcome this inefficiency.<p>An appealing thing about a search-space approach is that it may lend itself to mathematically described problems, i.e. declare the qualities that a solution must have, rather than how to compute it.
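A minimal sketch of that search idea, assuming a made-up score function and fully independent random restarts (so without the feeding-in of intermediate results described above):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.stream.IntStream;

    public class ParallelHillClimb {
        // Toy objective with a peak at x = 42, standing in for "the qualities a solution must have"
        static double score(double x) { return -(x - 42) * (x - 42); }

        public static void main(String[] args) {
            double best = IntStream.range(0, 1000).parallel()       // one restart per "core"
                .mapToDouble(i -> {
                    double x = ThreadLocalRandom.current().nextDouble(-1000, 1000);
                    for (int step = 0; step < 10_000; step++) {      // greedy local moves (hill-climbing)
                        double candidate = x + ThreadLocalRandom.current().nextGaussian();
                        if (score(candidate) > score(x)) x = candidate;
                    }
                    return x;
                })
                .reduce(Double.NEGATIVE_INFINITY, (a, b) -> score(a) > score(b) ? a : b);
            System.out.println("best x found: " + best);
        }
    }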
Ok, let me understand how this is going to work. If I want to deposit 13 cents to my bank, and this transaction is mixed in with a blast of other parallelized transactions, <i>sometimes</i> the right answer gets there, and other times I get only 12?<p>Somehow, I don't think that is going to fly.<p>Additionally, the statement about type checking and program correctness is not really correct.<p>Let's try another thought experiment. Let's compile a Linux kernel with this beast. We should be happy with sometimes getting the right answer? I am not sure that they have thought this through.<p>Does anyone remember the early days of MySQL, when it was really really really really fast because it didn't have locks? Some wiser heads said "but it is often giving the wrong answer!" The reply was "well, it is really really really fast!" And we know how that came out.<p>Perhaps the expected output of this sea of devices is poetry, which, in the minds of those on the project, might require less precision. But even there, some poetry does require lots of precision.
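For what it's worth, the lost-cent scenario is the classic lost-update race; a toy sketch (thread and iteration counts are arbitrary) showing the racy version next to the boring atomic fix:

    import java.util.concurrent.atomic.AtomicLong;

    public class DepositRace {
        static long racyBalance = 0;                       // plain field, no synchronization
        static AtomicLong safeBalance = new AtomicLong();

        public static void main(String[] args) throws InterruptedException {
            Thread[] tellers = new Thread[8];
            for (int i = 0; i < tellers.length; i++) {
                tellers[i] = new Thread(() -> {
                    for (int j = 0; j < 100_000; j++) {
                        racyBalance += 13;                 // unsynchronized read-modify-write: deposits can vanish
                        safeBalance.addAndGet(13);         // atomic: every cent arrives
                    }
                });
                tellers[i].start();
            }
            for (Thread t : tellers) t.join();
            System.out.println("racy:   " + racyBalance);        // usually less than 10,400,000
            System.out.println("atomic: " + safeBalance.get());  // always 10,400,000
        }
    }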
"Just as we learned to embrace languages without static type checking, and with the ability to shoot ourselves in the foot"<p>I've moved on from dynamic languages to "static typing that doesn't suck" (Haskell).
I think that this technology will eventually replace the current GPU processing that people have been doing. It has all sorts of cool crazy applications, but people will probably still need their good old fashioned deterministic CPU as an option.
Technology might allow us to produce 1000-core computers, but does the market need them?<p>Will they be common enough? Or will other solutions dominate, as happened with dirigibles?
This form of exploratory computing has existed for a while in CPU instruction scheduling; branch prediction, etc., is widely used.<p>The big trade-off has to be power consumption.<p>If you diminish accuracy, fine. But if your handset dies because some genetic algorithm didn't converge in 3 minutes, that'll be a problem.
My email client is pretty much I/O-bound.<p>My word processor is perfectly well able to keep up with my typing speed.<p>My web browser is largely I/O-bound, except on pages that do stupid things with JavaScript.<p>There is no reason to try to rewrite any of these to use funny algorithms that can spread work over tons of cores. They generally don't provide enough work to even keep one core busy, the only concern is UI latency (mostly during blocking I/O).<p>Compiling things can take a while, but that's already easily parallelized by file.<p>I'm told image/video processing can be slow, but there are already existing algorithms that work on huge numbers of cores (or on GPUs).<p>Recalcing very large spreadsheets can be slow, but that should be rather trivial to parallelize (along the same lines as image processing).<p>...<p>So isn't the article pretty much garbage?
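On the spreadsheet point: as long as cells don't depend on each other, the parallel recalc really is trivial. A sketch (a real recalc would have to order cells by their dependency graph first; Math.sin/cos stands in for the cell formulas):

    import java.util.stream.IntStream;

    public class ParallelRecalc {
        public static void main(String[] args) {
            double[] cells = new double[10_000_000];
            // Independent cells: recompute them all in parallel across the available cores
            IntStream.range(0, cells.length).parallel()
                     .forEach(i -> cells[i] = Math.sin(i) * Math.cos(i));  // stand-in formula
            System.out.println(cells[42]);
        }
    }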