
Why have CPUs been limited in frequency to around 3.5 GHz for so many years?

257 points by NARKOZ over 13 years ago

16 comments

ChuckMcM over 13 years ago
The answer is 'simple': it comes in four parts.

1) Transistors dissipate the most power when they are switching (because they pass through their linear region). The more they switch, the more power they dissipate, so the power dissipation of any transistor circuit is proportional to its switching frequency.

2) Silicon stops working reliably above 150 degrees C, because thermal effects in the silicon overwhelm its electrical characteristics. One way to think about it is that at 150 degrees C the silicon lattice is vibrating so hard from the heat that the electrons can't move anymore.

3) Lithography and manufacturing techniques have increased the number of transistors per square millimeter of silicon exponentially over the years; transistor counts have been observed to double every 18 months or so.

4) The ability to channel heat from silicon to the outside air is limited by the thermal conductivity of silicon and the ceramics used to encase it.

Those four parameters create a 'box' which is sometimes called the 'design space.' If you build a chip that is inside the box it works; if one of the parameters goes outside the box it fails.

In the great MHz race of 1998, clock rates were pushed up, which drove heat dissipation up, and transistor counts were going up too, so you got an n^2 effect in heat dissipation. The race was powered by consumers who used that single number to compare machines (spec wars). It was unsustainable.

The end of that war came when AMD introduced multi-core (two CPUs for the price of one!) architectures, and Intel had the largest design failure in its history when it scrapped the entire Pentium 4 microarchitecture after realizing it would never get to, much less past, 4 GHz as it had promised.

AMD proved that for user-visible throughput, multiple cores could give you a better net gain than a faster CPU. This sidestepped the n-squared problem: twice the number of transistors running at the same clock rate means the heat only goes up linearly rather than quadratically.

It was a pretty humbling moment for Intel.

That begat the 'core wars,' where Intel and AMD have worked to give us more and more 'cores.' The heat problem was still there, but it was managed with transistor design since the frequencies were staying flat.

Recently, some new transistor designs and system micro-architectures have combined with the inevitable flattening of performance gains from multiple cores (see Amdahl's law) to give us 'turbo boost' type solutions, where only one core runs at higher speed (sidestepping Amdahl) at the expense of down-clocking or even turning off other cores (sidestepping the frequency component of power increases).

Another technology on the horizon (which used to be only for the military guys) is SoD, or silicon-on-diamond. Diamond is a wonderful conductor of heat, so if you build your processor on a diamond substrate you can pull lots of heat out into an attached cooling system. Get ready, though, for 7" x 7" heat-management assemblies that attach to these things, or alternatively a new ATX-type form factor that includes something that looks like a power supply but is a chiller attached via tubes to the processor socket.
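
A back-of-the-envelope sketch of the trade-off described above, using the standard CMOS dynamic-power relation P ≈ α·C·V²·f. Every constant below is an illustrative assumption, not a figure from the comment:

```python
# Illustrative sketch of the 'design space' trade-off: dynamic CMOS power
# follows P ~ alpha * C * V^2 * f. All numbers are placeholder assumptions.

def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Switching (dynamic) power of a CMOS circuit, in watts."""
    return alpha * c_farads * v_volts ** 2 * f_hz

baseline = dynamic_power(alpha=0.2, c_farads=30e-9, v_volts=1.2, f_hz=3.0e9)

# Option A: double the clock. A higher clock usually also needs a higher
# supply voltage, so power grows much faster than 2x.
double_clock = dynamic_power(0.2, 30e-9, 1.4, 6.0e9)

# Option B: double the cores at the same clock and voltage. Twice the
# switching transistors means roughly twice the power, and no more.
double_cores = 2 * baseline

print(f"baseline:  {baseline:6.1f} W")
print(f"2x clock:  {double_clock:6.1f} W ({double_clock / baseline:.1f}x)")
print(f"2x cores:  {double_cores:6.1f} W ({double_cores / baseline:.1f}x)")
```

With numbers in this ballpark, doubling the clock (plus the voltage bump it typically needs) roughly triples power, while doubling the cores at a fixed clock only doubles it, which is the sidestep the comment describes.
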
jacquesm over 13 years ago
I'm probably in a very small minority, but I think that the 'free ride' we got from Moore's law, with transistor density translating into ever higher clock frequencies, has locked us into a situation very much comparable to the automobile industry and the internal combustion engine.

If it weren't for that, we'd have had to face the dragons of parallelization much earlier, and we would have a programming tradition solidly founded on something other than single-threaded execution.

Our languages would likely have had parallel primitives and would be able to deal with software development and debugging in a multi-threaded environment in a more graceful way.

Having the clock frequency of CPUs double on a fairly regular beat has allowed us to ignore these problems for a very long time, and now that we do have to face them we'll have to unlearn a lot of what we take to be unalterable.

I'm not sure of much about the future; the one thing I do know is that it is parallel, and if you're stuck waiting for the clock frequency increases of tomorrow you'll be waiting for a very long time, possibly forever.
xxcode over 13 years ago
Higher frequency means that you have to run at a higher core voltage. This is, in part, because gate propagation delay is inversely proportional to the bias voltage, i.e., the voltage corresponding to a 'high' or 1 bit. You have to decrease the propagation delay so that the clock tick gets everywhere in the processor quickly, so that one part of the chip doesn't lag the others (clock skew). So in order for things to run faster, you have to run them at a higher V(bias). That in turn means higher thermal costs (things get hotter): the heat produced grows roughly with the square of the voltage.

So it's now mostly a thermal management problem, and this is the primary problem in the newer chips. Even though we can pack in more transistors, we can't get signals among them faster without a higher V that makes the chip run too hot.

Therefore, our solution is to use the extra transistors to create a separate new processor, running with its own clock, so the 'tick' doesn't have to reach all parts of this rather big chip but only needs to be synchronized within a core.
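
To make the "higher V(bias) buys shorter propagation delay but costs heat" point concrete, here is a minimal sketch using the textbook alpha-power-law delay model; the threshold voltage, exponent, and supply voltages are assumed values for illustration:

```python
# Minimal sketch of the voltage / propagation-delay / heat trade-off,
# using the alpha-power-law delay model: delay ~ V / (V - Vt)^a.
# Threshold voltage, exponent, and supply voltages are assumptions.

V_T = 0.35    # threshold voltage, volts (assumed)
A = 1.3       # velocity-saturation exponent (assumed)

def relative_delay(v_bias):
    """Gate propagation delay, up to a constant factor."""
    return v_bias / (v_bias - V_T) ** A

def relative_power(v_bias, f_rel):
    """Dynamic power, up to a constant factor (P ~ V^2 * f)."""
    return v_bias ** 2 * f_rel

for v in (0.9, 1.1, 1.3):
    delay = relative_delay(v)
    f_max = 1.0 / delay              # achievable clock scales as 1/delay
    power = relative_power(v, f_max)
    print(f"V={v:.1f} V  delay={delay:.2f}  max clock={f_max:.2f}  power={power:.2f}")
```

Raising the supply voltage buys a modest clock increase while power climbs much faster, which is the thermal wall being described.
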
microarchitect over 13 years ago
This is a good example of why science isn't a popularity contest. The top-voted reply makes some vague noises about Vt scaling and leakage. It then claims that we "don't get a very good 'off' if the threshold voltage is too low". This is incorrect: leakage doesn't degrade the logic values for CMOS-style logic, which is the vast majority of the digital logic in the world.

The real issue, which the OP may or may not have been trying to get at, is that leakage power eats into the chip's power budget. Since we'd rather not burn our budget on leakage, we reduce leakage by increasing the threshold voltage. But unfortunately, this comes at the cost of frequency.

The other very important issue is power density. There's a famous graph that Shekar Borkar of Intel [1] made showing that if we continued to ignore power dissipation issues as in the past, our chips would run ridiculously hot (in other words, they wouldn't work at all because they'd just burn themselves to death).

There are also other issues, like the fact that your wires start acting like antennas at 5+ GHz, and that reliability concerns like electromigration and dielectric breakdown are getting worse with newer technologies.

A much more recent and reliable (not to mention highly cited) reference on scaling and power issues is [2].

[1] I found a copy here: http://www.nanowerk.com/spotlight/id1762_1.jpg
[2] http://www-vlsi.stanford.edu/papers/mh_iedm_05.pdf
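
A small sketch of the leakage/frequency trade-off described here, under a generic subthreshold-leakage model; the swing, supply, and threshold voltages are assumptions of mine, not numbers from the cited references:

```python
# Sketch of the leakage-vs-frequency trade-off: subthreshold leakage drops
# roughly one decade per ~90 mV of extra threshold voltage, while gate
# delay grows as Vt approaches Vdd. All constants are assumed.

S = 0.090      # subthreshold swing, volts/decade (assumed)
VDD = 1.0      # supply voltage, volts (assumed)
VT_REF = 0.25  # reference threshold voltage, volts (assumed)

def leakage(vt):
    return 10 ** (-vt / S)

def delay(vt):
    return VDD / (VDD - vt) ** 1.3   # alpha-power law, exponent assumed

for vt in (0.25, 0.35, 0.45):
    rel_leak = leakage(vt) / leakage(VT_REF)
    rel_clock = delay(VT_REF) / delay(vt)
    print(f"Vt={vt:.2f} V  leakage={rel_leak:.4f}x  max clock={rel_clock:.2f}x")
```

Raising Vt by 100 mV cuts leakage by an order of magnitude but shaves a noticeable fraction off the achievable clock, which is the budget trade the comment describes.
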
JoshTriplett over 13 years ago
We've gotten very close to physical limits. Quick back-of-the-envelope estimate: if electrons traveled at the speed of light through silicon (they don't), then in one cycle at 3 GHz an electron could travel 0.1 meters. In reality, electrons in silicon travel quite a bit slower than that. Net result: electrons can barely cross the diameter of the chip in one cycle, even before gate propagation delays and other factors limit the work done per cycle.

So, if you want data from a tiny distance away, such as a local register, you can grab it and do something simple with it in one cycle. If you need data from any further away, forget about it; cache takes longer, another core takes even longer, and memory takes *far* longer.
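
The same estimate, written out; the die-size figure in the code comment is my own rough approximation:

```python
# The back-of-the-envelope estimate above, written out. Both figures are rough.
C_LIGHT = 299_792_458        # speed of light in vacuum, m/s
F_CLOCK = 3.0e9              # 3 GHz
print(C_LIGHT / F_CLOCK)     # ~0.1 m per clock cycle, even at light speed
# A die is on the order of 1-2 cm across (my approximation), and real
# on-chip signal propagation is well below c, so one cycle allows at
# most a few chip crossings before any gate delays are counted.
```
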
kashifr over 13 years ago
According to the Sandia Cooler white paper (PDF: http://prod.sandia.gov/techlib/access-control.cgi/2010/100258.pdf), the limit is due to a "thermal brick wall": basically, a lack of progress in heat-exchanger technology.

I have visualized this "brick wall" as a graph with data from Wikipedia (PDF): http://dl.dropbox.com/u/3215373/Thermal-Brick-Wall.pdf
iradik over 13 years ago
Donald Knuth on multicore (2008):

Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?

Donald: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]

How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)

The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn’t be any better off, considering the kind of work I do—even though I’m using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. (No—that’s the wrong metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the word I want is "bubble.")

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation. I also admit that I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they’ve begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most—at the cost of incompatibility with legacy x86 programs.)

Source: http://www.informit.com/articles/article.aspx?p=1193856
DiabloD3 over 13 years ago
I wish people would quit asking this question. This is just another case of the MHz myth: who cares what the clock speed is if modern CPUs can execute instructions 2-4x faster per clock than they did 30 years ago?

Go look at the Bulldozer design: 2 hardware decoder/scheduler engines, 4 integer ALUs, 2 FP ALUs, all per core...

A shared L3 cache per socket that is owned by the memory controller and is socket-local to other sockets (i.e., all memory controllers conspire to cache system memory efficiently and synchronously know what is cached across all sockets)...

And the memory controllers also accept memory requests from ANY core on the HyperTransport bus, no matter which socket, and multi-socket boards commonly have one memory bank per socket, so 4 sockets of dual-channel DDR3-1600 would indeed give 820 Gbit/s of bandwidth that can be accessed (almost) in full by any individual core[1]...

The ALUs have execution queues, and any thread (currently 2 per core) can schedule instructions on them to maximize ALU packing...

And you can now buy Bulldozers for Socket G34 that have 16 threads / 8 cores per socket, and G34 boards usually have 4 sockets.

So again, who cares what the MHz is?

[1]: A 4-socket setup has 4 or 6 HyperTransport links; on AM3+ and G34+ sockets these would be HT 3.1 16-bit-wide links running at 3.2 GHz, or 204 Gbit/s per link.

On a 4-socket ring, that would be 204 Gbit/s off each neighbor's memory bank, plus another 204 Gbit/s (also the speed of dual-channel DDR3-1600) from the local memory bank, leading to 612 Gbit/s that could theoretically be saturated by a single core.

On a 4-socket full crossbar, it would be the full 820 Gbit/s.
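
A quick arithmetic check of the footnote, as a sketch with commonly quoted (assumed) link widths and transfer rates; the results land close to the 204, 612, and 820 Gbit/s figures above:

```python
# Checking the footnote arithmetic. Widths and transfer rates below are
# commonly quoted values and are my assumptions, not from a datasheet.

ddr3_1600_channel = 1600e6 * 64              # transfers/s * bits = bit/s
dual_channel = 2 * ddr3_1600_channel         # per-socket memory bandwidth
four_sockets = 4 * dual_channel              # system-wide

# HyperTransport 3.1, 16-bit link, 3.2 GHz clock, double data rate,
# counting both directions:
ht_link = 16 * 3.2e9 * 2 * 2

ring_reach = dual_channel + 2 * ht_link      # local bank + two neighbours

print(f"dual-channel DDR3-1600: {dual_channel / 1e9:6.1f} Gbit/s")   # ~204.8
print(f"4-socket total:         {four_sockets / 1e9:6.1f} Gbit/s")   # ~819.2
print(f"one HT 3.1 16-bit link: {ht_link / 1e9:6.1f} Gbit/s")        # ~204.8
print(f"ring reach, one core:   {ring_reach / 1e9:6.1f} Gbit/s")     # ~614.4
```
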
glimcat over 13 years ago
Because of scaling limits. The stuff we're already getting uses very highly doped silicon. You could go smaller or faster if you could dope it more to keep the field characteristics viable, but more doping would screw up the silicon lattice; "more" also generally means an orders-of-magnitude increase.

There are a few alternative processes and materials, but they're costly as all hell and hard to do in bulk.
Symmetry over 13 years ago
The main problem is that leakage current has started to become a problem [1]. Back in the day, designers could just scale down features and rely on the reduced capacitance of the smaller areas to lower power usage enough to let them put in more logic. Unfortunately, that only reduces active power, not leakage power, and transistors have started leaking more now that they're smaller. You can reduce leakage by lowering the voltage your processor operates at, but that also reduces the frequency your transistors flip at, because your logic voltage becomes smaller with respect to the transistor threshold voltage, meaning less current per unit of charge you have to move. Modern devices are also tending to run up against saturation velocity [2] now, limiting their switching speed still further.

You certainly could increase clock speed by increasing the voltage you put into a chip and just accepting that you're going to have more leakage current and more wasted power, but we're already close to the edge of what chips can dissipate right now. You can try having less logic between clock latches, meaning a higher clock speed for your switching delay; however, this increases the ratio of latches to everything else, so it's a matter of diminishing returns. It also means that you've pushed your useful logic further apart, and now you have more line capacitance too. Finally, you can decrease the temperature of your silicon to substantially reduce the amount of leakage you get. This lets you raise the voltage safely, letting you attain faster switching speeds. The only problem is that this requires expensive cooling devices.

[1] http://en.wikipedia.org/wiki/Leakage_%28semiconductors%29
[2] http://en.wikipedia.org/wiki/Saturation_velocity
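
The active-versus-leakage split this comment relies on can be summarized with a standard first-order power budget (a generic textbook model, not something taken from the linked pages):

```latex
P_{\text{total}} \;\approx\;
\underbrace{\alpha\, C\, V_{dd}^{2}\, f}_{\text{active: scales with } f}
\;+\;
\underbrace{V_{dd}\, I_{\text{leak}}}_{\text{leakage: present even when idle}},
\qquad
I_{\text{leak}} \;\propto\; e^{-V_{t}\,/\,(n\,kT/q)}
```

Shrinking features cuts C and helps the first term, but the second term grows as the threshold voltage is lowered for speed and shrinks as the silicon is cooled (smaller kT), which is why the expensive-cooling option at the end of the comment works.
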
tibbon over 13 years ago
What confuses me is that we've seen tech demos (and overclockers) push CPU speeds to 4-10 GHz. Are those gains so artificially achieved that they just can't be replicated at scale in the public market? We can get chips to go faster, just seemingly not for the public.

It was weird to buy my new MacBook Pro and find that, after 3+ years, it was 0.2 GHz slower than my old one. Of course it has more cores, better instructions, etc., but it was still a weird thing.

Apple's done a great job of explaining the benefits of new machines to consumers: they don't. I'm not being sarcastic. At the end of the '90s (and still largely today for most PC companies) it was all about specs. Apple instead sold what you can do with each computer, in a good/better/best format. I bet a large number of computers sold at the Apple Store never have the sales associate mention the clock speed.
jeremysalwen over 13 years ago
> What would happen if you added a second compressor loop with its cold side on the hot side of the first one? What if you added a third one? You've now got 3-stage cascade phase-change (compressors cool stuff by compressing gas into a liquid, then letting it suck up thermal energy while decompressing/evaporating back into a gas elsewhere, i.e. phase change) cooling on one end, and a spectacularly inefficient heater on the other.

Now wait a second... that's not just wrong, that's precisely the *opposite* of reality. If you have a heat pump, it *must* be *more* efficient than a simple heating element. If it at *all* cools the processor, by the simple laws of thermodynamics, it must heat the room more than any process which simply converts the work directly to room heat.
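
The first-law bookkeeping behind that rebuttal, written out for a generic heat pump that removes Q_cold from the chip while consuming work W:

```latex
Q_{\text{hot}} \;=\; Q_{\text{cold}} + W \;>\; W \quad (Q_{\text{cold}} > 0),
\qquad
\mathrm{COP}_{\text{cooling}} \;=\; \frac{Q_{\text{cold}}}{W}
\;\le\; \frac{T_{\text{cold}}}{T_{\text{hot}} - T_{\text{cold}}}
```

So the cascade dumps into the room both the heat it pulled off the CPU and the electrical work it consumed; far from being an "inefficient heater," it delivers more heat per watt of input than a resistive element would.
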
foolinator over 13 years ago
CPU speed isn't a bottleneck anymore. These days, improving caching and threading and increasing bus/memory speed have been the primary contributors to speeding up a computer. Once those bottlenecks close a bit, you'll see those numbers begin to rise again.
redthrowaway over 13 years ago
I'm regularly impressed with the mods over at /r/askscience and how high they've been able to keep the SNR.
its_so_on over 13 years ago
This is why: http://www.google.com/search?q=c+%2F+3.5+Ghz

That's a theoretical maximum. As you increase the GHz, you shorten the theoretical maximum length of the path electricity can take through your chip in one cycle. An i7 is not that small. You would have to shrink things further and further to get a smaller chip with shorter paths in it, which is what new fabrication methods have been about.

Obviously this is very difficult. Sure, chips could be even faster if they were just a few atoms across, but who would expect to do meaningful computation at that size, or to be able to manufacture it?
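
What that search query computes, for reference; the die-size comparison is my own rough figure:

```python
C_LIGHT = 299_792_458     # speed of light, m/s
print(C_LIGHT / 3.5e9)    # ~0.086 m: farthest a signal could go per 3.5 GHz cycle
# An i7-class die is roughly 2 cm across (my approximation), so even a
# light-speed signal gets only a few chip crossings per cycle.
```
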
karolist over 13 years ago
No idea. I have my E8400 still kicking ass at 4.2 GHz on air.