Despite the limitations apparently present in single chip/CPU systems, they can still provide an enormous amount of performance if used properly.<p>There are also many problems that simply cannot be made faster or more correct than by running them on a single thread/processor/core/etc., and that will always be true. This is not a "we lack the innovation" problem. It's an information-theoretic / causality problem you can demonstrate with actual math & physics. Does a future event's processing depend on all events received up until now? If yes, congratulations: you now have a total ordering problem, just like pretty much everyone else. Yes, you can cheat and say "well, these pieces here and here don't have a hard dependency on each other," but it's incredibly hard to get that right if you decide to go down that path.<p>The most fundamental demon present in any distributed system is latency. The difference between an L1 cache hit and a network hop within the same datacenter adds up very quickly.<p>Again, for many classes of problems, there is simply no handwaving this away. You either wait the requisite number of microseconds for the synchronous ack to come back, or you hope your business doesn't care if John Doe gets duplicated a few times in the database on a totally random basis.
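A minimal sketch of the point above, in Python (illustrative only; the account/event shapes are made up): when the handling of each event depends on state built up from every prior event, processing is a serial fold over a totally ordered log, and extra cores don't help.

    # Illustrative: when each event's outcome depends on all prior events,
    # processing is a serial fold; it cannot be split across workers.
    from dataclasses import dataclass

    @dataclass
    class Account:
        balance: int = 0

    def apply_event(state: Account, event: dict) -> Account:
        # The decision for THIS event depends on the state produced by
        # every event before it (e.g. reject a withdrawal that overdraws).
        if event["kind"] == "deposit":
            return Account(state.balance + event["amount"])
        if event["kind"] == "withdraw" and state.balance >= event["amount"]:
            return Account(state.balance - event["amount"])
        return state  # rejected: the ordering mattered

    def process(events):
        state = Account()
        for e in events:  # total order: one event at a time
            state = apply_event(state, e)
        return state

    print(process([{"kind": "deposit", "amount": 100},
                   {"kind": "withdraw", "amount": 70},
                   {"kind": "withdraw", "amount": 70}]).balance)  # -> 30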
The best chiplet interconnect may turn out to be no interconnect at all. Wafer-scale integration [1] has come up periodically over the years. In short: just make a physically larger integrated circuit, potentially as large as the entire wafer -- about a foot across. As I understand it, there's no particular technical hurdle, and indeed the progress with self-healing and self-testing designs that use redundancy to improve yield for small processors also makes really large designs more feasible than in the past. The economics never worked out in favour of this approach before, but now that we're at the scaling limit, maybe that will change.<p>At least one company is pursuing this at the very high end. The Cerebras WSE-2 [2] ("wafer scale engine") has 2.6 trillion transistors with 850,000 cores and 40 gigabytes of on-chip memory, on a single, giant integrated circuit (shown in the linked article). I'm just an interested follower of the field, no expert, so what do I know. But I think we may see a shift in that direction eventually. Everything on-die with a really big die: system on a chip, but for the high end, not just tiny microcontrollers.<p>[1] <a href="https://en.wikipedia.org/wiki/Wafer-scale_integration" rel="nofollow">https://en.wikipedia.org/wiki/Wafer-scale_integration</a><p>[2] <a href="https://www.zdnet.com/article/cerebras-continues-absolute-domination-of-high-end-compute-it-says-with-worlds-hugest-chip-two-dot-oh/" rel="nofollow">https://www.zdnet.com/article/cerebras-continues-absolute-do...</a>
I remember back in the '80s the limit was considered to be 64K RAM chips, because otherwise the defect rate would kill the yield.<p>Of course, there's always the "make a 4-core chip; if one core doesn't work, sell it as a 3-core chip," and so on.
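Rough arithmetic behind that binning trick, as an illustrative sketch (the per-core yield below is an assumption, not real data): if each core survives fabrication independently with probability p, far more dies are sellable as "at least 3 good cores" than as fully working 4-core parts.

    # Illustrative salvage-yield arithmetic for core binning (assumed numbers).
    from math import comb

    def yield_at_least(k, n, p):
        # Probability that at least k of n cores are defect-free,
        # assuming each core independently yields with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    p = 0.9  # assumed per-core yield
    print(f"all 4 cores good:    {yield_at_least(4, 4, p):.1%}")  # about 66%
    print(f"sellable (>=3 good): {yield_at_least(3, 4, p):.1%}")  # about 95%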
"Reached their limits" - I feel like I've heard this many many times before.<p>Not that I doubt it, but just I've also been impressed with the ingenuity that folks come up with in this space.
The M1 Ultra is really two M1 Max dies fused across a silicon interposer, but each of those dies is fabricated as a single chip. The 12900K is fabricated as a single chip and is still roughly a quarter the size of the M1 Ultra. Zen 3 puts 8 cores on a CCX instead of four because DDR memory controllers don't have infinite bandwidth (contrary to AMD's wishful nomenclature) and make poor interconnects between banks of L3.<p>Chiplets are a valid strategy that is going to be used in the future, but there are still more tricks that CPU makers have up their sleeves that they need to use out of necessity. They're nowhere near their limits.
Some older stuff for reference: the IBM POWER5 and POWER5+ (2004 and 2005) are MCM designs that had 2-4 CPU chips plus cache chips in the same package.<p>Link: <a href="https://en.wikipedia.org/wiki/POWER5" rel="nofollow">https://en.wikipedia.org/wiki/POWER5</a>
> UCIe is a start, but the standard’s future remains to be seen. “The founding members of initial UCIe promoters represent an impressive list of contributors across a broad range of technology design and manufacturing areas, including the HPC ecosystem,” said Nossokoff, “but a number of major organizations have not as yet joined, including Apple, AWS, Broadcom, IBM, NVIDIA, other silicon foundries, and memory vendors.”<p>The fact that so many of the companies actually building chips haven't joined yet makes me very pessimistic about it.
Is spectrum.ieee.org becoming another mainstream (so to speak) journalism outlet where everything is dumbed down to basically Newspeak? The article is poorly written, the content is shallow, and the headline is clickbait.
I'm embarrassed to admit I still don't quite understand what a chiplet is, so I would be very grateful for your input here.<p>If a thread can run across multiple chiplets, then this is awesome and seems like a solution.<p>If one thread == one chiplet, then*:<p>- a chiplet is equivalent to a core, except with speedier connections to other cores?<p>- this isn't a solution: we're 15 years into multicore and single-threaded performance is still king. If separating work into separate threads were a solution, cores would already work more or less just fine.**<p>* put "in my totally uneducated opinion, it seems like..." before each of these; the internet doesn't communicate tone well and I'm definitely not trying to pass judgement here, I don't know what I'm talking about!<p>** generally, for consumer hardware and use cases, i.e. "I am buying a new laptop and I want it to go brrrr", with all sorts of caveats there of course
I hope somebody with relevant knowledge can answer this question, please: what fraction of the cost is the physical cost per unit, and what fraction is maintaining the R&D, factories, distribution channels, and so on?<p>In other words, if a chip 100x the size (100x the gates, etc.) made sense, would it cost 100x as much to produce, or just 10x, or just 2x?<p>Edit: assuming there wouldn't be additional design costs, just stacking current tech.
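Not an insider, so take this as a sketch rather than an answer: the usual back-of-envelope model says the marginal silicon cost per good die grows much faster than linearly with area, because a bigger die both fits fewer times on a wafer and is more likely to catch a defect. Every number below (wafer cost, wafer area, defect density) is a made-up placeholder, and none of this covers the amortized R&D and fab costs you also asked about.

    # Illustrative cost-per-good-die model; all parameters are assumptions.
    from math import exp

    def good_die_cost(area_mm2, wafer_cost=10_000.0, wafer_area_mm2=70_000.0,
                      defect_density_per_mm2=0.001):
        dies_per_wafer = wafer_area_mm2 / area_mm2            # ignores edge loss
        yield_frac = exp(-defect_density_per_mm2 * area_mm2)  # Poisson yield model
        return wafer_cost / (dies_per_wafer * yield_frac)

    for area in (100, 400, 800):
        print(f"{area:4d} mm^2 -> ${good_die_cost(area):7.2f} per good die")

With these made-up parameters, 8x the area costs roughly 16x as much per good die, which is part of why nobody casually ships 100x-sized monolithic chips.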
Is this a solution to a yield problem? Making physically bigger dies is no problem. Wafers are much larger than the individual dies. If the dies are just being laid out flat, there's no density gain.<p>Multi-chip modules are nothing new. They've been used mostly when either there was a yield problem, or you wanted two different fab technologies. The latter is seen in some imagers and radars.
What we need much more of from here on are deterministic processors. There are awesome optimizers out there, including programs using genetic algorithms that find the best way to do some micro-task X or Y (as part of a much bigger program).<p>IMO we have a ton of slightly-higher-hanging fruit that we can pick in terms of optimization, but the relentless march of the x86/x64 architecture has obstructed that innovation.<p>It might be time to look inwards and start working more on squeezing the CPUs we have right now for maximum performance.
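As an illustration of that kind of optimizer, here's a toy sketch in Python. It uses brute-force search rather than a genetic algorithm, and the tiny "instruction set" is invented, but the shape is the same: search for the shortest micro-program that matches a reference function on test inputs.

    # Toy superoptimizer sketch (illustrative, not a real tool): exhaustively
    # search straight-line programs over a tiny invented instruction set.
    from itertools import product

    OPS = {
        "neg": lambda x: -x,
        "not": lambda x: ~x,
        "inc": lambda x: x + 1,
        "dec": lambda x: x - 1,
    }

    def run(program, x):
        for op in program:
            x = OPS[op](x)
        return x

    def superoptimize(reference, tests, max_len=3):
        # Return the shortest op sequence agreeing with `reference` on `tests`.
        for length in range(1, max_len + 1):
            for program in product(OPS, repeat=length):
                if all(run(program, t) == reference(t) for t in tests):
                    return program
        return None

    # Find a short sequence computing -x - 1 (i.e. bitwise NOT).
    print(superoptimize(lambda x: -x - 1, tests=range(-8, 8)))  # ('not',)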
I hope we are going to get back to a more asymmetric multiprocessing arrangement in the near term, one where we abandon the fiction of a processor or two running the whole show over peripheral systems with as little smarts as possible, and instead promote those peripherals to at least second-class citizens.<p>These systems are much more powerful than when these abstractions were laid down, and at this point the difference between redundant storage on the box versus three feet away feels more academic than anything else.
Reminds me of the processor in the film Terminator 2: <a href="https://gndn.files.wordpress.com/2016/04/shot00332.jpg" rel="nofollow">https://gndn.files.wordpress.com/2016/04/shot00332.jpg</a>
Not if the C20 spec has anything to do with it.
Program as OS: assign the case selections in a switch statement to threads (or distribute them to other 'single-chip' CPUs).
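A minimal sketch of what that could look like (Python, with made-up handler names, and assuming the cases are genuinely independent of each other): each case of the switch becomes a task submitted to a worker pool instead of a branch taken on one thread.

    # Illustrative: dispatch "switch" cases to a thread pool so independent
    # cases run concurrently (assumes the handlers share no mutable state).
    from concurrent.futures import ThreadPoolExecutor

    def handle_read(req):  return f"read {req['key']}"
    def handle_write(req): return f"wrote {req['key']}"
    def handle_ping(req):  return "pong"

    HANDLERS = {"read": handle_read, "write": handle_write, "ping": handle_ping}

    def dispatch(requests):
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Each "case" becomes an independent task instead of a branch taken
            # on a single thread; ordering guarantees are deliberately dropped.
            futures = [pool.submit(HANDLERS[r["op"]], r) for r in requests]
            return [f.result() for f in futures]

    print(dispatch([{"op": "ping"}, {"op": "read", "key": "a"},
                    {"op": "write", "key": "b"}]))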
boundless networked single ASIC chip possibilities!
"Single-Chip Processors Have Reached Their Limits<p>Announcements from XYZ and ABC prove that chiplets are the future, but interconnects remain a battleground"<p>This could easily have been written 10 years ago, and I bet someone will write it in 10 years again.<p>We need these really big chips with their big powerful cores because the nature the computing we do only changes very slowly towards being distributed and parallelizable and thus able to use a massive number of smaller but far more efficient cores.