Despite the limitations apparently present in single chip/CPU systems, they can still provide an enormous amount of performance if used properly.<p>There are also many problems that simply cannot be made faster or more correct than by running them on a single thread/processor/core/etc., and that will always be true. This is not a "we lack the innovation" problem. It's an information-theoretic / causality problem you can demonstrate with actual math & physics. Does a future event's processing depend on all events received up until now? If yes, congratulations: you now have a total ordering problem, just like pretty much everyone else. Yes, you can cheat and say "well, these pieces here and here don't have a hard dependency on each other," but it's incredibly hard to get that right if you decide to go down that path.<p>The most fundamental demon present in any distributed system is latency. The difference between an L1 cache hit and a network hop within the same datacenter adds up very quickly.<p>Again, for many classes of problems, there is simply no handwaving this away. You either wait the requisite number of microseconds for the synchronous ack to come back, or you hope your business doesn't care if John Doe gets duplicated a few times in the database on a totally random basis.
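A minimal sketch of the point above, in Python (illustrative only; the account/event shapes are made up): when the handling of each event depends on state built up from every prior event, processing is a serial fold over a totally ordered log, and extra cores don't help.

    # Illustrative: when each event's outcome depends on all prior events,
    # processing is a serial fold; it cannot be split across workers.
    from dataclasses import dataclass

    @dataclass
    class Account:
        balance: int = 0

    def apply_event(state: Account, event: dict) -> Account:
        # The decision for THIS event depends on the state produced by
        # every event before it (e.g. reject a withdrawal that overdraws).
        if event["kind"] == "deposit":
            return Account(state.balance + event["amount"])
        if event["kind"] == "withdraw" and state.balance >= event["amount"]:
            return Account(state.balance - event["amount"])
        return state  # rejected: the ordering mattered

    def process(events):
        state = Account()
        for e in events:  # total order: one event at a time
            state = apply_event(state, e)
        return state

    print(process([{"kind": "deposit", "amount": 100},
                   {"kind": "withdraw", "amount": 70},
                   {"kind": "withdraw", "amount": 70}]).balance)  # -> 30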
The best chiplet interconnect may turn out to be no interconnect at all. Wafer-scale integration [1] has come up periodically over the years. In short: just make a physically larger integrated circuit, potentially as large as the entire wafer -- about a foot across. As I understand it, there's no particular technical hurdle, and indeed the progress with self-healing and self-testing designs that use redundancy to improve yield for small processors also makes really large designs more feasible than in the past. The economics never worked out in favour of this approach before, but now that we're at the scaling limit, maybe that will change.<p>At least one company is pursuing this at the very high end. The Cerebras WSE-2 [2] ("wafer scale engine") has 2.6 trillion transistors with 850,000 cores and 40 gigabytes of on-chip memory, on a single, giant integrated circuit (shown in the linked article). I'm just an interested follower of the field, no expert, so what do I know. But I think we may see a shift in that direction eventually. Everything on-die with a really big die: system on a chip, but for the high end, not just tiny microcontrollers.<p>[1] <a href="https://en.wikipedia.org/wiki/Wafer-scale_integration" rel="nofollow">https://en.wikipedia.org/wiki/Wafer-scale_integration</a><p>[2] <a href="https://www.zdnet.com/article/cerebras-continues-absolute-domination-of-high-end-compute-it-says-with-worlds-hugest-chip-two-dot-oh/" rel="nofollow">https://www.zdnet.com/article/cerebras-continues-absolute-do...</a>
I remember back in the '80s the limit was considered to be 64K RAM chips, because otherwise the defect rate would kill the yield.<p>Of course, there's always the "make a 4-core chip; if one core doesn't work, sell it as a 3-core chip," and so on.
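Rough arithmetic behind that binning trick, as an illustrative sketch (the per-core yield below is an assumption, not real data): if each core survives fabrication independently with probability p, far more dies are sellable as "at least 3 good cores" than as fully working 4-core parts.

    # Illustrative salvage-yield arithmetic for core binning (assumed numbers).
    from math import comb

    def yield_at_least(k, n, p):
        # Probability that at least k of n cores are defect-free,
        # assuming each core independently yields with probability p.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    p = 0.9  # assumed per-core yield
    print(f"all 4 cores good:    {yield_at_least(4, 4, p):.1%}")  # about 66%
    print(f"sellable (>=3 good): {yield_at_least(3, 4, p):.1%}")  # about 95%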
"Reached their limits" - I feel like I've heard this many many times before.<p>Not that I doubt it, but just I've also been impressed with the ingenuity that folks come up with in this space.
The M1 Ultra is really two M1 Max dies fused across a silicon interposer, but each of those dies is fabricated as a single chip. The 12900K is fabricated as a single chip and is still roughly a quarter the size of the M1 Ultra. Zen 3 puts 8 cores on a CCX instead of four because DDR memory controllers don't have infinite bandwidth (contrary to AMD's wishful nomenclature) and make poor interconnects between banks of L3.<p>Chiplets are a valid strategy that is going to be used in the future, but there are still more tricks that CPU makers have up their sleeves that they need to use out of necessity. They're nowhere near their limits.
Some older stuff for reference: the IBM POWER5 and POWER5+ (2004 and 2005) are MCM designs that had 2-4 CPU chips plus cache chips in the same package.<p>Link: <a href="https://en.wikipedia.org/wiki/POWER5" rel="nofollow">https://en.wikipedia.org/wiki/POWER5</a>
> UCIe is a start, but the standard’s future remains to be seen. “The founding members of initial UCIe promoters represent an impressive list of contributors across a broad range of technology design and manufacturing areas, including the HPC ecosystem,” said Nossokoff, “but a number of major organizations have not as yet joined, including Apple, AWS, Broadcom, IBM, NVIDIA, other silicon foundries, and memory vendors.”<p>The fact that so many of the companies actually building chips haven't joined yet makes me very pessimistic about it.
Is spectrum.ieee.org becoming another mainstream (so to speak) journalism outlet where everything is dumbed down to basically Newspeak? The article is poorly written, the content is shallow, and the headline is clickbait.
I'm embarrassed to admit I still don't quite understand what a chiplet is, so I would be very grateful for your input here.<p>If a thread can run across multiple chiplets, then this is awesome and seems like a solution.<p>If one thread == one chiplet, then*:<p>- a chiplet is equivalent to a core, except with speedier connections to other cores?<p>- this isn't a solution: we're 15 years into multicore and single-threaded performance is still king. If separating work into separate threads were a solution, cores would already work more or less just fine.**<p>* put "in my totally uneducated opinion, it seems like..." before each of these; the internet doesn't communicate tone well and I'm definitely not trying to pass judgement here, I don't know what I'm talking about!<p>** generally, for consumer hardware and use cases, i.e. "I am buying a new laptop and I want it to go brrrr", with all sorts of caveats there of course
I hope somebody with relevant knowledge can answer this question, please: what fraction of the cost is the physical cost per unit, and what fraction is maintaining the R&D, factories, distribution channels, and so on?<p>In other words, if a chip 100x the size (100x the gates, etc.) made sense, would it cost 100x as much to produce, or just 10x, or just 2x?<p>Edit: assuming there wouldn't be additional design costs, just stacking current tech.
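Not an insider, so take this as a sketch rather than an answer: the usual back-of-envelope model says the marginal silicon cost per good die grows much faster than linearly with area, because a bigger die both fits fewer times on a wafer and is more likely to catch a defect. Every number below (wafer cost, wafer area, defect density) is a made-up placeholder, and none of this covers the amortized R&D and fab costs you also asked about.

    # Illustrative cost-per-good-die model; all parameters are assumptions.
    from math import exp

    def good_die_cost(area_mm2, wafer_cost=10_000.0, wafer_area_mm2=70_000.0,
                      defect_density_per_mm2=0.001):
        dies_per_wafer = wafer_area_mm2 / area_mm2            # ignores edge loss
        yield_frac = exp(-defect_density_per_mm2 * area_mm2)  # Poisson yield model
        return wafer_cost / (dies_per_wafer * yield_frac)

    for area in (100, 400, 800):
        print(f"{area:4d} mm^2 -> ${good_die_cost(area):7.2f} per good die")

With these made-up parameters, 8x the area costs roughly 16x as much per good die, which is part of why nobody casually ships 100x-sized monolithic chips.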
Is this a solution to a yield problem? Making physically bigger dies is no problem. Wafers are much larger than the individual dies. If the dies are just being laid out flat, there's no density gain.<p>Multi-chip modules are nothing new. They've been used mostly when either there was a yield problem, or you wanted two different fab technologies. The latter is seen in some imagers and radars.
What we need much more of from here on are deterministic processors. There are awesome optimizers out there, including programs using genetic algorithms that find the best way to do some micro-task X or Y (as part of a much bigger program).<p>IMO we have a ton of slightly-higher-hanging fruit that we can pick in terms of optimization, but the relentless march of the x86/x64 architecture has obstructed that innovation.<p>It might be time to look inwards and start working more on squeezing the CPUs we have right now for maximum performance.
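As an illustration of that kind of optimizer, here's a toy sketch in Python. It uses brute-force search rather than a genetic algorithm, and the tiny "instruction set" is invented, but the shape is the same: search for the shortest micro-program that matches a reference function on test inputs.

    # Toy superoptimizer sketch (illustrative, not a real tool): exhaustively
    # search straight-line programs over a tiny invented instruction set.
    from itertools import product

    OPS = {
        "neg": lambda x: -x,
        "not": lambda x: ~x,
        "inc": lambda x: x + 1,
        "dec": lambda x: x - 1,
    }

    def run(program, x):
        for op in program:
            x = OPS[op](x)
        return x

    def superoptimize(reference, tests, max_len=3):
        # Return the shortest op sequence agreeing with `reference` on `tests`.
        for length in range(1, max_len + 1):
            for program in product(OPS, repeat=length):
                if all(run(program, t) == reference(t) for t in tests):
                    return program
        return None

    # Find a short sequence computing -x - 1 (i.e. bitwise NOT).
    print(superoptimize(lambda x: -x - 1, tests=range(-8, 8)))  # ('not',)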
I hope we are going to get back to a more asymmetric multiprocessing arrangement in the near term, one where we abandon the fiction of a processor or two running the whole show over peripheral systems with as little smarts as possible, and instead promote those peripherals to at least second-class citizens.<p>These systems are much more powerful than when these abstractions were laid down, and at this point the difference between redundant storage on the box versus three feet away feels more academic than anything else.
Reminds me of the processor in the film Terminator 2: <a href="https://gndn.files.wordpress.com/2016/04/shot00332.jpg" rel="nofollow">https://gndn.files.wordpress.com/2016/04/shot00332.jpg</a>
Not if the C20 spec has anything to do with it.
Program as OS: assign the case selections in a switch statement to threads (or distribute them to other 'single-chip' CPUs).
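A minimal sketch of what that could look like (Python, with made-up handler names, and assuming the cases are genuinely independent of each other): each case of the switch becomes a task submitted to a worker pool instead of a branch taken on one thread.

    # Illustrative: dispatch "switch" cases to a thread pool so independent
    # cases run concurrently (assumes the handlers share no mutable state).
    from concurrent.futures import ThreadPoolExecutor

    def handle_read(req):  return f"read {req['key']}"
    def handle_write(req): return f"wrote {req['key']}"
    def handle_ping(req):  return "pong"

    HANDLERS = {"read": handle_read, "write": handle_write, "ping": handle_ping}

    def dispatch(requests):
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Each "case" becomes an independent task instead of a branch taken
            # on a single thread; ordering guarantees are deliberately dropped.
            futures = [pool.submit(HANDLERS[r["op"]], r) for r in requests]
            return [f.result() for f in futures]

    print(dispatch([{"op": "ping"}, {"op": "read", "key": "a"},
                    {"op": "write", "key": "b"}]))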
boundless networked single ASIC chip possibilities!
"Single-Chip Processors Have Reached Their Limits<p>Announcements from XYZ and ABC prove that chiplets are the future, but interconnects remain a battleground"<p>This could easily have been written 10 years ago, and I bet someone will write it in 10 years again.<p>We need these really big chips with their big powerful cores because the nature the computing we do only changes very slowly towards being distributed and parallelizable and thus able to use a massive number of smaller but far more efficient cores.