Cerebras’s giant chip will smash deep learning’s speed barrier

159 points by pheme1 over 5 years ago

13 comments

geomark over 5 years ago
The article talks about a few things that they call inventions, like making interconnections across what would normally be scribe lines. But I personally worked on wafer scale integration about 25 years ago and we were already doing that. We called it inter-reticle stitching. The technology was ancient back then - 0.5 micron feature size on 4 inch wafers - but the wafer scale techniques are applicable to modern technologies. In particular: developing a yield model that informs your on-chip redundancy choices, and designing built-in self-test and selection circuitry so that you can yield large chips. The chip we developed was so large that only two would fit on a wafer. We got 50% yield on a line that was far from mature at the time. The company lacked the vision to do anything with what they had developed. To them it was just a chip for which there were few customers. The suits didn't know how to make bank with a methodology that could yield nearly arbitrarily complex chips in nearly any target process.

Edit: There were a number of papers and conference proceedings published back then, but not much shows up when searching Google. Here's one discussing the issues and results of field stitching: https://fdocuments.in/document/ieee-comput-soc-press-1992-international-conference-on-wafer-scale-integration-589dfe172703a.html

From 1992, so yeah, field stitching is not a recent invention.
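A minimal sketch of the yield-model-plus-redundancy math the comment alludes to, assuming a simple Poisson defect model; the block counts, areas, and defect density below are illustrative guesses, not figures from the commenter or from Cerebras:

```python
from math import comb, exp

def block_yield(area_cm2: float, d0_per_cm2: float) -> float:
    """Poisson model: probability a block of given area has zero fatal defects."""
    return exp(-area_cm2 * d0_per_cm2)

def yield_with_spares(n_needed: int, n_spares: int,
                      block_area_cm2: float, d0_per_cm2: float) -> float:
    """Probability that at least n_needed of (n_needed + n_spares) fabricated
    blocks are defect-free, i.e. built-in self-test and selection circuitry
    can route around the bad ones (binomial over independent blocks)."""
    y = block_yield(block_area_cm2, d0_per_cm2)
    total = n_needed + n_spares
    return sum(comb(total, k) * y**k * (1 - y)**(total - k)
               for k in range(n_needed, total + 1))

# Toy numbers: ~462 cm^2 of silicon split into 10,000 identical blocks,
# with 1% spare blocks, at a defect density of 0.1/cm^2.
area, d0, blocks, spares = 462.0, 0.1, 10_000, 100
print(f"monolithic, no redundancy: {block_yield(area, d0):.1e}")  # ~1e-20
print(f"with 1% spare blocks:      "
      f"{yield_with_spares(blocks, spares, area / blocks, d0):.6f}")  # ~1.0
```

The contrast is the whole point: an undivided wafer-sized die is essentially unyieldable, while a small fraction of spare blocks selected at test time recovers nearly all of the loss.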
mark_l_watson over 5 years ago
I don't know if this mega-chip will be successful, but I like the idea. Before I retired I managed a deep learning team that had a very cool internal product for running distributed TensorFlow. Now in retirement I get by with a single 1070 GPU for experiments - not bad, but having something much cheaper, with much more memory, and much faster would help so much.

I tend to be optimistic, so take my prediction with a grain of salt: I bet within 7 or 8 years there will be an inexpensive device that will blow away what we have now. There are so many applications for much larger end-to-end models that will put pressure on the market for something much better than what we have now. BTW, the ability to efficiently run models on my new iPhone 11 Pro is impressive, and I have to wonder if the market for super fast hardware for training models might match the smartphone market. For that to happen, we would need a "deep learning rules the world" shift. BTW, off topic, but I don't think deep learning gets us to AGI.
varelse over 5 years ago
I am far more excited by the underlying Wafer Scale Integration moonshot than I am by any AI benchmarks here. I know it's trendy to think there can only be one w/r/t the AI Iron Throne, but nope, not the case; everyone is writing bespoke code in production where the money is made. Well, almost everyone - Amazon seems to be the odd duck, but they're a bunch of cheapskate thought leaders anyway (except for their offers to junior engineers in their desperate hail-mary attempt to catch up with FAIR and DeepMind, but... I... digress...).

Which is to say that graphs written to run specifically on Cerebras's giant chip will smash deep learning's speed barrier for graphs written to run best on Cerebras's giant chip. And that's great, but it won't be every graph; there is no free lunch. Hear me now, believe me later(tm).

But if we can cut the cost of interconnect by putting a figurative datacenter's worth of processors on a chip, that's genuinely interesting, and it has applications far beyond the multiplies and adds of AI. But be very wary of anyone wielding the term "sparse", for it is a massively overloaded definition, and every single one of those definitions is a beautiful and unique snowflake w/r/t efficient execution on bespoke HW.
bcatanzaro over 5 years ago
Reminds me of that other great prediction of a GPU killer from IEEE Spectrum back in 2009:

https://spectrum.ieee.org/computing/software/winner-multicore-made-simple
Zenst over 5 years ago
A chip that size - imagine the yield. Equally, cooling: it has to be water-based, as a heatsink that size would be on par with a small anvil, and the weight would pose some serious issues. Though I'm unsure, as there are no pictures of it in play, alas, and all they say is "20 kilowatts being consumed by each blew out into the Silicon Valley streets through a hole cut into the wall", which does somewhat beg for a picture and just raises more questions.

Why would they make a chip this big when AMD is showing that a chiplet design approach is cheaper and more scalable on so many levels? Let alone yields.

Equally, Arm's approach of utilising the back of the chip for power delivery: https://spectrum.ieee.org/nanoclast/semiconductors/design/arm-shows-backside-power-delivery-as-path-to-further-moores-law

A wafer-scale chip like this, using that approach, would save so much power. But again, yields will be a factor, and I can imagine this is not a cutting-edge process node, as you find that as nodes mature, the yields improve. So an older node would have a better yield and be more suitable for such wafer-scale chips. But again, no mention of what is used. I have read in the past that it would use Intel's 10nm, but this article mentions TSMC. Another article I read said they used a 16nm node ( https://fuse.wikichip.org/news/3010/a-look-at-cerebras-wafer-scale-engine-half-square-foot-silicon-chip/ ), which, given the point above about node maturity, is understandable.
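To make the chiplet-versus-monolithic point concrete, a back-of-the-envelope sketch using the same kind of Poisson defect model as above; the areas and defect density are illustrative guesses, not published figures:

```python
from math import exp

D0 = 0.1            # defects per cm^2 - an illustrative, mature-node guess
WAFER_AREA = 462.0  # cm^2, roughly the reported wafer-scale silicon area
CHIPLET_AREA = 0.8  # cm^2, a typical-ish chiplet die size

def poisson_yield(area_cm2: float) -> float:
    """Fraction of dies with zero fatal defects under a Poisson defect model."""
    return exp(-area_cm2 * D0)

# Chiplets: a defective die is simply discarded, so yield applies per die.
# Monolithic wafer scale with no redundancy: one defect anywhere kills it.
print(f"per-chiplet yield:            {poisson_yield(CHIPLET_AREA):.2f}")  # ~0.92
print(f"monolithic wafer-scale yield: {poisson_yield(WAFER_AREA):.1e}")    # ~1e-20
```

Which is why a wafer-scale part only works with the kind of on-die redundancy discussed upthread, and why a mature node with lower defect density helps so much.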
wbhart over 5 years ago
From the perspective of an outsider, I can't see how a company like this could survive. They claim on the one hand to have done something really amazing and are at the stage where they are looking for customers. Normally, you'd expect them to be touting performance figures to secure such investment. Instead, they've decided to keep the performance secret. And they've managed to find some "expert" who says this is normal.

Does anyone here have expertise in this area? Is this the model for a successful company in this area?
michelpp over 5 years ago
The members of the GraphBLAS forum have discussed this chip a couple of times. There's a lot of research on making deep neural networks more sparse, not just by pruning a dense matrix, but by starting with a sparse matrix structure de novo. Lincoln Laboratory's Dr. Jeremy Kepner has a good paper on RadiX-Net mixed-radix topologies that achieve good learning ability but with far fewer neurons and memory requirements. Cited in the paper is a network constructed with these techniques that simulated the size and sparsity of the human brain:

https://arxiv.org/pdf/1905.00416.pdf

It would be cool to see the GraphBLAS API ported to this chip, which from what I can tell comes with sparse matrix processing units. As networks become bigger, deeper, but sparser, a chip like this will have some demonstrable advantages over dense numeric processors like GPUs.
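For readers who haven't seen the de-novo-sparse idea, a tiny sketch using SciPy's sparse matrices as a stand-in; the layer sizes and density are arbitrary, and real GraphBLAS code would use semiring operations on a dedicated sparse runtime rather than plain matmul:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

def sparse_layer(n_in: int, n_out: int, density: float) -> sparse.csr_matrix:
    """A layer whose weight matrix is sparse from the start (de novo),
    rather than trained dense and pruned afterwards."""
    return sparse.random(n_out, n_in, density=density,
                         random_state=rng, format="csr")

W1 = sparse_layer(4096, 4096, density=0.01)  # keep 1% of dense connections
W2 = sparse_layer(4096, 1024, density=0.01)

x = rng.standard_normal(4096)
h = np.maximum(W1 @ x, 0.0)  # sparse matrix-vector product + ReLU
y = W2 @ h

# Memory for one 4096x4096 float64 layer, dense vs. CSR-sparse:
dense_bytes = 4096 * 4096 * 8
sparse_bytes = W1.data.nbytes + W1.indices.nbytes + W1.indptr.nbytes
print(f"dense layer:  {dense_bytes / 1e6:.1f} MB")   # ~134 MB
print(f"sparse layer: {sparse_bytes / 1e6:.1f} MB")  # ~2 MB
```

The storage gap is the argument for hardware with native sparse processing: at high sparsity, most of a GPU's dense multiply-add throughput is spent on zeros.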
giacaglia over 5 years ago
I've written about the challenges that Cerebras went through and what is next: https://towardsdatascience.com/why-cerebras-announcement-is-a-big-deal-6c8633ffc49c
rsp1984 over 5 years ago
This fits perfectly into the narrative of yesterday's discussion on HN [1].

Deep Neural Nets are somewhat of a brute force approach to machine learning. Training efficiency is horrible compared with other ML approaches, but hey, as long as we can trade +5% of classification performance for +500% of NN complexity and throw more money at the problem, who cares?

I see a dystopian future where much better and much more efficient approaches to ML exist, but nobody's paying attention because we have Deep Neural Nets in hardware and decades of infrastructure supporting it.

[1] https://news.ycombinator.com/item?id=21929709
m0zg over 5 years ago
They did build some valuable tech, no question there, but be sure to account for the typical startup hyperbole. By the time you can get your hands on this (if that ever happens), the hyperbole will converge a bit closer to reality, the tradeoffs will become apparent, etc., and you'll discover that it is not, in fact, going to "smash" barriers of any kind in any practical sense.

From TFA: "Cerebras hasn't released MLPerf results or any other independently verifiable apples-to-apples comparisons."

That's all you really need to know.
ZhuanXia over 5 years ago
Them shunning benchmarks is pretty lame.
green-eclipse over 5 years ago
The Cerebras chip really stands out in terms of the chip industry's relationship to Moore's law. Look at the graphs in this article for reference:

https://medium.com/predict/cerebras-trounces-moores-law-with-first-working-wafer-scale-chip-70b712d676d0
gfodor over 5 years ago
I’m a know-nothing when it comes to this area, but I shouted expletives at least twice when I read this article. This is crazy.