Intel Shipping Nervana Neural Network Processor First Silicon Before Year End

103 points by kartD over 7 years ago

5 comments

omarforgotpwd over 7 years ago
“Alright guys we already blew it on mobile. People are starting to realize that they can put ARM in their data center and save a ton on electricity. We better not blow it on this AI thing with Nvidia”
kbumsik over 7 years ago
I hope Intel will try to integrate this with existing ML libraries like TensorFlow and Caffe, rather than building its own separate ecosystem.
vonnik over 7 years ago
Intel has been saying all along that they would ship Nervana's chips by year end, so not really news. The news will be if they miss their deadline.
dragontamer over 7 years ago
What's the key advantage of this Nervana architecture over GPUs?

I can theorycraft... hypothetically, GPUs have a memory architecture structured like this (using OpenCL terms here): Global Memory <-> Local Memory <-> Private Memory, corresponding to GPU <-> Work Group <-> Work Item.

On AMD's systems, the "Local" / Work Group tier is a group of roughly 256 work items (in CUDA terms, a 256-thread "block"), and all 256 work items can access Local Memory at outstanding speeds (and then there's even faster "Private" memory per work item / thread, which is basically a hardware register).

On, say, a Vega 64, there are 64 compute units (each of which has 256 work items running in parallel). The typical way Compute Unit 1 can talk to Compute Unit 2 is to write data from CU1 into Global Memory (which is off-chip) and then read it back in CU2. In effect, GPUs are designed for high-bandwidth communication WITHIN a work group (a "block" in CUDA terms), but they have slow communication ACROSS work groups.

In effect, there's only "one" Global Memory on a GPU. And in the case of a Vega 64, that's 16384 work items that might be trying to hit Global Memory at the same time. True, there are caching layers and other optimizations, but any methodology based on global resources will naturally slow down code and hamper parallelism.

Neural networks could possibly have faster message passing with a different memory architecture. Imagine if the compute units allowed quick communication in a torus, for example, so compute unit #1 could quickly communicate with compute unit #2. This would roughly correspond to "Layer 1 neurons" passing signals to "Layer 2 neurons" and vice versa (say, for backpropagation of errors).

Alas, I don't see much information on what Nervana is doing differently. When "Parallella" came out a few years ago, they were crystal clear on how their memory architecture was grossly different from a GPU's... it'd be nice if Nervana's marketing material were similarly clear.

----------

Hmm, this page is a bit more technical: https://www.intelnervana.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/

It seems like the big selling points are:

* "Flexpoint" -- They're a bit light on the details, but they argue that "Flexpoint" is better than floating point. It'd be nice if they were more transparent about what "Flexpoint" actually is, but I'll imagine it's like a Logarithmic Number System (https://en.wikipedia.org/wiki/Logarithmic_number_system) or similar, which would probably be better for low-precision neural network computations.

* "Better Memory Architecture" -- I can't find any details on why their memory architecture is better. They just sorta... claim it's better.

Ultimately, GPUs were designed for graphics problems, so I'm sure there's a better architecture out there for neural network problems. It's just fundamentally a different kind of parallelism. (Image processing / shaders handling the top-left corner of a polygon don't need to know what's going on at the bottom-right, so GPUs don't have high-bandwidth communication lines between those units. Neural networks require a bit more communication than the image processing problems of the past did.) But I'm not really seeing "why" this architecture is better yet.
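To make the comment's intra-block vs. cross-block bandwidth point concrete, here is a minimal CUDA sketch (the kernel and variable names are illustrative assumptions, not from the thread or from Intel): threads within one block cooperate through fast on-chip shared memory, while the per-block results can only be combined by a round trip through off-chip global memory.

```cuda
// Intra-block communication uses fast on-chip shared memory ("Local" in
// OpenCL terms); cross-block communication has no fast path and must go
// through off-chip global memory.
#include <cstdio>
#include <cuda_runtime.h>

// Block-local reduction: all thread-to-thread traffic stays on-chip.
__global__ void blockSum(const float* in, float* blockSums, int n) {
    __shared__ float tile[256];              // visible to this block only
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();                         // cheap intra-block barrier

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    // Blocks can only exchange results via global (off-chip) memory.
    if (tid == 0) blockSums[blockIdx.x] = tile[0];
}

int main() {
    const int n = 1 << 20, threads = 256, blocks = n / threads;
    float *in, *blockSums;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&blockSums, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads>>>(in, blockSums, n);
    cudaDeviceSynchronize();                 // wait for global memory writes

    float total = 0.0f;                      // second pass reads via global memory
    for (int i = 0; i < blocks; ++i) total += blockSums[i];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in); cudaFree(blockSums);
    return 0;
}
```

The reduction inside each block never leaves on-chip shared memory; the only place the 4096 partial sums can meet is global memory, which is exactly the cross-compute-unit bottleneck the comment describes.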
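And here is a toy sketch of the logarithmic-number-system idea the comment speculates Flexpoint might resemble. This only illustrates the LNS concept from the linked Wikipedia article, not Intel's documented Flexpoint format: values are stored as a sign plus the log of the magnitude, so multiplication reduces to a cheap addition.

```cuda
// Toy logarithmic number system (LNS): store sign + log2(|x|), so that
// multiplication becomes addition of the logs. NOT Intel's Flexpoint.
#include <cstdio>
#include <cmath>

struct LNS {
    signed char sign;   // -1, 0, or +1
    float       log2m;  // log2 of the magnitude
};

LNS encode(float x) {
    LNS v;
    v.sign  = (x > 0) - (x < 0);
    v.log2m = (x == 0.0f) ? 0.0f : log2f(fabsf(x));
    return v;
}

float decode(LNS v) { return v.sign * exp2f(v.log2m); }

// Multiply in LNS: multiply the signs, add the logs.
LNS mul(LNS a, LNS b) {
    LNS r;
    r.sign  = a.sign * b.sign;
    r.log2m = a.log2m + b.log2m;
    return r;
}

int main() {
    LNS a = encode(0.5f), b = encode(-6.0f);
    printf("0.5 * -6.0 = %f\n", decode(mul(a, b)));  // prints -3.000000
    return 0;
}
```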
novaRom over 7 years ago
Unfortunately for Intel, it is probably too late. The specs of the new processors are not impressive even in comparison with the previous Tesla generation.