Intel Shipping Nervana Neural Network Processor First Silicon Before Year End

103 points, by kartD, over 7 years ago

5 comments

omarforgotpwd, over 7 years ago
“Alright guys we already blew it on mobile. People are starting to realize that they can put ARM in their data center and save a ton on electricity. We better not blow it on this AI thing with Nvidia”
kbumsik, over 7 years ago
I hope Intel will try to integrate this with existing ML libraries like TensorFlow and Caffe, rather than build its own separate ecosystem.
vonnik, over 7 years ago
Intel has been saying all along that they would ship Nervana's chips by year end, so not really news. The news will be if they miss their deadline.
dragontamer, over 7 years ago
What's the key advantage of this Nervana architecture over GPUs?

I can theorycraft... hypothetically, GPUs have a memory architecture structured like this (using OpenCL terms here): Global Memory <-> Local Memory <-> Private Memory, corresponding to the GPU <-> Work Group <-> Work Item.

On AMD's systems, the "Local" and/or Work Group tier is a group of roughly 256 work items (in CUDA terms, a 256-thread "block"), and all 256 work items can access Local Memory at outstanding speeds (and then there's even faster "Private" memory per work item / thread, which is basically a hardware register).

On, say, a Vega 64, there are 64 compute units (each of which has 256 work items running in parallel). The typical way Compute Unit 1 can talk to Compute Unit 2 is to write data from CU1 into Global Memory (which is off-chip) and then read it back in CU2. In effect, GPUs are designed for high-bandwidth communication WITHIN a workgroup (a "block" in CUDA terms), but they have slow communication ACROSS workgroups / blocks.

In effect, there's only "one" Global Memory on a GPU, and in the case of a Vega 64 that's 16384 work items that might be trying to hit Global Memory at the same time. True, there are caching layers and other optimizations, but any methodology based on global resources will naturally slow down code and hamper parallelism.

Neural networks could possibly have faster message passing with a different memory architecture. Imagine if the compute units allowed quick communication in a torus, for example, so compute unit #1 could quickly communicate with compute unit #2.

That would roughly correspond to "layer 1 neurons" passing signals to "layer 2 neurons" and vice versa (say, for backpropagation of errors).

Alas, I don't see much information on what Nervana is doing differently. When "Parallella" came out a few years ago, they were crystal clear about how their memory architecture was grossly different from a GPU's... it'd be nice if Nervana's marketing material were similarly clear.

----------

Hmm, this page is a bit more technical: https://www.intelnervana.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/

It seems like the big selling points are:

* "Flexpoint" -- They're a bit light on the details, but they argue that "Flexpoint" is better than floating point. It'd be nice if they were more transparent about what "Flexpoint" is, but I'll imagine it's something like a Logarithmic Number System (https://en.wikipedia.org/wiki/Logarithmic_number_system) or similar, which would probably be better for low-precision neural network computations.

* "Better Memory Architecture" -- I can't find any details on why their memory architecture is better. They just sort of... claim it's better.

Ultimately, GPUs were designed for graphics problems, so I'm sure there's a better architecture out there for neural network problems; it's just fundamentally a different kind of parallelism. (Image processing / shaders handling the top-left corner of a polygon don't need to know what's going on at the bottom-right polygon, so GPUs don't have high-bandwidth communication lines between those units. Neural networks require a bit more communication than the image processing problems of the past did.) But I'm not really seeing why this architecture is better yet.
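To make the within-workgroup vs. across-workgroup distinction above concrete, here is a minimal CUDA sketch (purely illustrative, not taken from the article or from any Nervana material; the kernel and variable names are made up for the example). Threads in one block cooperate through fast on-chip shared memory, while combining results across blocks has to round-trip through off-chip global memory, here via a second kernel launch.

    // Per-block reduction: threads in a block share data through on-chip
    // __shared__ memory (OpenCL "Local"); the only way other blocks can see
    // the result is a write to off-chip global memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void partial_sums(const float* in, float* block_sums, int n) {
        __shared__ float tile[256];                  // fast, per-block storage
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
        __syncthreads();                             // synchronization is block-wide only

        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                tile[threadIdx.x] += tile[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            block_sums[blockIdx.x] = tile[0];        // cross-block data goes off-chip
    }

    // A second launch combines the per-block results: blocks from the first
    // launch could not read each other's shared memory directly.
    __global__ void final_sum(const float* block_sums, float* out, int num_blocks) {
        if (threadIdx.x == 0) {
            float acc = 0.0f;
            for (int i = 0; i < num_blocks; ++i) acc += block_sums[i];
            *out = acc;
        }
    }

    int main() {
        const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
        float *in, *sums, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&sums, blocks * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;

        partial_sums<<<blocks, threads>>>(in, sums, n);
        final_sum<<<1, 32>>>(sums, out, blocks);
        cudaDeviceSynchronize();
        printf("sum = %.0f (expected %d)\n", *out, n);

        cudaFree(in); cudaFree(sums); cudaFree(out);
        return 0;
    }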
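The comment above also guesses that Flexpoint might be something like a logarithmic number system. The article gives no details, so the sketch below only illustrates that guess, not Flexpoint itself: a toy LNS type (plain C++ that also compiles under nvcc; the type and function names are invented for the example) in which multiplication reduces to adding the stored logs, while addition needs a correction term that hardware implementations typically take from a lookup table.

    #include <cmath>
    #include <cstdio>

    // Toy logarithmic-number-system value: sign plus log2 of the magnitude.
    // A real LNS would pack the log into a fixed-point bit field.
    struct Lns {
        bool  neg;
        float lg;
    };

    Lns   to_lns(float x) { return { x < 0.0f, std::log2(std::fabs(x)) }; }
    float from_lns(Lns a) { return (a.neg ? -1.0f : 1.0f) * std::exp2(a.lg); }

    // Multiplication is the cheap operation: just add the logs.
    Lns mul(Lns a, Lns b) { return { a.neg != b.neg, a.lg + b.lg }; }

    // Addition is the expensive one: it needs log2(1 + 2^(lo - hi)),
    // usually read from a table in hardware. Same-sign case only, for brevity.
    Lns add(Lns a, Lns b) {
        float hi = std::fmax(a.lg, b.lg), lo = std::fmin(a.lg, b.lg);
        return { a.neg, hi + std::log2(1.0f + std::exp2(lo - hi)) };
    }

    int main() {
        Lns a = to_lns(3.0f), b = to_lns(4.0f);
        printf("3*4 = %.3f, 3+4 = %.3f\n",
               from_lns(mul(a, b)), from_lns(add(a, b)));
        return 0;
    }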
novaRom, over 7 years ago
Unfortunately for Intel, it is probably too late. The specs of the new processors are not impressive, even in comparison with the previous Tesla generation.