“Alright guys we already blew it on mobile. People are starting to realize that they can put ARM in their data center and save a ton on electricity. We better not blow it on this AI thing with Nvidia”
Intel has been saying all along that they would ship Nervana's chips by year-end, so this isn't really news. The news will be if they miss that deadline.
What's the key advantage of this Nervana architecture over GPUs?

I can theorycraft... hypothetically, GPUs have a memory architecture structured like this (using OpenCL terms here): Global Memory <-> Local Memory <-> Private Memory, corresponding to the GPU <-> Work Group <-> Work Item.

On AMD's systems, the "Local" / Work Group tier is a group of roughly 256 work items (in CUDA terms, a 256-thread "Block"), and all 256 work items can access Local Memory at outstanding speeds (and then there's even faster "Private" memory per work item / thread, which is basically a hardware register).

On, say, a Vega 64, there are 64 compute units (each of which has 256 work items running in parallel). The typical way Compute Unit 1 can talk to Compute Unit 2 is to write data from CU1 into Global Memory (which is off-chip), and then read it back in CU2. In effect, GPUs are designed for high-bandwidth communication WITHIN a work group (a "Block" in CUDA terms), but they have slow communication ACROSS work groups / blocks.

In effect, there's only "one" Global Memory on a GPU. And in the case of a Vega 64, that's 16384 work items that might be trying to hit Global Memory at the same time. True, there are caching layers and other optimizations, but any methodology based on global resources will naturally slow down code and hamper parallelism. (There's a toy CUDA sketch of this at the end of the comment.)

Neural networks could possibly have faster message passing with a different memory architecture. Imagine if the compute units allowed quick communication in a torus, for example, so compute unit #1 can quickly talk to compute unit #2. This would roughly correspond to "Layer 1 neurons" passing signals to "Layer 2 neurons" and vice versa (say, for backpropagation of errors).

Alas, I don't see much information on what Nervana is doing differently. When "Parallella" came out a few years ago, they were crystal clear on how their memory architecture was grossly different from a GPU's... it'd be nice if Nervana's marketing material were similarly clear.

----------

Hmm, this page is a bit more technical: https://www.intelnervana.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/

It seems like the big selling points are:

* "Flexpoint" -- They're a bit light on the details, but they argue that "Flexpoint" is better than floating point. It'd be nice if they were more transparent about what "Flexpoint" is, but I'll imagine it's something like a logarithmic number system (https://en.wikipedia.org/wiki/Logarithmic_number_system) or similar, which would probably be better for low-precision neural network computations. (There's a toy sketch of that idea at the end of the comment, too.)

* "Better Memory Architecture" -- I can't find any details on why their memory architecture is better. They just sorta... claim it's better.

Ultimately, GPUs were designed for graphics problems, so I'm sure there's a better architecture out there for neural network problems. It's just fundamentally a different kind of parallelism. (Shaders handling the top-left corner of the screen don't need to know what's going on in the bottom-right corner, so GPUs don't have high-bandwidth communication lines between those units. Neural networks require a bit more communication than the image-processing problems of the past.) But I'm not really seeing "why" this architecture is better yet.
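To make the "work groups only talk through global memory" part concrete, here's a minimal CUDA sketch -- just a toy two-pass sum, with kernel names and sizes made up for illustration. Each block reduces its chunk in fast on-chip shared memory (CUDA's name for OpenCL's local memory), and the only way the per-block results ever get combined is by round-tripping through off-chip global memory, with a second kernel launch as the synchronization point:

    // Toy two-pass reduction: threads INSIDE a block share data through fast
    // on-chip "shared" memory, but two blocks (OpenCL work groups) can only
    // talk to each other by bouncing data through off-chip global memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void partial_sums(const float* in, float* block_sums, int n) {
        __shared__ float tile[256];          // on-chip, visible only to this block
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;

        tile[tid] = (i < n) ? in[i] : 0.0f;  // load from global into shared
        __syncthreads();

        // tree reduction entirely within the block (the fast path)
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) tile[tid] += tile[tid + stride];
            __syncthreads();
        }

        // the ONLY way other blocks ever see this result: global memory
        if (tid == 0) block_sums[blockIdx.x] = tile[0];
    }

    __global__ void final_sum(const float* block_sums, float* out, int num_blocks) {
        // second pass: one block re-reads every partial sum from global memory
        float s = 0.0f;
        for (int i = threadIdx.x; i < num_blocks; i += blockDim.x) s += block_sums[i];
        atomicAdd(out, s);
    }

    int main() {
        const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
        float *in, *block_sums, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&block_sums, blocks * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 1.0f;
        *out = 0.0f;

        partial_sums<<<blocks, threads>>>(in, block_sums, n);
        final_sum<<<1, threads>>>(block_sums, out, blocks);
        cudaDeviceSynchronize();

        printf("sum = %.0f (expected %d)\n", *out, n);
        return 0;
    }

Newer CUDA has cooperative groups / library reductions that hide some of this, but the underlying structure is the same: fast on-chip traffic inside a block, global-memory hops between blocks.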
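And on the Flexpoint guess: here's a toy CUDA sketch of the logarithmic-number-system idea I was speculating about. To be clear, this is NOT Intel's documented format -- the LnsVal struct and lns_dot kernel are names I made up -- it just shows why a non-IEEE encoding can be attractive: if you store (sign, log2|x|), a multiply turns into an add in the log domain, which is much cheaper in silicon than a full floating-point multiplier.

    // Toy "logarithmic number system" dot product. Purely a sketch of the LNS
    // idea, not Intel's actual Flexpoint format. Each value is stored as
    // (sign, log2|x|); a multiply becomes an ADD of the logs.
    #include <cstdio>
    #include <cmath>
    #include <cuda_runtime.h>

    struct LnsVal {          // hypothetical low-precision encoding
        signed char sign;    // +1 or -1
        float       log2mag; // log2(|x|); a real design would quantize this
    };

    __host__ __device__ inline LnsVal encode(float x) {
        LnsVal v;
        v.sign    = (x < 0.0f) ? -1 : 1;
        v.log2mag = log2f(fabsf(x) + 1e-30f);   // avoid log(0)
        return v;
    }

    __global__ void lns_dot(const LnsVal* a, const LnsVal* b, float* out, int n) {
        float acc = 0.0f;
        for (int i = threadIdx.x; i < n; i += blockDim.x) {
            // "multiply" = add the logs; decode back to linear to accumulate
            float prod = exp2f(a[i].log2mag + b[i].log2mag) * a[i].sign * b[i].sign;
            acc += prod;
        }
        atomicAdd(out, acc);
    }

    int main() {
        const int n = 1024;
        LnsVal *a, *b; float *out;
        cudaMallocManaged(&a, n * sizeof(LnsVal));
        cudaMallocManaged(&b, n * sizeof(LnsVal));
        cudaMallocManaged(&out, sizeof(float));
        *out = 0.0f;
        for (int i = 0; i < n; ++i) {        // a[i] = 0.5, b[i] = -2.0
            a[i] = encode(0.5f);
            b[i] = encode(-2.0f);
        }
        lns_dot<<<1, 256>>>(a, b, out, n);
        cudaDeviceSynchronize();
        printf("dot = %.1f (expected %.1f)\n", *out, -1.0f * n);
        return 0;
    }

A real low-precision design would quantize log2mag down to a handful of bits and keep a wide accumulator; everything stays in float here just to show the arithmetic.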