Just for some perspective: a little over 10 years ago, this $130k turnkey installation would sit at #1 in the TOP500, easily beating out hundred-million-dollar initiatives like NEC's Earth Simulator and IBM's BlueGene/L: http://www.top500.org/lists/2005/06/ (170 TFLOPS vs. 137 TFLOPS)

At the other end, even a single GTX 960 would make it onto that list, placing in the 200s.
Check out the specs here: http://images.nvidia.com/content/technologies/deep-learning/pdf/61681-DB2-Launch-Datasheet-Deep-Learning-Letter-WEB.pdf

Though I'm most curious about what motherboard is in there to support NVLink and NVHS.

Good overview of Pascal here: https://devblogs.nvidia.com/parallelforall/inside-pascal/

One question: will we see NVLink become an open standard for use in/with other coprocessors?

One gripe: they give relative performance data as compared to a CPU -- of *course* it's faster than a CPU.
I am looking forward to OpenCL catching up with CUDA in maturity and adoption, so that NVIDIA's monopoly on deep-learning silicon comes to an end.
More detail on the GPUs in the system:

https://devblogs.nvidia.com/parallelforall/inside-pascal/
Note the P100 is 20 TFLOPS at half precision (16-bit). For general-purpose GPU computing (I use them for EM simulation) I assume one would want 32-bit, which is 10 TFLOPS. But it still looks much, much better for 64-bit computation than the previous generation.
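Worth noting that the 2x half-precision number only applies if your kernels actually go through the packed half2 path; plain scalar float code won't see it. A minimal sketch of what that looks like (the kernel name and parameters are just illustrative, not from any particular codebase):

    #include <cuda_fp16.h>

    // The doubled FP16 rate comes from packed half2 math: each instruction
    // operates on a pair of 16-bit values. Needs compute capability 5.3+,
    // with full rate on the P100 (sm_60). Here n is the number of pairs.
    __global__ void axpy_half2(int n, __half2 a, const __half2* x, __half2* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = __hfma2(a, x[i], y[i]);  // two fused multiply-adds per call
    }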
I have to wonder about Intel and their Xeon Phi range. Last I checked they were supposed to launch a follow-up late last year that never materialized. Now we're four months into 2016 and there are still no new Phis.

Couple that with the fact that they want you to use their compilers (extremely expensive) on a specialized system that can support the card, and you get a platform that nobody other than supercomputer companies can reasonably use. Meanwhile, any developer who wants to try something with CUDA can drop $200 on a GPU and go, then scale accordingly. I think Intel somewhat acknowledged this by having a fire sale on Phi cards and dev licenses last year, but it was only for a passively cooled model (which really only works well in servers, not workstations).

Intel should do this:

  - Offer a $200-400 Xeon Phi card
  - Include whatever compiler is needed to use the card
  - Make it easy to buy
  - Contribute ports of CUDA-based frameworks to Xeon Phi
I feel like they could do this pretty easily. Even if it lost money, it's pennies compared to what they're going to lose if NVIDIA keeps trumping them on machine learning. They need to give devs the tooling and financial incentive to write something for Phi instead of CUDA; right now that incentive completely doesn't exist, and frameworks basically use CUDA by default.

If you're AMD, do the same thing, but replace the phrase Xeon Phi with Radeon/FirePro.
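To make the porting point concrete, here is roughly what the gap looks like today. This is a hedged sketch with made-up names, not code from any real framework; it compares a trivial CUDA op with the OpenMP offload style Intel promotes for the Phi:

    #include <cuda_runtime.h>
    #include <cstdio>

    // What frameworks already have: a CUDA kernel per operation.
    __global__ void relu_forward(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = fmaxf(x[i], 0.0f);
    }

    int main() {
        const int n = 1 << 20;
        float* x;
        cudaMalloc(&x, n * sizeof(float));
        relu_forward<<<(n + 255) / 256, 256>>>(x, n);
        cudaDeviceSynchronize();
        cudaFree(x);
        printf("done\n");
        return 0;
    }

    // The rough Xeon Phi counterpart (OpenMP 4.x offload style), shown as a
    // comment since this file is compiled as CUDA:
    //
    //   #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    //   for (int i = 0; i < n; ++i)
    //       x[i] = x[i] > 0.0f ? x[i] : 0.0f;
    //
    // Every op in a framework needs this second code path written and
    // maintained, which is exactly the work Intel would be paying for.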
$129k for this machine. In the keynote it's interesting that they described the product line as: "Tesla M40 for hyperscale, K80 for multi-app HPC, P100 for scales very high, and DGX-1 for the early adopters."

The GP100/P100 on the 16nm process probably gives a considerable performance-per-watt advantage over the current Tesla cards, but this gives me the feeling that we may not see consumer or workstation-level Pascal boards for a while.
This announcement reminds me of the part of Outliers that spelled out how Bill Gates and others became who they are because they had access to very expensive equipment before anyone else did (and spent 10K hours on it).
How does this compare to some of the systems provided by cloud providers [1][2]? Seems like requiring an on-site capability is a hurdle for integration if you already have your data with a cloud provider.

[1] https://aws.amazon.com/machine-learning/

[2] https://azure.microsoft.com/en-us/services/machine-learning/
It's good to see powerful machine-learning hardware come out. Much of the progress in ML has come from hardware speedups, and this will empower the next few years of research.
I wonder how much faster the new Tesla P100 is than the Tesla K40 at training neural networks. The K40s have been the best available GPUs for training deep neural networks.
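Hard to say until independent benchmarks land, but a quick first-order answer is to time a large SGEMM on each card, since dense matrix multiplies dominate training time for most deep nets. A minimal sketch (the 4096x4096 size and single timed call are arbitrary choices, not a rigorous benchmark):

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    // Times one large single-precision GEMM and reports achieved TFLOPS.
    // Run the same binary on a K40 and a P100 to get a rough speedup ratio.
    int main() {
        const int n = 4096;
        float *a, *b, *c;
        cudaMalloc(&a, n * n * sizeof(float));
        cudaMalloc(&b, n * n * sizeof(float));
        cudaMalloc(&c, n * n * sizeof(float));

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Warm-up call, then a timed one.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaEventRecord(start);
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double tflops = 2.0 * n * n * n / (ms * 1e-3) / 1e12;
        printf("%dx%d SGEMM: %.2f ms, %.2f TFLOPS\n", n, n, ms, tflops);

        cublasDestroy(handle);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Real training also leans on cuDNN, memory bandwidth, and (on the P100) FP16, so treat a raw SGEMM ratio only as a rough lower bound on the difference.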
Does anyone know if the Pascal architecture is built using stacked cores? Or is this one of those applications where thermal problems keep that technique from being used?
Looks like research in machine learning will only be done at huge corporations. You'll need an amount of funding comparable to the LHC's.

Time to use better-suited models like kernel ensembles: maybe they are not as accurate, but they are easier to train on a single CPU.