Just for some perspective: a little over 10 years ago, this $130k turnkey installation would sit at #1 in the TOP500, easily beating out hundred-million-dollar initiatives like NEC's Earth Simulator and IBM's BlueGene/L: http://www.top500.org/lists/2005/06/ (170 TFLOPS vs. 137 TFLOPS)

At the other end, even a single GTX 960 would make it onto that list, placing in the 200s.
Check out the specs here: http://images.nvidia.com/content/technologies/deep-learning/pdf/61681-DB2-Launch-Datasheet-Deep-Learning-Letter-WEB.pdf

Though I'm most curious about what motherboard is in there to support NVLink and NVHS.

Good overview of Pascal here: https://devblogs.nvidia.com/parallelforall/inside-pascal/

One question: will we see NVLink become an open standard for use in/with other coprocessors?

One gripe: they give relative performance data as compared to a CPU -- of *course* it's faster than a CPU.
I am looking forward to OpenCL catching up with CUDA in maturity and adoption, so that NVIDIA's monopoly on deep-learning silicon comes to an end.
More detail on the GPUs in the system:

https://devblogs.nvidia.com/parallelforall/inside-pascal/
Note the P100 is 20 TFLOPS at half precision (16-bit). For general-purpose GPU computing (I use them for EM simulation) I assume one would want 32-bit, which is 10 TFLOPS. But it still looks much, much better for 64-bit computation than the previous generation.
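Worth noting that the 2x half-precision number only applies if your kernels actually go through the packed half2 path; plain scalar float code won't see it. A minimal sketch of what that looks like (the kernel name and parameters are just illustrative, not from any particular codebase):

    #include <cuda_fp16.h>

    // The doubled FP16 rate comes from packed half2 math: each instruction
    // operates on a pair of 16-bit values. Needs compute capability 5.3+,
    // with full rate on the P100 (sm_60). Here n is the number of pairs.
    __global__ void axpy_half2(int n, __half2 a, const __half2* x, __half2* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = __hfma2(a, x[i], y[i]);  // two fused multiply-adds per call
    }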
I have to wonder about Intel and their Xeon Phi range. Last I checked they were supposed to launch a follow-up late last year that never materialized. Now we're four months into 2016 and there are still no new Phis.

Couple that with the fact that they want you to use their compilers (extremely expensive) on a specialized system that can support the card, and you get a platform that nobody other than supercomputer companies can reasonably use. Meanwhile, any developer who wants to try something with CUDA can drop $200 on a GPU and go, then scale accordingly. I think Intel somewhat acknowledged this by having a fire sale on Phi cards and dev licenses last year, but it was only for a passively cooled model (which really only works well in servers, not workstations).

Intel should do this:

  - Offer a $200-400 Xeon Phi card
  - Include whatever compiler is needed to use the card
  - Make it easy to buy
  - Contribute ports of CUDA-based frameworks to Xeon Phi
I feel like they could do this pretty easily. Even if it lost money, it's pennies compared to what they're going to lose if NVIDIA keeps trumping them on machine learning. They need to give devs the tooling and financial incentive to write something for Phi instead of CUDA; right now that incentive completely doesn't exist, and frameworks basically use CUDA by default.

If you're AMD, do the same thing, but replace the phrase Xeon Phi with Radeon/FirePro.
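To make the porting point concrete, here is roughly what the gap looks like today. This is a hedged sketch with made-up names, not code from any real framework; it compares a trivial CUDA op with the OpenMP offload style Intel promotes for the Phi:

    #include <cuda_runtime.h>
    #include <cstdio>

    // What frameworks already have: a CUDA kernel per operation.
    __global__ void relu_forward(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = fmaxf(x[i], 0.0f);
    }

    int main() {
        const int n = 1 << 20;
        float* x;
        cudaMalloc(&x, n * sizeof(float));
        relu_forward<<<(n + 255) / 256, 256>>>(x, n);
        cudaDeviceSynchronize();
        cudaFree(x);
        printf("done\n");
        return 0;
    }

    // The rough Xeon Phi counterpart (OpenMP 4.x offload style), shown as a
    // comment since this file is compiled as CUDA:
    //
    //   #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    //   for (int i = 0; i < n; ++i)
    //       x[i] = x[i] > 0.0f ? x[i] : 0.0f;
    //
    // Every op in a framework needs this second code path written and
    // maintained, which is exactly the work Intel would be paying for.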
$129k for this machine. In the keynote it's interesting that they described the product line as: "Tesla M40 for hyperscale, K80 for multi-app HPC, P100 for scales very high, and DGX-1 for the early adopters."

The GP100/P100 on the 16nm process probably gives a considerable performance-per-watt advantage over the current Tesla cards, but this gives me the feeling that we may not see consumer or workstation-level Pascal boards for a while.
This announcement reminds me of the part of Outliers that spelled out how Bill Gates and others became who they are because they had access to very expensive equipment before anyone else did (and spent 10K hours on it).
How does this compare to some of the systems provided by cloud providers [1][2]? Seems like requiring an on-site capability is a hurdle for integration if you already have your data with a cloud provider.

[1] https://aws.amazon.com/machine-learning/

[2] https://azure.microsoft.com/en-us/services/machine-learning/
It's good to see powerful machine-learning hardware come out. Much of the progress in ML has come from hardware speedups, and this will empower the next few years of research.
I wonder how much faster the new Tesla P100 is than the Tesla K40 at training neural networks. The K40s have been the best available GPUs for training deep neural networks.
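Hard to say until independent benchmarks land, but a quick first-order answer is to time a large SGEMM on each card, since dense matrix multiplies dominate training time for most deep nets. A minimal sketch (the 4096x4096 size and single timed call are arbitrary choices, not a rigorous benchmark):

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    // Times one large single-precision GEMM and reports achieved TFLOPS.
    // Run the same binary on a K40 and a P100 to get a rough speedup ratio.
    int main() {
        const int n = 4096;
        float *a, *b, *c;
        cudaMalloc(&a, n * n * sizeof(float));
        cudaMalloc(&b, n * n * sizeof(float));
        cudaMalloc(&c, n * n * sizeof(float));

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Warm-up call, then a timed one.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaEventRecord(start);
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double tflops = 2.0 * n * n * n / (ms * 1e-3) / 1e12;
        printf("%dx%d SGEMM: %.2f ms, %.2f TFLOPS\n", n, n, ms, tflops);

        cublasDestroy(handle);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Real training also leans on cuDNN, memory bandwidth, and (on the P100) FP16, so treat a raw SGEMM ratio only as a rough lower bound on the difference.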
Does anyone know if the Pascal architecture is built using stacked cores? Or is this one of those applications where thermal problems keep that technique from being used?
Looks like research in machine learning will only be done at huge corporations. You'll need an amount of funding comparable to the LHC's.

Time to use better-suited models like kernel ensembles: maybe they are not as accurate, but they are easier to train on a single CPU.