From first looks, there is little doubt that NVIDIA's Volta architecture is a monster and will revolutionize the AI and HPC market. But the article seems to avoid quantifying how 16-bit FP operations are beneficial compared to 32- or 64-bit FP operations in real-world use cases, or how the Caffe2 / NVIDIA architecture provides any significant boost to FP16 in particular, especially for image data (or why FP16 is better suited to image workloads in general).<p>I'm more interested in understanding why Caffe2 would outperform Theano, TensorFlow, MXNet, etc. once Volta chipsets are generally available, beyond early pre-release optimization, particularly when most of the front-runners already leverage NCCL, cuDNN, NVLink, etc. When the burden of adding support for new NVIDIA primitives is so low, what gives Caffe2 an advantage beyond an ephemeral "we partnered with NVIDIA first" one-up that would last for a couple of months at most?<p>(Apologies in advance if this post sounds overly negative, but I am constantly evaluating the current crop of frameworks for the trade-offs they impose on the problem space, and a definitive answer would be very helpful.)
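<p>For what it's worth, the most easily quantified FP16 benefit is the storage/bandwidth side rather than raw FLOPS: half-precision halves the bytes per value, so batches, activations, and weights move through memory twice as fast for the same bus. A minimal sketch of that (NumPy stand-in for a framework tensor; the batch shape is just an illustrative ImageNet-style example, not from the article):

```python
# Illustrative only: FP16 halves memory per value, so an image batch
# takes half the storage/bandwidth compared to FP32.
import numpy as np

# Hypothetical batch: 64 images, 3 channels, 224x224 (ImageNet-style shape)
batch_fp32 = np.random.rand(64, 3, 224, 224).astype(np.float32)
batch_fp16 = batch_fp32.astype(np.float16)

ratio = batch_fp32.nbytes // batch_fp16.nbytes
print(ratio)  # -> 2: FP16 needs half the bytes of FP32
```

Whether that translates into a 2x real-world speedup (and whether 10-bit mantissas are precise enough for a given model) is exactly the kind of measurement I'd like to see.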