There is certainly a lot of hype around AI chips, but I'm very skeptical of the payoff. There are several technical concerns I have with any "AI" chip that ultimately push you toward something more general purpose (and not really an "AI" chip, just something good at low-precision matmul):

* For inference, how do you efficiently move your data to the chip? In general most of the time is spent in matmul, and there are lots of exciting DSPs, mobile GPUs, etc. that require a fair amount of jumping through hoops to get your data to the ML coprocessor (a sketch of what that dispatch looks like is at the end of this comment). If you're doing anything low latency, good luck, because you need tight control of the OS (or to bypass it entirely). Will this lead to a battle between chip makers? It seems more likely to be a battle between end-to-end platforms.

* For training, do you have an efficient data flow with distributed compute? For the foreseeable future, any large model (or small model with lots of data) needs to be distributed, and the bottlenecks that introduces limit the gains from a new specialized architecture unless the distributed computing is good. Again, better chips don't really solve this; the solution comes from the platform. I've noticed many training loops have terrible GPU utilization, particularly with TensorFlow and V100s. Why does this happen? The GPU is plenty fast, but summary ops add CPU time that limits throughput, data pipelines don't actually pipeline their transformations, slow disks bottleneck reads, and transfers to the GPU aren't staged or pipelined (see the input pipeline sketch below). And then there's still a bit of an open question of how to best pipeline transfers from the GPU. Is there a simulator feeding data? Then you have a whole new can of worms to train fast.

* For your chip architecture, do you have the right abstractions to train the next architecture efficiently? Backprop trains some wonderful nets, but given the cost of a new chip ($50-100M) and the time it takes to build one (18 months minimum), how confident are you that the chip will still be relevant to the needs of your teams? This generally points you toward something more general purpose, which may leave some efficiency on the table. Eventually you end up at a low-precision matmul core, which is the same thing everyone is moving toward or already doing, whether you call yourself a GPU, DSP, or TPU (which is quite similar to a DSP). The mixed-precision sketch at the end shows how software already targets exactly that core.

I'm an HPC/graphics engineer turned deep learning engineer; I've worked with GPUs since 2006 and neural net chips since 2010 (before even AlexNet!!), so I'm a bit of an outlier here, having seen so many perspectives. From my point of view the computational fabric already exists, we're just not using it well :)
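
To make the inference point concrete, here's roughly what dispatching to an ML coprocessor looks like through a TFLite delegate. This is a hedged sketch: the model path is a placeholder, and the delegate library shown happens to be the Coral Edge TPU's; every vendor ships their own.

    import numpy as np
    import tensorflow as tf

    # Hand the graph to a vendor coprocessor via a TFLite delegate. The
    # delegate library is platform-specific (this one is Coral's Edge TPU);
    # "model.tflite" is a placeholder path.
    delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")
    interpreter = tf.lite.Interpreter(
        model_path="model.tflite",
        experimental_delegates=[delegate])
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Each invoke copies the input into the interpreter's buffers and the
    # result back out: data movement the chip itself can't hide, and the
    # part that gets painful at low latency.
    interpreter.set_tensor(inp["index"],
                           np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    result = interpreter.get_tensor(out["index"])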
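
On the GPU-utilization point, here's a minimal sketch of an input pipeline that actually pipelines, using tf.data. Assumptions are loud here: the file pattern and feature spec are hypothetical, and the exact knobs (shard count, shuffle buffer, batch size) depend entirely on your disks and preprocessing cost.

    import tensorflow as tf

    AUTOTUNE = tf.data.experimental.AUTOTUNE

    def parse_example(serialized):
        # Hypothetical feature spec; decode and normalize on the CPU.
        spec = {
            "image": tf.io.FixedLenFeature([], tf.string),
            "label": tf.io.FixedLenFeature([], tf.int64),
        }
        parsed = tf.io.parse_single_example(serialized, spec)
        image = tf.io.decode_jpeg(parsed["image"], channels=3)
        image = tf.image.resize(image, [224, 224]) / 255.0
        return image, parsed["label"]

    dataset = (
        tf.data.Dataset.list_files("/data/train-*.tfrecord")  # placeholder path
        # Read several shards concurrently so one slow disk can't stall reads.
        .interleave(tf.data.TFRecordDataset,
                    cycle_length=8, num_parallel_calls=AUTOTUNE)
        .shuffle(10_000)
        # Run the CPU-side transformations in parallel, not serially per step.
        .map(parse_example, num_parallel_calls=AUTOTUNE)
        .batch(256)
        # Overlap preprocessing of the next batch with compute on the current one.
        .prefetch(AUTOTUNE)
        # Stage batches in GPU memory so the host-to-device copy overlaps too.
        .apply(tf.data.experimental.prefetch_to_device("/gpu:0"))
    )

The other half of the fix is keeping summary ops out of the hot loop (or running them every N steps) so they stop adding CPU time to every iteration.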
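
And on the low-precision matmul convergence: this is already how you target those units from software. A sketch with TensorFlow's mixed-precision Keras API, where compute runs in float16 on the matmul cores while variables stay float32; the layer sizes here are arbitrary.

    import tensorflow as tf
    from tensorflow.keras import layers, mixed_precision

    # Compute in float16 (which is what the tensor cores / matmul units eat),
    # keep variables in float32 for numerically stable updates.
    mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        layers.Dense(4096, activation="relu", input_shape=(1024,)),
        layers.Dense(10),
        # Final softmax forced to float32 so the loss stays numerically sane.
        layers.Activation("softmax", dtype="float32"),
    ])

    # Loss scaling rescales gradients so they don't underflow in float16.
    opt = mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")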