
Google supercharges machine learning tasks with TPU custom chip

823 points, by hurrycane, about 9 years ago

45 comments

luu, about 9 years ago
I'm happy to hear that this is finally public so I can actually talk about the work I did when I was at Google :-).

I'm a bit surprised they announced this, though. When I was there, there was this pervasive attitude that if "we" had some kind of advantage over the outside world, we shouldn't talk about it lest other people get the same idea. To be clear, I think that's pretty bad for the world and I really wished that they'd change, but it was the prevailing attitude. Currently, if you look at what's being hyped up at a couple of large companies that could conceivably build a competing chip, it's all FPGAs all the time, so announcing that we built an ASIC could change what other companies do, which is exactly what Google was trying to avoid back when I was there.

If this signals that Google is going to be less secretive about infrastructure, that's great news.

When I joined Microsoft, I tried to gently bring up the possibility of doing either GPUs or ASICs and was told, very confidently, by multiple people that it's impossible to deploy GPUs at scale, let alone ASICs. Since I couldn't point to actual work I'd done elsewhere, it seemed impossible to convince folks, and my job was in another area, so I gave up on it, but I imagine someone is having that discussion again right now.

Just as an aside, I'm being fast and loose with language when I use the word impossible. It's more that my feeling is you have a limited number of influence points, and I was spending mine on things like convincing my team to use version control instead of mailing zip files around.
Comment #11725642 not loaded
Comment #11726123 not loaded
Comment #11728095 not loaded
Comment #11726548 not loaded
Comment #11726193 not loaded
Comment #11730632 not loaded
Comment #11726276 not loaded
Comment #11727232 not loaded
Comment #11725958 not loaded
Comment #11726712 not loaded
Comment #11725651 not loaded
Comment #11728025 not loaded
Comment #11729148 not loaded
Comment #11730367 not loaded
Comment #11726119 not loaded
bd, about 9 years ago
So now the open sourcing of the "crown jewels" AI software makes sense.

Competitive advantage is protected by custom hardware (and huge proprietary datasets).

Everything else can be shared. In fact, it is now advantageous to share as much as you can; the bottleneck is the number of people who know how to use the new tech.
Comment #11725535 not loaded
Comment #11726875 not loaded
Comment #11725385 not loaded
abritishguy, about 9 years ago
I think this shows a fundamental difference between Amazon (AWS) and Google Cloud.

AWS's offerings seem fairly vanilla and boring. Google are offering more and more really useful stuff:

- cloud machine learning
- custom hardware
- live migration of hosts without downtime
- cold storage with access in seconds
- BigQuery
- Dataflow
Comment #11725175 not loaded
Comment #11725832 not loaded
Comment #11725163 not loaded
Comment #11725145 not loaded
Comment #11725208 not loaded
Comment #11725164 not loaded
manav, about 9 years ago
Interesting. Plenty of work has been done with FPGAs, and a few groups have developed ASICs like DaDianNao in China [1]. Google, though, actually has the resources to deploy them in their datacenters.

Microsoft explored something similar to accelerate search with FPGAs [2]. The results show that the Arria 10 (20nm, the latest from Altera) had about 1/4th the processing ability at 10% of the power usage of the Nvidia Tesla K40 (25W vs 235W). Nvidia Pascal has something like 2-3x the performance with a similar power profile, which really bridges the gap in performance/watt. All of that also doesn't take into account the ease of working with CUDA versus the complicated development, toolchains, and cost of FPGAs.

However, the ~50x+ efficiency increase of an ASIC could be worthwhile in the long run. The only problem I see is that there might be limitations on model size because of the limited embedded memory of the ASIC.

Does anyone have more information or a whitepaper? I wonder if they are using eASIC.

[1]: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7011421&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7011421

[2]: http://research.microsoft.com/pubs/240715/CNN%20Whitepaper.pdf
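
A rough sanity check of those perf/watt figures as a Python sketch; every number below is taken from the comment above or is a round assumption, not a measurement:

```python
# Back-of-the-envelope perf/watt comparison using the figures quoted above.
# All numbers come from the comment or are round assumptions, not measurements.

k40_power_w = 235.0            # Nvidia Tesla K40 board power
arria10_power_w = 25.0         # Arria 10 figure quoted above
arria10_rel_throughput = 0.25  # "about 1/4th the processing ability" of the K40

k40_perf_per_watt = 1.0 / k40_power_w                          # normalize K40 throughput to 1.0
fpga_perf_per_watt = arria10_rel_throughput / arria10_power_w
print(f"FPGA vs GPU perf/watt: {fpga_perf_per_watt / k40_perf_per_watt:.1f}x")    # ~2.3x

# Hypothetical ASIC at the ~50x efficiency figure mentioned above:
asic_perf_per_watt = 50.0 * k40_perf_per_watt
print(f"ASIC vs FPGA perf/watt: {asic_perf_per_watt / fpga_perf_per_watt:.0f}x")  # ~21x
```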
Comment #11726974 not loaded
Comment #11729625 not loaded
semisight, about 9 years ago
This is huge. If they really do offer such a perf/watt advantage, they spell serious trouble for NVIDIA. Google is one of only a handful of companies with the upfront cash to make a move like this.

I hope we can at least see some white papers soon about the architecture; I wonder how programmable it is.
Comment #11725053 not loaded
Comment #11725002 not loaded
Comment #11725115 not loaded
Comment #11725367 not loaded
Comment #11725122 not loaded
Comment #11725060 not loaded
mrpippy, about 9 years ago
Bah, SGI made a Tensor Processing Unit XIO card 15 years ago.

Evidence suggests they were mostly for defense customers:

http://forums.nekochan.net/viewtopic.php?t=16728751
http://manx.classiccmp.org/mirror/techpubs.sgi.com/library/manuals/4000/007-4222-002/pdf/007-4222-002.pdf
Comment #11727040 not loaded
Comment #11727014 not loaded
Comment #11728104 not loaded
jhartmann, about 9 years ago
Three generations ahead of Moore's law??? I really wonder how they are accomplishing this beyond implementing the kernels in hardware. I suspect they are using specialized memory and an extremely wide architecture.

Sounds like they also used this for AlphaGo. I wonder how badly we were off on AlphaGo's power estimates. It seems everyone assumed they were using GPUs; it sounds like they were not, at least partially. I would really LOVE for them to market these for general use.
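
One hedged way to read "three generations ahead": if a generation is taken, as a simplification, to mean one doubling of perf/watt on an 18-24 month cadence, the claim works out to roughly an 8x edge, or several years of headroom:

```python
# Reading "three generations ahead of Moore's law" as three doublings of
# perf/watt on an 18-24 month cadence. Both the doubling-per-generation and
# the cadence are simplifying assumptions, not anything stated in the announcement.
generations_ahead = 3
speedup = 2 ** generations_ahead
years_low, years_high = 18 * generations_ahead / 12, 24 * generations_ahead / 12
print(f"~{speedup}x perf/watt, i.e. roughly {years_low:.1f} to {years_high:.0f} years of headroom")
```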
Comment #11725788 not loaded
Comment #11725253 not loaded
Comment #11725071 not loaded
Comment #11725104 not loaded
Comment #11725634 not loaded
asimuvPR, about 9 years ago
Now *this* is really interesting. I've been asking myself why this hadn't happened before. It's been all software, software, software for the last decade or so. But now I get it: we are at a point in time where it makes sense to adjust the hardware to the software. Funny how things work; it used to be the other way around.
Comment #11725640 not loaded
breatheoften, about 9 years ago
A podcast I listen to posted an interview with an expert last week saying that he perceived that much of the interest in custom hardware for machine learning tasks died when people realized how effective GPUs were at the (still-evolving) set of tasks.

http://www.thetalkingmachines.com/blog/2016/5/5/sparse-coding-and-madbits

I wonder how general the gains from these ASICs are, and whether the performance/power-efficiency wins will keep up with the pace of software/algorithm-du-jour advancements.
Comment #11727967 not loaded
RIMR, about 9 years ago
Somewhat off topic, but if you look at the lower-left corner of the heatsink in the first image, there are two red lines and some sort of image artifact:

https://2.bp.blogspot.com/-z1ynWkQlBc8/VzzPToH362I/AAAAAAAACp0/2QBREGUEikoHrML1nh9h3SEKQVzm8NV7QCLcB/s1600/tpu-2.png

They probably didn't mean to use this version of the image for their blog, but I wonder what they were trying to indicate or measure there.
Comment #11730605 not loaded
Comment #11726859 not loaded
danielvf, about 9 years ago
For the curious, that's a plaque on the side of the rack showing the Go board at the end of AlphaGo vs. Lee Sedol Game 3, at the moment Lee Sedol resigned and AlphaGo won the tournament (of five games).
Comment #11728533 not loaded
nkw, about 9 years ago
I guess this explains why Google Cloud Compute hasn't offered GPU instances.
Comment #11732603 not loaded
fiatmoney, about 9 years ago
I'm guessing that the performance/watt claims are heavily predicated on relatively low throughput, kind of similar to ARM vs Intel CPUs, particularly because they're only powering it and supplying bandwidth via what looks like a x1 PCIe slot.

In other words, taking their claims at face value, an Nvidia card or Xeon Phi would be expected to smoke one of these, although you might be able to run N of these in the same power envelope.

But those bandwidth and throughput-per-card limitations would make certain classes of algorithms not really worthwhile to run on these.
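
A crude illustration of the bandwidth-bound argument; the PCIe generation, lane count, and batch size below are assumptions made for the sake of the estimate, not anything Google has confirmed:

```python
# Crude estimate of how a narrow PCIe link caps throughput for workloads that
# must stream their inputs onto the card. The link generation, lane count, and
# batch size are illustrative assumptions, not published specs.

link_bytes_per_s = 8e9 * (128 / 130) / 8   # one PCIe Gen3 lane: ~0.98 GB/s usable

batch_bytes = 256 * 224 * 224 * 3          # a 256-image batch of 224x224 RGB uint8 inputs
batches_per_s = link_bytes_per_s / batch_bytes
print(f"~{batches_per_s:.0f} input batches/s fit through a x1 link")

# If the chip can evaluate batches far faster than that, the link rather than
# the silicon sets the ceiling, which is the point being made above.
```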
Comment #11727645 not loaded
Comment #11725982 not loaded
Comment #11726731 not loaded
bravo22, about 9 years ago
Given the insane mask costs at lower geometries, the ASIC is most likely a Xilinx EasyPath or Altera HardCopy. Otherwise the amortization of the mask and dev costs, even for a structured-cell ASIC, over 1K units wouldn't make much sense versus the extra cooling/power costs for a GPU.
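
The amortization trade-off spelled out with made-up but plausible numbers; every figure below is an assumption chosen only to show the shape of the argument:

```python
# The amortization trade-off with illustrative numbers. Every figure below is
# an assumption chosen only to show the shape of the argument.

nre_cost = 10e6    # assumed mask set + development cost for a full-custom ASIC
units = 1_000      # the "1K units" deployment size mentioned above
print(f"NRE per chip at 1K units: ${nre_cost / units:,.0f}")        # $10,000 per chip

# Power side: replacing a ~235 W GPU with a ~40 W accelerator, at $0.10/kWh
# and a datacenter PUE of 1.5, over three years of continuous operation.
watts_saved = 235 - 40
kwh_saved = watts_saved / 1000 * 24 * 365 * 3 * 1.5
print(f"3-year energy savings per unit: ${kwh_saved * 0.10:,.0f}")  # about $770

# At 1K units the NRE dwarfs the power savings; at ~100K units it flips, which
# is why a structured-ASIC route (EasyPath/HardCopy) looks attractive at low volume.
```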
Comment #11728156 not loaded
Comment #11730368 not loaded
Coding_Cat, about 9 years ago
I wonder if we will be seeing more of this in the (near) future. I expect so, and from more people than just Google. Why? Look at the problems the fabs have had with the latest generation of chips, and as transistors grow smaller those problems will probably only grow. We are already close to the physical limit of transistor size, so it is fair to assume that Moore's law will (hopefully) not outlive me.

So what then? I certainly hope the tech sector will not just leave it at that. If you want to continue to improve performance (per watt), there is only one way you can go: improve the design at the ASIC level. ASIC design will probably stay relatively hard, although there will probably be some technological solutions to make it easier with time, and if fabrication stalls at a certain nm level, production costs will probably start to drop with time as well.

I've been thinking about this quite a bit recently because I hope to start my PhD in about a year, and I'm torn between HPC and computer architecture. This seems to be quite a pro for Comp. Arch ;).
phsilva, about 9 years ago
I wonder if this architecture is the same Lanai architecture that was recently introduced by Google to LLVM: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html
Comment #11728408 not loaded
taliesinb, about 9 years ago
I don't know much about this sort of thing, but I wonder if the ultimate performance would come from co-locating specialized compute with memory, so that the spatial layout of the computation on silicon ends up mirroring the abstract dataflow DAG, with fairly low-bandwidth, energy-efficient links between static register arrays that represent individual weight and grad tensors. Minimize the need for caches and power-hungry high-bandwidth lanes; ideally the only data moving around is your minibatch data going one way and your grads going the other way.

I wonder if they're doing that, and to what degree.
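
A toy data-movement count of the "keep the weights in place, stream the data past them" idea the comment is gesturing at; this is not a hardware model, and all sizes are arbitrary assumptions:

```python
# Toy data-movement count for a weight-stationary layout: weights cross the
# memory boundary once and stay put, activations stream past them, and partial
# sums accumulate locally. Sizes are arbitrary assumptions, not a hardware model.

out_dim, in_dim, batch = 256, 256, 10_000
bytes_per_value = 1                      # pretend 8-bit weights and activations

weight_bytes = out_dim * in_dim * bytes_per_value
act_bytes_per_example = in_dim * bytes_per_value

# Weight-stationary: load the weights once, then stream each example through.
stationary_traffic = weight_bytes + batch * act_bytes_per_example

# Naive baseline: re-fetch the whole weight matrix for every example.
naive_traffic = batch * (weight_bytes + act_bytes_per_example)

print(f"traffic ratio (naive / weight-stationary): {naive_traffic / stationary_traffic:.0f}x")
```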
harigov, about 9 years ago
How is this different from, say, the synthetic neurons that IBM is working on, or what Nvidia is building?
Comment #11725119 not loaded
Comment #11725339 not loaded
Comment #11725127 not loaded
Bromskloss, about 9 years ago
What are the capabilities that a piece of hardware like this needs to have to be suitable for machine learning in general (and not just one specific machine learning problem)?
Comment #11725262 not loaded
Comment #11734245 not loaded
nathan_f77, about 9 years ago
I'm thinking that this has the potential to change the context of many debates about the "technological singularity", or AI taking over the world, because it all seems to be based on FUD.

While reading this article, one of my first reactions was "holy shit, Google might actually build a general AI with these, and they've probably already been working on it for years".

But really, nothing about these chips is unknown or scary. They use algorithms that are carefully engineered and understood. They can be scaled up horizontally to crunch numbers, and they have a very specific purpose: they improve search results and maps.

What I'm trying to say is that general artificial intelligence is such a lofty goal that we're going to have to understand every single piece of the puzzle before we get anywhere close, including building custom ASICs and writing all of the software by hand. We're not going to accidentally leave any loopholes open where AI secretly becomes conscious and decides to take over the world.
Comment #11729144 not loaded
cschmidt, about 9 years ago
This seems very similar to the "Fathom Neural Compute Stick" from Movidius:

http://www.movidius.com/solutions/machine-vision-algorithms/machine-learning

TensorFlow on a chip...
Comment #11725137 not loaded
isseu, about 9 years ago
Tensor Processing Unit (TPU).

They've been using it for over a year? Wow.
revelation, about 9 years ago
There is not a single number in this article.

Now, these heatsinks can be deceiving for boards that are meant to sit in a server rack unit with massive fans throwing a hurricane over them, but even then that is not very much power we're looking at there.
hyperopt, about 9 years ago
The Cloud Machine Learning service is one that I'm highly anticipating. Setting up arbitrary cloud machines for training models is a mess right now. I think if Google sets it up correctly, it could be a game changer for ML research for the rest of us, especially if they can undercut AWS's GPU instances on cost per unit of performance through specialized hardware. I don't think the coinciding releases and announcements of TensorFlow, Cloud ML, and now this are an accident. Something is brewing, and I think it's going to be big.
Comment #11728626 not loaded
saganus, about 9 years ago
Is that a Go board stuck to the side of the rack?

Maybe they play one move every time someone gets to go there to fix something? Or could it be just a way of numbering the racks, or something eccentric like that?
Comment #11725336 not loaded
Comment #11725326 not loaded
Comment #11725429 not loaded
hristov, about 9 years ago
It is interesting that they would make this into an ASIC, given how notoriously high the development costs for ASICs are. Are those costs coming down? If so, life will get very hard for the FPGA makers of the world soon.

It would be interesting to see what the economics of this project are, i.e., the development costs and the cost per chip. Of course, it is very doubtful I will ever get to see the economics of this project, but it would be interesting.
Comment #11728931 not loaded
Comment #11728005 not loaded
Comment #11728638 not loaded
protomok, about 9 years ago
I'd be interested to know more technical details. I wonder if they're using 8-bit multipliers, how many MACs are running in parallel, power consumption, etc.
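
For a sense of what "8-bit multipliers" would mean in practice, here is a minimal sketch of a quantized multiply-accumulate; the per-tensor scale scheme is a generic one chosen for illustration, not a description of Google's design:

```python
# Minimal sketch of an 8-bit multiply-accumulate: quantize float weights and
# activations to int8 with a per-tensor scale, accumulate the products in int32
# (as a hardware MAC array would), then rescale back to float at the end.
import numpy as np

def quantize(x, bits=8):
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.astype(np.int8), scale

w = np.random.randn(64, 64).astype(np.float32)   # made-up layer weights
a = np.random.randn(64).astype(np.float32)       # made-up activations

wq, w_scale = quantize(w)
aq, a_scale = quantize(a)

acc = wq.astype(np.int32) @ aq.astype(np.int32)  # int8 x int8 products, int32 accumulator
y_approx = acc.astype(np.float32) * (w_scale * a_scale)

print("max abs error vs float32:", np.abs(w @ a - y_approx).max())
```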
j-dr, about 9 years ago
This is great, but can Google stop putting "tensor" in the name of everything when nothing they do really has anything to do with tensors?
Comment #11728727 not loaded
__jal, about 9 years ago
My favorite part is what looks like flush-head sheet metal screws holding the heat sink on.<p>No wondering where you left the Torx drivers with this one.
j1vms, about 9 years ago
I wouldn't be surprised if Google is looking to build (or has done so already) a highly dense and parallel analog computer with limited-precision ADCs/DACs. I mean, that's simplifying things quite a bit, but it would probably map pretty well to the TensorFlow application.
Comment #11726053 not loaded
Comment #11725840 not loaded
aaronsnoswell, about 9 years ago
I'm curious to know: is this announcement something that an expert in these sorts of areas could have (or did) predict months or years ago, given Google's recent jumps forward in machine learning products? Can someone with more knowledge about this comment?
Comment #11727634 not loaded
Comment #11727661 not loaded
eggy, about 9 years ago
Pretty quick implementation.

On the energy-savings and space-savings front, this type of implementation, coupled with the space-saving and energy-saving claims of going from floats to unums, should get it to the next order of magnitude. Come on, Google, make unums happen!
paulsutter, about 9 years ago
> Our goal is to lead the industry on machine learning and make that innovation available to our customers.

Are they saying Google Cloud customers will get access to TPUs eventually? Or that general users will see service improvements?
nxzero, about 9 years ago
Is there any way to detect what hardware is being used by the cloud service if you're using the cloud service? (Yes, I realize this question is a bit of a paradox, but figured I'd ask.)
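
One thing you can do from inside a VM is inspect whatever hardware the hypervisor chooses to expose, e.g. via standard Linux interfaces; anything sitting behind the provider's service API would simply not show up. A minimal sketch (Linux-only, nothing cloud-specific):

```python
# Dump what the hypervisor exposes to a Linux guest: the CPU model string and
# the visible PCI devices. Anything the provider keeps behind its service API
# (e.g. a custom accelerator) will not appear here.
import subprocess

with open("/proc/cpuinfo") as f:
    models = {line.split(":", 1)[1].strip() for line in f if line.startswith("model name")}
print("CPU(s):", ", ".join(sorted(models)))

try:
    print(subprocess.check_output(["lspci"], text=True))
except FileNotFoundError:
    print("lspci not available (install pciutils)")
```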
mistobaan, about 9 years ago
Another point is that they will be able to provide much higher computing capability at a much lower price point than any competitor. I really like the direction the company is taking.
swalsh, about 9 years ago
I wonder if opening this up as a cloud offering is a way to get a whole bunch of excess capacity (if it needs it for something big?) but have it paid for.
dharma1, about 9 years ago
Hasn't made a dent in Nvidia's share price yet.
amelius, about 9 years ago
One question: what has this got to do with tensors?
Comment #11728414 not loaded
Comment #11734277 not loaded
eggy, about 9 years ago
I think the confluence of new technologies and the re-emergence and rediscovery of older technologies is going to be the best combination. Whether it goes that way is not certain, since the best technology doesn't always win out. Here, though, the money should, since all of these would greatly reduce the time and energy spent in mining and validating:

* Vector processing computers, not von Neumann machines [1]

* Array languages, new or existing, like J, K, or Q in the APL family [2,3]

* The replacement of floating-point units with unum processors [4]

Neural networks are inherently arrays or matrices, and would do better on a purpose-designed vector array machine, not a re-purposed GPU, or even a TPU (as in the article) inside a standard von Neumann machine. Maybe a non-von Neumann architecture like the old Lisp machines, but for arrays, not lists (and no, this is not a modern GPU; the data has to stay on the processor, not be offloaded to external memory).

I started with neural networks in the late 80s and early 1990s, and I was mainly programming in C: matrices and FOR loops. I found J, the array language, many years later, unfortunately. Businesses have been making enough money off the advantage of the array processing language A+, then K, that the per-seat cost of KDB+/Q (database/language) is easily justifiable. Other software like RiakTS is looking to get in the game using Spark/Shark and other pieces of kit, but a K4 query is 230 times faster than Spark/Shark and uses 0.2GB of memory vs. 50GB. The similar technologies just don't fit the problem space as well as a vector language. I am partial to J, being a more mathematically pure array language in that it is based on arrays. K4 (soon to be K5/K6) is list-based at the lower level and is honed for tick data or time-series data. J is a bit more general-purpose or academic, in my opinion.

Unums are theoretically more energy-efficient and compact than floating point, and take away the error-guessing game. They are being tested with several different language implementations to validate their creator's claims and practicality. The Mathematica notebook that John Gustafson modeled his work on is available to download free from the book publisher's site. People have already done some exploratory investigations in Python, Julia, and even J. I believe the J one is a 4-bit implementation based on unums 1.0. John Gustafson just presented unums 2.0 in February 2016.

[1] http://conceptualorigami.blogspot.co.id/2010/12/vector-processing-languages-future-of.html

[2] jsoftware.com

[3] http://kxcommunity.com/an-introduction-to-neural-networks-with-kdb.php

[4] https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gustafson/p/book/9781482239867
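
The "neural networks are inherently arrays" point, written as whole-array code (NumPy standing in for an APL-family language; the layer sizes are arbitrary):

```python
# A two-layer forward pass as whole-array operations, the style an APL/J/K
# programmer would use: no explicit loops over neurons. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.standard_normal((32, 784))            # a batch of 32 inputs
w1 = rng.standard_normal((784, 128)) * 0.01    # made-up weights
w2 = rng.standard_normal((128, 10)) * 0.01

h = np.maximum(x @ w1, 0)                      # one matrix product plus ReLU
y = h @ w2                                     # another matrix product
print(y.shape)                                 # (32, 10)
```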
Comment #11731593 not loaded
camkego, about 9 years ago
Does anyone have links to the talk or the graphs?
Comment #11725556 not loaded
ungzd, about 9 years ago
Does it use approximate computing technology?
niels_olson, about 9 years ago
I like that the images are mislabeled :)
LogicFailsMe, about 9 years ago
Perf/W, the official metric of slow but efficient processors. How many times must we go down this road?

Let's see this sucker train AlexNet...
Comment #11726741 not loaded
rando3826, about 9 years ago
Why use an ANKY in the title? Using an ANKY (acronym no one knows yet) is bad writing; it makes readers feel dumb, etc. Google JUST NOW invented that acronym, and sticking it in the title like just another word we should understand is absolutely ridiculous.
Comment #11726647 not loaded
simunaga, about 9 years ago
In what sense is this great news? Yes, it's progress, so what? After all, you programmers earn money for your jobs, and pretty soon you might not have one, because of these kinds of "great news": "Whayyy, this is really interesting, AI, machine learning. Aaaaa!"

"I'll get fired, won't have money for living, and AI will take my place, but the world will be better! Yes! Progress!"

Who will benefit from this? Surely not you. Why are you so ecstatic then?
Comment #11728551 not loaded
Comment #11734267 not loaded