TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Machine Learning for Systems and Systems for Machine Learning [pdf]

408 points by andrew3726 over 7 years ago

10 comments

cs702 over 7 years ago

TPUs are only one part of this eye-opening presentation. Skip to page 28, where Jeff starts talking about:

* Using reinforcement learning so the computer can figure out how to parallelize code and models on its own. In experiments, the machine beats human-designed parallelization.

* Replacing B-tree indices, hash maps, and Bloom filters with *data-driven indices* learned by deep learning models. In experiments, the learned indices outperform the usual stalwarts by a large margin in both computing cost and performance, and are auto-tuning.

* Using reinforcement learning to manage datacenter power. Machine intelligence outperforms human-designed energy-management policies.

* Using machine intelligence to replace user-tunable performance options in all software systems, eliminating the need to tweak them with command-line parameters like --num-threads=16, --max-memory-use=104876, etc. Machine intelligence outperforms hand-tuning.

* Using machine intelligence for all tasks currently managed with heuristics. For example, in compilers: instruction scheduling, register allocation, loop-nest parallelization strategies, etc.; in networking: TCP window size decisions, backoff for retransmits, data compression, etc.; in operating systems: process scheduling, buffer cache insertion/replacement, file system prefetching, etc.; in job-scheduling systems: which tasks/VMs to co-locate on the same machine, which tasks to pre-empt, etc.; in ASIC design: physical circuit layout, test case selection, etc. Machine intelligence outperforms human heuristics.

IN SHORT: machine intelligence (today, that means deep learning and reinforcement learning) is going to penetrate and ultimately control EVERY layer of the software stack, replacing human engineering with auto-tuning, self-improving, better-performing code.

Eye-opening.
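The learned-index idea in the second bullet can be sketched in a few lines. This is a toy illustration, not the paper's actual recursive model index: fit a model mapping key to position in a sorted array, record the worst-case prediction error at build time, and at lookup time binary-search only within that error window.

```python
# Toy learned index: a linear model predicts each key's position in a
# sorted array; lookups binary-search only a bounded window around the
# prediction, guaranteed correct by the recorded worst-case error.
import bisect

class LearnedIndex:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        xs, ys = self.keys, range(n)
        # Least-squares fit of position ~= a*key + b.
        mean_x = sum(xs) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in xs)
        self.a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var if var else 0.0
        self.b = mean_y - self.a * mean_x
        # Worst prediction error over the training keys bounds the search window.
        self.err = max(abs((self.a * x + self.b) - y) for x, y in zip(xs, ys))

    def lookup(self, key):
        n = len(self.keys)
        guess = int(self.a * key + self.b)
        lo = max(0, guess - int(self.err) - 1)
        hi = min(n, guess + int(self.err) + 2)
        # Binary search restricted to the guaranteed error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None

idx = LearnedIndex(range(0, 1000, 7))   # keys 0, 7, 14, ...
assert idx.lookup(49) == 7              # present key -> its position
assert idx.lookup(50) is None           # absent key
```

On smoothly distributed keys the model's error window stays tiny, so each lookup touches a handful of slots instead of walking a B-tree; that is the auto-tuning effect the comment describes.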
cobookman over 7 years ago

Nvidia Titan V can do 110 TFLOPS, 12GB of 1.7 Gb/s memory [1] and sells for $3,000. TPU v2 does 180 TFLOPS, 64GB of 19.2 Gb/s memory [2].

That's a heck of a performance boost for a chip that's likely costing Google way less than the Nvidia flagship.

[1] http://www.tomshardware.com/news/nvidia-titan-v-110-teraflops,36085.html
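The price/performance gap can be made concrete with back-of-envelope math from the figures in the comment above (the TFLOPS numbers and Titan V list price are the commenter's; the TPUv2's cost to Google is not public):

```python
# Back-of-envelope price/performance from the comment's figures.
titan_tflops, titan_price = 110, 3000   # Titan V: ~110 TFLOPS at $3,000 list
tpu_tflops = 180                        # TPU v2 board: ~180 TFLOPS

titan_tflops_per_dollar = titan_tflops / titan_price
# Price at which a TPUv2 board would merely MATCH the Titan V per dollar;
# any lower internal cost means better raw TFLOPS per dollar.
breakeven_price = tpu_tflops / titan_tflops_per_dollar
print(round(breakeven_price))           # 4909
```

So even at roughly $4,900 per board the TPUv2 would only tie the Titan V on raw TFLOPS per dollar, which is why an in-house chip at fab cost looks like a large win.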
jamesblonde over 7 years ago

Great talk, with lots of new insights into what's happening at Google. I really think his point that ImageNet is the new MNIST now holds true. Even research labs should be buying DeepLearning11 servers (10 x 1080Ti) for $15k and training large models in a reasonable amount of time. It may seem that Google is way ahead, but they are just doing synchronous SGD, and it was interesting to see the drop in prediction accuracy from 128 TPU2 cores to 256 TPU2 cores for ImageNet (76% -> 75% accuracy). So the algorithms for distributed training aren't unknown, and with cheap hardware like the DL11 server, many well-financed research groups can compete with this.
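The synchronous SGD the comment refers to is conceptually simple; a minimal single-process sketch (toy linear-regression problem, NumPy standing in for real workers and an all-reduce) looks like this:

```python
# Toy synchronous data-parallel SGD: each "worker" computes a gradient on
# its own data shard, gradients are averaged (the all-reduce step), and a
# single shared model is updated once per round.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w                              # noiseless targets for the demo

w = np.zeros(4)
shards = np.array_split(np.arange(256), 8)  # 8 workers, one shard each
lr = 0.1
for step in range(200):
    grads = []
    for idx in shards:                      # in reality these run in parallel
        Xi, yi = X[idx], y[idx]
        grads.append(2 * Xi.T @ (Xi @ w - yi) / len(idx))  # per-shard MSE gradient
    w -= lr * np.mean(grads, axis=0)        # synchronous average = all-reduce

assert np.allclose(w, true_w, atol=1e-3)    # recovers the true weights
```

Because every round waits for all workers and averages, the update equals one large-batch step, which is also why pushing the effective batch ever higher (128 to 256 cores) can start to cost accuracy, as the comment notes.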
larelli over 7 years ago

It looks like this paper has more information: https://arxiv.org/pdf/1712.01208v1.pdf
EvgeniyZh over 7 years ago

Was it filmed? If so, when will the video be available?
nickpsecurity over 7 years ago

Great presentation. As far as applications go, I had already thought this might be useful in lightweight formal methods: spotting problems and suggesting corrections for failures in Rust's borrow checker, separation logic on C programs, proof tactics, and static analysis tooling. For the Rust example, a person might try to express a solution in the language that fails the borrow checker. If they can't understand why, they submit it to a system that attempts to spot where the problem is. The system might start with humans spotting the issue and restructuring the code to pass the borrow checker. Every instance of those would feed into the learning system, which might eventually do it on its own. There's also potential to use automated equivalence checks/tests between user-submitted code and the AI's suggestions to help a human in the loop decide whether it's worth review before passing it on to the other person.

In hardware, both digital and analog designers seem to use lots of heuristics in how they design things. It could certainly help there. It might be especially useful in analog, given the small number of experienced engineers available.
yeukhon over 7 years ago

While this is a collective work, honestly, after hearing about JD for so many years: is there anything he CAN'T do?
1024core over 7 years ago
This is some really cool stuff, I hope this submission gets more upvotes and reaches a wider audience.
novaRom over 7 years ago

I speculate that Google will sell the TPUv2 for as little as $500 per PCIe card, as early as 2018. Nvidia's Volta TensorCores are essentially the same: 32-bit accumulators and 16-bit multipliers. But GPUs are more general-purpose, which isn't necessary for deep learning, since the most intensive operation is the dot product (y += w*x).
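The mixed-precision pattern the comment describes (16-bit multiplies feeding a 32-bit accumulator) can be emulated in NumPy; this is only an illustration of the numerics, not how any tensor core is actually wired:

```python
# Emulating the multiply-in-fp16 / accumulate-in-fp32 pattern for a dot
# product, versus accumulating everything in fp16.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4096).astype(np.float16)
x = rng.normal(size=4096).astype(np.float16)

# y += w*x with fp16 products but an fp32 accumulator:
y = np.float32(0.0)
for p in (w * x):                 # elementwise products computed in float16
    y += np.float32(p)            # running sum kept at float32 precision

# Accumulating in fp16 instead loses precision as the sum grows:
y16 = np.float16(0.0)
for p in (w * x):
    y16 = np.float16(y16 + p)

exact = np.dot(w.astype(np.float64), x.astype(np.float64))
print(abs(float(y) - exact), abs(float(y16) - exact))  # fp32 accumulation is typically far closer
```

The wide accumulator is what keeps long dot products (the inner loop of every matmul) from drifting, which is why both the TPU and Volta designs pair narrow multipliers with 32-bit accumulation.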
nl over 7 years ago

That "Learned Index Structures" paper makes it pretty clear that Karpathy was right in his widely criticized "Software 2.0" piece.