Can someone ELI5 what the differences between the different libraries are? The article uses a lot of jargon, and something that frustrates me about getting into machine learning is that teaching material will either abstract away what the internals do or assume that you already know how the internals work.

Some specific questions:

> They provide ways of specifying and building computational graphs

Is the article talking about neural networks? As in, arrays of arrays of weights, where input values go through successive layers, and for each layer the same instruction is applied to some values with the respective weight?

Or is it talking about a graph as in a functional graph, where manually written functions call other manually written functions? (Hence why a later paragraph talks about if-else statements and for loops.)

> Almost all tensor computation libraries support autodifferentiation in some capacity (either forward-mode, backward-mode, or both).

What are those?

From the Wikipedia article, it sounds like autodifferentiation basically means running f(x+dx)-f(x), but if there are entire frameworks handling it, then there's probably something fancier going on. (Rough sketch of what I mean at the bottom of this comment.)

> According to the JAX quickstart, JAX bills itself as “NumPy on the CPU, GPU, and TPU, with great automatic differentiation for high-performance machine learning research”. Hence, its focus is heavily on autodifferentiation.

The earlier description makes it sound like JAX does some cutting-edge compilation stuff to transform semi-arbitrary functions (with ifs and elses and loops and stuff) into a function that returns its derivative (second sketch at the bottom).

So how can that stuff run on the GPU? It sounds like there would be a lot of branching code.

And how is that related to machine learning / neural networks?
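
To be concrete about the f(x+dx)-f(x) part, here is my current mental model. It's just plain numerical differentiation (finite differences), nothing library-specific, and I assume the real frameworks do something fancier than this:

    # my naive mental model of differentiation: nudge the input, measure the change
    def numerical_derivative(f, x, dx=1e-6):
        return (f(x + dx) - f(x)) / dx

    def square(x):
        return x ** 2

    print(numerical_derivative(square, 3.0))  # ~6.0, i.e. d/dx of x^2 at x = 3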
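
And for the JAX part, my reading of the quickstart is that you write an ordinary Python function, branches and loops included, and ask JAX for a new function that computes its derivative. Something like this rough sketch (based on the quickstart's jax.grad, not anything I've actually run on a GPU):

    import jax
    import jax.numpy as jnp

    def f(x):
        # ordinary Python control flow over the input
        if x > 0:
            y = x ** 2
        else:
            y = jnp.sin(x)
        for _ in range(3):
            y = y + x
        return y

    df = jax.grad(f)  # df is itself a function that computes df/dx
    print(df(2.0))    # 2*x + 3 -> 7.0
    print(df(-1.0))   # cos(x) + 3 -> ~3.54

Which is exactly why I'm confused about how something like this ends up running on a GPU.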