
Tinygrad: A simple and powerful neural network framework

434 points by masterofsome, over 2 years ago

26 comments

brrrrrm over 2 years ago

> It compiles a custom kernel for every operation, allowing extreme shape specialization.

This doesn't matter. Just look at the performance achieved by CuDNN kernels (which back PyTorch): they're dynamically shaped and hit near peak. For dense linear algebra at the size of modern neural networks, optimizing for the loop bound condition won't help much.

> All tensors are lazy, so it can aggressively fuse operations.

This matters. PyTorch teams are trying to implement that now (they have LazyTensor, AITemplate, TorchDynamo), but I'm not sure of the status (it's been tried repeatedly).

> The backend is 10x+ simpler, meaning optimizing one kernel makes everything fast.

The first part of that sentence matters, the second part doesn't. Kernels are already fast, and their reuse outside of being fused into each other (which requires a full linear algebra compiler) isn't very high. If you make sum fast, you have not made matrix multiplication fast, even though MM has a sum in it. It just isn't that easy to compose operations and still hit 80+% of hardware efficiency.

But it is easier to iterate fast and build a seamless lazy compiler if your backend is simple. You can pattern match more easily and ensure you handle edge cases without insanely complicated things like alias analysis (which PyTorch has to do).
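A toy sketch of the lazy-evaluation idea discussed above, in plain Python/NumPy (illustrative only, not tinygrad's implementation; the class and method names are made up): elementwise ops record work instead of executing it, and the chain only runs when the result is needed, which is exactly the point where a real compiler could fuse it into one kernel.

    import numpy as np

    class LazyTensor:
        """Defers elementwise ops; a real backend would fuse the recorded chain."""
        def __init__(self, compute):
            self.compute = compute            # () -> np.ndarray, the pending work

        @staticmethod
        def from_array(x):
            return LazyTensor(lambda: x)

        def add(self, other):
            return LazyTensor(lambda: self.compute() + other.compute())

        def relu(self):
            return LazyTensor(lambda: np.maximum(self.compute(), 0.0))

        def realize(self):
            # Here a real lazy framework would pattern-match the recorded graph
            # and emit a single fused kernel instead of evaluating op by op.
            return self.compute()

    a = LazyTensor.from_array(np.random.randn(4).astype(np.float32))
    b = LazyTensor.from_array(np.random.randn(4).astype(np.float32))
    out = a.add(b).relu()        # nothing has been computed yet
    print(out.realize())         # the whole chain is evaluated on demand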
orlp over 2 years ago

> It's extremely simple, and breaks down the most complex networks into 4 OpTypes:
>
> - UnaryOps operate on one tensor and run elementwise. RELU, LOG, RECIPROCAL, etc...
> - BinaryOps operate on two tensors and run elementwise to return one. ADD, MUL, etc...
> - ReduceOps operate on one tensor and return a smaller tensor. SUM, MAX
> - MovementOps operate on one tensor and move the data around, copy-free with ShapeTracker. RESHAPE, PERMUTE, EXPAND, etc...
>
> But how...where are your CONVs and MATMULs? Read the code to solve this mystery.

Ok, I was curious, so I read the code. The answer is that it represents a MATMUL as a 1x1 CONV. And it lied about CONV, which is a ProcessingOps.CONV, explicitly represented and implemented: https://github.com/geohot/tinygrad/blob/c0050fab8ff0bc667e40da11980f4ac4c21affda/tinygrad/llops/ops_cpu.py#L40

Quite the letdown after figuring out this 'mystery'.
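For the curious, here is a minimal sketch of the "MATMUL as a 1x1 CONV" trick, written against PyTorch's conv2d purely as a convenient reference implementation (this is not tinygrad code, and the shapes are arbitrary): each row of A becomes one 1x1 "pixel" with k channels, and each column of B becomes one 1x1 filter.

    import numpy as np
    import torch
    import torch.nn.functional as F

    m, k, n = 3, 5, 4
    A = np.random.randn(m, k).astype(np.float32)
    B = np.random.randn(k, n).astype(np.float32)

    # View the matmul as a 1x1 convolution:
    #   input  (N=m, C_in=k, H=1, W=1) -- each row of A is one "pixel"
    #   weight (C_out=n, C_in=k, 1, 1) -- each column of B is one filter
    x = torch.from_numpy(A).reshape(m, k, 1, 1)
    w = torch.from_numpy(B.T.copy()).reshape(n, k, 1, 1)
    out = F.conv2d(x, w).reshape(m, n).numpy()

    assert np.allclose(out, A @ B, atol=1e-5)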
sakras over 2 years ago

I must say they gained instant credibility with the minimalistic website, given how fast it loaded.

The code looks simple and easy to follow, and I love how the comments constantly mention hardware characteristics, making maxing out the hardware the goal. It seems to be trying to achieve this by JITting optimal code for the operations at hand rather than hand-optimizing kernels, betting that the small number of operations will make tuning the codegen tractable.

I haven't kept up much with what's happening in ML, but at least in the realm of columnar database engines, interpreting a series of hand-optimized kernels seems to be the dominant approach over compiling a vectorized query plan. Are compilers good enough at optimizing ML operations that specializing on input shape makes a difference over hand-tuned kernels?
JacobiX over 2 years ago

I love those tiny DNN frameworks. Some examples that I studied in the past (I still use PyTorch for work-related projects):

thinc, by the creators of spaCy: https://github.com/explosion/thinc

nnabla, by Sony: https://github.com/sony/nnabla

LibNC, by Fabrice Bellard: https://bellard.org/libnc/

Dlib dnn: http://dlib.net/ml.html#add_layer
emaro over 2 years ago

I love this website. Their style tag literally is:

    <style>
    body { font-family:'Lucida Console', monospace }
    </style>

Also looks like a very cool project.
therealchiggs over 2 years ago

There's an interesting roadmap in the "cherry" folder of the git repo [0]. It begins by bringing up a design on FPGA and ends with selling the company for $1B+ by building accelerator cards to compete with NVIDIA:

    Cherry Three (5nm tapeout)
    =====
    * Support DMA over PCI-E 4.0. 32 GB/s
    * 16 cores
    * 8M elements in on board RAM of each core (288 MB SRAM on chip)
    * Shared ~16GB GDDR6 between cores. Something like 512 GB/s
    * 16x 32x32x32 matmul = 32768 mults
    * 1 PFLOP @ 1 ghz (finally, a petaflop chip)
    * Target 300W, power savings from process shrink
    * This card should be on par with a DGX A100 and sell for $2000
    * At this point, we have won.
    * The core Verilog is open source, all the ASIC speed tricks are not.
    * Cherry will dominate the market for years to come, and will be in every cloud.
    * Sell the company for $1B+ to anyone but NVIDIA

[0] https://github.com/geohot/tinygrad/blob/master/accel/cherry/README#L22-L68
lr1970 over 2 years ago

As was recently discussed at length here on HN [0] (401 comments), George Hotz (the lead of tinygrad) is taking time off from his self-driving startup comma.ai [1]. Curious whether this will help or hurt tinygrad's progress.

[0] https://news.ycombinator.com/item?id=33406790

[1] https://comma.ai/
fragmede over 2 years ago

Of course, the Stable Diffusion tie-in is not to be missed!

https://github.com/geohot/tinygrad/blob/master/examples/stable_diffusion.py
eterevsky over 2 years ago

How does it compare to JAX? After TensorFlow and PyTorch, JAX seems very simple: basically an accelerated numpy with just a few additional useful features like automatic differentiation, vectorization, and JIT compilation. In terms of API, I don't see how you can go any simpler.
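For context, the "accelerated numpy plus a few transforms" flavor the comment describes looks roughly like this in JAX (a generic sketch, unrelated to tinygrad; the loss function and shapes are made up for illustration):

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # ordinary numpy-style code; JAX traces, differentiates, and compiles it
        pred = jnp.tanh(x @ w)
        return jnp.mean((pred - y) ** 2)

    grad_fn = jax.jit(jax.grad(loss))                    # differentiate w.r.t. w, then JIT-compile
    per_example = jax.vmap(loss, in_axes=(None, 0, 0))   # vectorize over a leading batch axis

    w = jnp.ones((3,))
    x = jnp.ones((8, 3))
    y = jnp.zeros((8,))
    print(grad_fn(w, x, y))        # gradient of the mean loss w.r.t. w
    print(per_example(w, x, y))    # one loss value per example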
tucosan over 2 years ago

Can someone from the ML crowd ELI5 to me what tinygrad does, how it plugs into an ML pipeline, and what its use cases are?
ivalm over 2 years ago

"Almost 9k stars" is actually 7.3k stars…

But otherwise a very cool project :)
dedoussis over 2 years ago

It's funny that geohot/tinygrad chooses not to meet the PEP 8 standards [0] just to stay on brand (<1000 lines). Black [1] or any other Python autoformatter would probably 2x the lines of code.

[0] https://peps.python.org/pep-0008/

[1] https://github.com/psf/black
jamesrom over 2 years ago

The tinygrad core is over 1000 LOC now [1]. If anyone was looking for a fun weekend project :)

[1] https://github.com/geohot/tinygrad/blob/master/.github/workflows/test.yml#L9
stephc_int13 over 2 years ago

I understand that the Python code is mostly driving faster low-level code, but I wonder how much time is effectively wasted by not using a lower-level language.

From my experience with game engines, it often turns out to be a bad idea (for performance and maintainability) to mix C/C++ with Lua or C#.
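One rough way to get a feel for that overhead question (a generic NumPy micro-benchmark, nothing tinygrad-specific; the sizes and iteration counts are arbitrary): time a tiny elementwise op, where per-call Python dispatch dominates, against a large one, where the underlying C kernel dominates.

    import time
    import numpy as np

    def mean_op_time(n, iters):
        a = np.ones(n, dtype=np.float32)
        t0 = time.perf_counter()
        for _ in range(iters):
            a = a + 1.0          # every call pays a fixed Python/dispatch cost
        return (time.perf_counter() - t0) / iters

    print(f"n=10:         {mean_op_time(10, 10000) * 1e6:.1f} us/op  (mostly Python overhead)")
    print(f"n=10_000_000: {mean_op_time(10_000_000, 10) * 1e6:.1f} us/op  (mostly kernel time)")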
gregjw over 2 years ago
Geohot at it again, this guy nails everything.
alexmolas over 2 years ago

> almost 9000 GitHub stars

I wouldn't say that 7500 stars is almost 9000 stars ;)
bfrankline over 2 years ago

If you care exclusively about numerical stability and performance, why _this_ set of operators? (E.g., there are plenty of good reasons to include expm1 or log1p, and certainly trigonometric functions.) It'd be an interesting research problem to measure and identify the minimal subset of operators (and I suspect it'd look different from what you'd expect from an FPU).

If you care exclusively about minimalism, why not limit yourself to the Meijer G-function (or some other general-purpose alternative)?
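To make the expm1/log1p point concrete, a quick NumPy illustration (the value of x is arbitrary): the naive formulas round 1 + x back toward 1.0 and lose most of x's significant digits, while the dedicated operators stay accurate for small arguments.

    import numpy as np

    x = 1e-12
    # Naive formulas: accurate to only a few significant figures for tiny x
    print(np.log(1.0 + x), np.exp(x) - 1.0)
    # Dedicated operators: accurate to machine precision
    print(np.log1p(x), np.expm1(x))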
learndeeply over 2 years ago

The code is very easy to read. It doesn't seem like there's data/model parallelism support for training, which will be important for real-world use.
bArray over 2 years ago
How does this compare on embedded systems for performance? For example PyTorch vs tinygrad, or Darknet vs tinygrad?
bullen over 2 years ago
Does anyone know of a neural network that is written in C and GLSL and that runs on normal OpenGL?
lostmsu over 2 years ago

It was OK as an educational tool, but now they don't count the GPU implementation in the 1000 lines, so it is not small. Considering the code style, it is closer to 20k+ lines when formatted and with the GPU code included.

It also doesn't support bfloat16, so it is doomed to be 2x slower.
DeathArrow over 2 years ago

I believe neural networks are sometimes over-hyped.

They are not always the best tool for the job. There are lots of other ML techniques, such as SVM, naive Bayes, k-nearest neighbors, decision trees, logistic regression, and random forests, that nobody is using because they lack the hype factor.

If something lacks keywords like neural network, deep learning, or reinforcement learning, then it is deemed not cool.
RektBoy over 2 years ago

No Bible quotes? I'm disappointed...
matesz over 2 years ago

If anybody is dealing with procrastination, watch George Hotz live-stream 10 hours straight working on this library [1][2]. Does he take some supplements to do this? There is even a 19.5-hour stream [3].

Actually, I have a local OBS setup to record myself; instead of streaming, I make recordings for my own review. The important part is to do the review afterwards. It works wonders.

[1] https://youtu.be/GXy5eVwnL_Q

[2] https://m.youtube.com/watch?v=Cb2KwcnDKrk

[3] No joke, a 19.5-hour stream: https://www.youtube.com/watch?v=xc0jGZYFQLQ
neets over 2 years ago

4 OpCodes? I think Geohot is taking a cue from his favorite intellectual Curtis Yarvin's Urbit project.
kwant_kiddo over 2 years ago

I think posts like this only get upvotes because George Hotz owns the project. I do see value in simple code, but the constraint of 1000 LOC makes little sense to me, especially when the code is formatted poorly.

This will get downvoted, but reading the comments here I don't understand the cult of respect for him. Siding with the most successful CTF team ever (PPP), he won DEF CON twice. He made a startup with funding that makes a cool 'niche' product.

I just think people like Chris Lattner or Dave Cutler, who have had so much impact on real computing, deserve far more respect, but I guess the norm here is to admire this guy.