This is an excellent tool for understanding how an LLM actually works from the ground up!<p>For those going through each step: if you get stuck on why there are 48 elements in the first array, refer to model.py in minGPT [1].<p>It's an architectural decision that would be great to mention in the article, since readers without much context might get lost there.<p>[1] <a href="https://github.com/karpathy/minGPT/blob/master/mingpt/model.py">https://github.com/karpathy/minGPT/blob/master/mingpt/model....</a>
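For context, my reading of where the 48 comes from (a sketch of the kind of config minGPT's model.py consumes, not its exact code; the concrete numbers are what I believe the nano model uses, so double-check against the repo):

    from dataclasses import dataclass

    @dataclass
    class GPTConfig:
        n_layer: int = 3      # transformer blocks
        n_head: int = 3       # attention heads per block
        n_embd: int = 48      # embedding width: every token becomes a vector of 48 numbers
        vocab_size: int = 3   # the demo's tiny alphabet (A, B, C)
        block_size: int = 11  # max context length (my guess for the nano model)

    cfg = GPTConfig()
    print(cfg.n_embd)  # 48 -- hence the 48 elements in that first per-token array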
The visualization I've been looking for for months. I would have happily paid serious money for this... the fact that it's free is such a gift and I don't take it for granted.
My jaw dropped seeing algorithmic complexity laid out so clearly in a 3D space like that. I wish I were smart enough to know whether it's accurate or not.
Could just as well be titled 'dissecting magic into matmuls and dot products for dummies'. Great stuff. Went away even more amazed that LLMs work as well as they do.
I am looking at Brenden’s GitHub repo <a href="https://github.com/bbycroft/llm-viz">https://github.com/bbycroft/llm-viz</a><p>Really nice stuff.
Twitter thread by the author sharing some extra context on this work: <a href="https://twitter.com/BrendanBycroft/status/1731042957149827140" rel="nofollow noreferrer">https://twitter.com/BrendanBycroft/status/173104295714982714...</a>
Another visualization I would really love would be a clickable circular set of possible prediction branches, projected onto a Poincare disk (to handle the exponential branching component of it all). It would take forever to calculate except on smaller models, but being able to visualize branch probabilities angularly for the top n values or whatever, and to go forwards and backwards up and down different branches, would likely yield some important insights into how they work.<p>Good visualization precedes good discoveries in many branches of science, I think.<p>(see my profile for a longer, potentially more silly description ;) )
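For anyone tempted to try it, the data side is easy enough to sketch; the hard (and fun) part is the Poincare layout. Everything below is a placeholder (fake_probs stands in for a real model's softmaxed logits):

    import torch

    def expand(prefix_ids, next_token_probs, depth=3, top_n=4):
        """Enumerate the top-n continuation branches of a prefix, depth levels deep.

        next_token_probs(prefix_ids) -> 1D tensor of probabilities over the vocab.
        Returns a tree of (token_id, prob, children) to lay out on the disk,
        e.g. with angular width proportional to probability mass."""
        if depth == 0:
            return []
        top_p, top_i = torch.topk(next_token_probs(prefix_ids), top_n)
        return [(i, p, expand(prefix_ids + [i], next_token_probs, depth - 1, top_n))
                for p, i in zip(top_p.tolist(), top_i.tolist())]

    # toy stand-in for a model: a random distribution over a 10-token vocab
    fake_probs = lambda ids: torch.softmax(torch.randn(10), dim=0)
    tree = expand([0], fake_probs)  # 4 + 16 + 64 nodes already -- hence "forever" on big models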
Big kudos to the author of this.<p>It's not only a visualization: it's interactive, has explanations for each item, has excellent performance, and is open source: <a href="https://github.com/bbycroft/llm-viz/blob/main/src/llm">https://github.com/bbycroft/llm-viz/blob/main/src/llm</a><p>Another interesting visualization-related project: <a href="https://github.com/shap/shap">https://github.com/shap/shap</a>
This is really awesome, but I wish there were a few added sentences on how I'm supposed to intuitively think about why each piece is the way it is. For example, I see a T x C matrix of 6 x 48... but at this step, before it's fed into the net, what is it supposed to represent?
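For what it's worth, my mental model of that 6 x 48 block (a hedged PyTorch sketch, not the site's actual code): each of the 6 rows is one input token, and each row is that token's 48-dimensional embedding plus a 48-dimensional position embedding, i.e. "which token am I, and where do I sit in the sequence".

    import torch
    import torch.nn as nn

    T, C, vocab_size = 6, 48, 3
    wte = nn.Embedding(vocab_size, C)   # learned token embedding table
    wpe = nn.Embedding(T, C)            # learned position embedding table (block_size rows in a real model)

    idx = torch.tensor([[2, 1, 0, 1, 1, 2]])   # (1, T) token ids for the 6 inputs
    pos = torch.arange(T).unsqueeze(0)         # (1, T) positions 0..5
    x = wte(idx) + wpe(pos)                    # (1, T, C): the 6 x 48 matrix fed into the first block
    print(x.shape)                             # torch.Size([1, 6, 48])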
A lot of transformer explanations fail to mention what makes self-attention so powerful.<p>Unlike traditional neural networks with fixed weights, self-attention layers adaptively weight connections between inputs based on context. This allows transformers to accomplish in a single layer what would take traditional networks multiple layers.
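A minimal single-head sketch of that context-dependent weighting (names and shapes are mine, just to illustrate the point): the projection matrices are fixed after training, but the T x T mixing weights are recomputed from the inputs on every forward pass.

    import torch
    import torch.nn.functional as F

    T, C = 6, 48
    x = torch.randn(1, T, C)                         # stand-in for the token embeddings

    # fixed (learned) projections...
    Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # ...but the token-to-token weights depend on the inputs themselves:
    att = (q @ k.transpose(-2, -1)) / C ** 0.5       # (1, T, T) attention scores
    att = att.masked_fill(torch.tril(torch.ones(T, T)) == 0, float('-inf'))  # causal mask
    att = F.softmax(att, dim=-1)                     # each row sums to 1
    out = att @ v                                    # every output is a context-dependent mix of values
    print(att[0, -1])                                # how the last token weights all earlier tokens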
If folks want a lower dimensional version of this for their own models, I'm a big fan of the Netron library for model architecture visualization.<p>Wrote about it here: <a href="https://about.xethub.com/blog/visualizing-ml-models-github-netron" rel="nofollow noreferrer">https://about.xethub.com/blog/visualizing-ml-models-github-n...</a>
This looks pretty cool! Does anyone know of visualizations for simpler neural networks? I'm aware of TensorFlow Playground, but that's just a toy example; is there anything for visualizing a real example (e.g. handwriting recognition)?
Rather than looking at the visuals of this network, it is better to focus on the actual problem with these LLMs, which the author has already shown.<p>Within the transformer section:<p>> As is common in deep learning, it's hard to say exactly what each of these layers is doing, but we have some general ideas: the earlier layers tend to focus on learning lower-level features and patterns, while the later layers learn to recognize and understand higher-level abstractions and relationships.<p>That is the problem: these black boxes are about as explainable as a magic scroll.
This is a great visualization, because the original paper on transformers is not very clear or easy to understand; I tried to read it first, didn't follow it, and had to look for other explanations (for example, it was unclear to me how multiple tokens are handled).<p>Also, speaking of transformers: they usually append their output tokens to the input and process everything again. Can we optimize this so that we don't redo the same calculations for the same input tokens?
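To make the question concrete, I imagine something like caching each token's keys and values so that only the newest token gets pushed through (a rough single-head sketch of the idea, not any particular library's API):

    import torch
    import torch.nn.functional as F

    C = 48
    Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3))
    k_cache, v_cache = [], []       # keys/values of tokens processed so far

    def step(x_new):                # x_new: (1, C) embedding of just the newest token
        q = x_new @ Wq
        k_cache.append(x_new @ Wk)  # old keys/values are reused; only the new ones are added
        v_cache.append(x_new @ Wv)
        K, V = torch.cat(k_cache), torch.cat(v_cache)    # (T_so_far, C)
        att = F.softmax((q @ K.T) / C ** 0.5, dim=-1)    # attend over everything seen so far
        return att @ V                                   # attention output for the new token only

    for _ in range(6):              # tokens arrive one at a time; earlier work is never redone
        out = step(torch.randn(1, C))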
This is a phenomenal visualisation. I wish I saw this when I was trying to wrap my head around transformers a while ago. This would have made it so much easier.
Am I the only one getting "Application error: a client-side exception has occurred (see the browser console for more information)." messages?
This shows how the individual weights and vectors work, but unless I’m missing something it doesn’t quite illustrate how higher-order vectors are created at the sentence and paragraph level. That might be an emergent property of this system, though, so it’s hard to “illustrate”. How all of this ends up as a world simulation needs to be understood better, and I hope this advances further.
I've wondered for a while whether, as LLM usage matures, there will be an effort to optimize hotspots, like what happened with VMs, or to auto-index, as relational DBs do. I'm sure there are common data paths that get more usage and could somehow be prioritized, either through pre-processing or dynamically, to help speed up inference.
This does an amazing job of showing the difference in complexity between the different models. Click on GPT-3 and you should be able to see all 4 models side-by-side. GPT-3 is a monster compared to nano-gpt.
Very cool. The explanations of what each part is doing are really insightful.
And I especially like how the scale jumps when you move from e.g. Nano all the way to GPT-3 ....
Honestly, reading the PyTorch implementation of minGPT is a lot more informative than an inscrutable 3D rendering. It's a well-commented and pedagogical implementation. I applaud the intention, and it looks slick, but I'm not sure it really conveys information in an efficient way.
I feel like visualizations like this are what's missing from university curricula. Now imagine a professor going through each animation and describing exactly what is happening; I'm pretty sure students would get a much more in-depth understanding!