
LLM Visualization

1592 points · by plibither8 · over 1 year ago

48 comments

warkanlock · over 1 year ago
This is an excellent tool for realizing how an LLM actually works from the ground up!

For those reading it and going through each step: if by chance you get stuck on why there are 48 elements in the first array, please refer to model.py in minGPT [1].

It's an architectural decision that would be great to mention in the article, since people without much context might get lost.

[1] https://github.com/karpathy/minGPT/blob/master/mingpt/model.py
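For readers hitting that same question, here is a minimal sketch of the kind of minGPT-style configuration involved: the 48 is the embedding size (n_embd, the C dimension) chosen for the nano model. Only n_embd = 48 is confirmed above; the other values are illustrative placeholders, not taken from the article.

```python
from dataclasses import dataclass

# Sketch of a minGPT-style config. Only n_embd = 48 is confirmed by the
# walkthrough; the remaining values are illustrative placeholders.
@dataclass
class GPTConfig:
    n_layer: int = 3        # number of transformer blocks
    n_head: int = 3         # attention heads per block
    n_embd: int = 48        # channel size C -- why each token becomes 48 numbers
    vocab_size: int = 3     # placeholder
    block_size: int = 11    # maximum sequence length T (placeholder)
```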
holtkam2 · over 1 year ago
This is the visualization I've been looking for for months. I would have happily paid serious money for this... the fact that it's free is such a gift, and I don't take it for granted.
wills_forward · over 1 year ago
My jaw dropped seeing algorithmic complexity laid out so clearly in a 3D space like that. I wish I was smart enough to know whether it's accurate or not.
gryfft · over 1 year ago
Damn, this looks *phenomenal*. I've been wanting to do a deep dive like this for a while -- the 3D model is a spectacular pedagogic device.
baq · over 1 year ago
Could just as well be titled "Dissecting magic into matmuls and dot products for dummies". Great stuff. I went away even more amazed that LLMs work as well as they do.
mark_l_watson · over 1 year ago
I am looking at Brendan's GitHub repo: https://github.com/bbycroft/llm-viz

Really nice stuff.
flockonus · over 1 year ago
Twitter thread by the author sharing some extra context on this work: https://twitter.com/BrendanBycroft/status/1731042957149827140
tysam_and · over 1 year ago
Another visualization I would really love would be a clickable circular set of possible prediction branches, projected onto a Poincaré disk (to handle the exponential branching component of it all). It would take forever to calculate except on smaller models, but being able to visualize branch probabilities angularly for the top n values or whatever, and to go forwards and backwards up and down different branches, would likely yield some important insights into how they work.

Good visualization precedes good discoveries in many branches of science, I think.

(See my profile for a longer, potentially more silly description ;) )
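A sketch of the data such a branch visualization would need: expand the top-n continuations to a small depth and record each branch's cumulative probability. Here `next_token_probs` is a stand-in for whatever model you would plug in, not a real API, and the vocabulary and seeds are made up.

```python
import torch
import torch.nn.functional as F

def next_token_probs(tokens):
    # Stand-in for a real model call; returns a probability distribution
    # over a toy 8-token vocabulary (deterministic per prefix for the demo).
    torch.manual_seed(sum(tokens))
    return F.softmax(torch.randn(8), dim=-1)

def expand_branches(tokens, prob=1.0, depth=2, top_n=3):
    """Yield (token sequence, cumulative probability) for the top-n prediction tree."""
    if depth == 0:
        yield tokens, prob
        return
    p = next_token_probs(tokens)
    top = torch.topk(p, top_n)
    for tok, tok_p in zip(top.indices.tolist(), top.values.tolist()):
        yield from expand_branches(tokens + [tok], prob * tok_p, depth - 1, top_n)

for seq, p in expand_branches([0]):
    print(seq, round(p, 4))
```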
29athrowaway · over 1 year ago
Big kudos to the author of this.

Not only is it a visualization: it's interactive, has explanations for each item, has excellent performance, and is open source: https://github.com/bbycroft/llm-viz/blob/main/src/llm

Another interesting visualization-related thing: https://github.com/shap/shap
8f2ab37a-ed6c · over 1 year ago
Expecting someone to implement an LLM in Factorio any day now; we're halfway there already with this blueprint.
Exuma · over 1 year ago
This is really awesome, but I wish there were a few added sentences on how I'm supposed to intuitively think about why each step is set up the way it is. For example, I see a T x C matrix of 6 x 48... but at this step, before it's fed into the net, what is it supposed to represent?
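Roughly, that 6 x 48 matrix is the model's working representation of the input: one row per token, built from a token embedding plus a position embedding. A minimal sketch (the vocabulary size and token ids below are made up; only T = 6 and C = 48 come from the comment above):

```python
import torch
import torch.nn as nn

T, C, vocab_size = 6, 48, 3

tok_emb = nn.Embedding(vocab_size, C)   # one learned C-vector per vocabulary entry
pos_emb = nn.Embedding(T, C)            # one learned C-vector per position 0..T-1

tokens = torch.tensor([0, 1, 2, 0, 1, 2])          # the T input token ids
x = tok_emb(tokens) + pos_emb(torch.arange(T))     # shape (T, C) = (6, 48)

# This (T, C) matrix is what gets fed into the first transformer block:
# each row encodes "what this token is" plus "where it sits in the sequence".
print(x.shape)  # torch.Size([6, 48])
```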
atgctg · over 1 year ago
A lot of transformer explanations fail to mention what makes self-attention so powerful.

Unlike traditional neural networks with fixed weights, self-attention layers adaptively weight connections between inputs based on context. This lets transformers accomplish in a single layer what would take traditional networks multiple layers.
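A minimal sketch of that point: the mixing weights in self-attention are computed from the current inputs rather than stored as fixed parameters (the linear projections and multi-head details of real attention are omitted here for clarity):

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v):
    # The (T, T) weight matrix is derived from the inputs themselves,
    # so the "connections" between positions change with every new context.
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)   # (T, T)
    weights = F.softmax(scores, dim=-1)                        # each row sums to 1
    return weights @ v                                         # (T, C) mix of values

x = torch.randn(6, 48)          # the (T, C) running example from above
out = self_attention(x, x, x)   # in real models, q/k/v are linear projections of x
```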
skadamat · over 1 year ago
If folks want a lower-dimensional version of this for their own models, I'm a big fan of the Netron library for model architecture visualization.

Wrote about it here: https://about.xethub.com/blog/visualizing-ml-models-github-netron
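For reference, Netron can also be launched straight from Python; this is a minimal sketch, and "model.onnx" is just a placeholder path for whatever exported model you want to inspect.

```python
# pip install netron
import netron

# Opens a local web server showing the model's architecture graph.
netron.start("model.onnx")
```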
airesQ · over 1 year ago
Incredible work.

So much depth; initially I thought it was "just" a 3D model. The animations are amazing.
shaburn · over 1 year ago
Visualization never seems to get the credit due in software development. This is amazing.
SiempreViernes · over 1 year ago
Anyone know if there is a name for this 3D control scheme? It feels like one of the most intuitive setups I've ever used.
arikrak · over 1 year ago
This looks pretty cool! Anyone know of visualizations for simpler neural networks? I'm aware of TensorFlow Playground, but that's just a toy example; is there anything for visualizing a real example (e.g. handwriting recognition)?
rvz · over 1 year ago
Rather than looking at the visuals of this network, it is better to focus on the actual problem with these LLMs, which the author has already shown.

Within the transformer section:

> As is common in deep learning, it's hard to say exactly what each of these layers is doing, but we have some general ideas: the earlier layers tend to focus on learning lower-level features and patterns, while the later layers learn to recognize and understand higher-level abstractions and relationships.

That is the problem, and yet these black boxes are just as explainable as a magic scroll.
codedokode · over 1 year ago
This is a great visualization, because the original paper on transformers is not very clear or understandable; I tried to read it first and didn't understand it, so I had to look for other explanations (for example, it was unclear to me how multiple tokens are handled).

Also, speaking about transformers: they usually append their output tokens to the input and process them again. Can we optimize this so that we don't need to redo the same calculations for the same input tokens?
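On the last question: this is usually handled with a key/value cache during generation, so the keys and values already computed for the existing prefix are reused rather than recomputed for every new token. A toy, self-contained sketch of the idea (sizes are arbitrary and no real model is involved):

```python
import torch
import torch.nn.functional as F

C = 48
Wk, Wv = torch.randn(C, C), torch.randn(C, C)          # stand-ins for learned projections
cache = {"k": torch.empty(0, C), "v": torch.empty(0, C)}

def step(x_new):
    """x_new: (1, C) embedding of the newest token only."""
    cache["k"] = torch.cat([cache["k"], x_new @ Wk])    # append; older rows are reused
    cache["v"] = torch.cat([cache["v"], x_new @ Wv])
    w = F.softmax(x_new @ cache["k"].T / C ** 0.5, dim=-1)
    return w @ cache["v"]                               # attends over all tokens seen so far

for _ in range(6):                                      # autoregressive loop: one new token per step
    out = step(torch.randn(1, C))
```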
hmate9 · over 1 year ago
This is a phenomenal visualisation. I wish I had seen this when I was trying to wrap my head around transformers a while ago. It would have made it so much easier.
johnklos · over 1 year ago
Am I the only one getting "Application error: a client-side exception has occurred (see the browser console for more information)." messages?
tsunamifury · over 1 year ago
This shows how the individual weights and vectors work, but unless I'm missing something it doesn't quite illustrate how higher-order vectors are created at the sentence and paragraph level. That might be an emergent property of the system, though, so it's hard to "illustrate". How all of this ends up as a world simulation needs to be understood better, and I hope this advances further.
russellbeattie · over 1 year ago
I've wondered for a while whether, as LLM usage matures, there will be an effort to optimize hotspots like what happened with VMs, or auto-indexing like in relational DBs. I'm sure there are common data paths that get more usage, which could somehow be prioritized, either through pre-processing or dynamically, helping speed up inference.
nbzso · over 1 year ago
Beautiful. This should be the new educational standard for visualization of complex topics and systemic thinking.
abrookewood · over 1 year ago
This does an amazing job of showing the difference in complexity between the different models. Click on GPT-3 and you should be able to see all 4 models side-by-side. GPT-3 is a monster compared to nano-gpt.
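A rough back-of-the-envelope way to see that size gap, using the common approximation that transformer parameter count scales like 12 · n_layer · n_embd² (attention plus MLP weights, ignoring embeddings). The nano sizes (3 layers, C = 48) and GPT-3 sizes (96 layers, C = 12288) are the usually cited figures and should be treated as approximate here.

```python
def approx_params(n_layer, n_embd):
    # Dominant term for a standard transformer block stack.
    return 12 * n_layer * n_embd ** 2

nano = approx_params(3, 48)        # ~83 thousand parameters
gpt3 = approx_params(96, 12288)    # ~174 billion parameters
print(f"GPT-3 is roughly {gpt3 / nano:,.0f}x larger than the nano model")
```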
tikkun · over 1 year ago
What happened to this thread? When I saw it before it had 700+ upvotes.
drdg · over 1 year ago
Very cool. The explanations of what each part is doing are really insightful. And I especially like how the scale jumps when you move from, e.g., Nano all the way to GPT-3.
thistoowontpass · over 1 year ago
Thank you. I'd just completed doing this manually (much uglier and less accurately), so I can really appreciate the effort behind this.
thefourthchime · over 1 year ago
First off, this is fabulous work. I went through it for the Nano model, but is there a way to do the step-by-step for the other LLMs?
cod1r · over 1 year ago
Really cool stuff. Looks like an entire computer, but in software. Definitely need to dig into more AI/ML things.
nikhil896 · over 1 year ago
This is by far the best resource I've seen for understanding LLMs. Incredibly well done! Thanks for this awesome tool.
Simon_ORourke · over 1 year ago
This is brilliant work, thanks for sharing.
crotchfire · over 1 year ago
Application error: a client-side exception has occurred (see the browser console for more information).
sva_ · over 1 year ago
The score on this post just went down by a factor of 10, and the time went to "1 hour" ago?!
gbertasius · over 1 year ago
This is AMAZING! I'm about to go into uni and this will be useful for my ML classes.
meeby · over 1 year ago
This is easily the best visualization I've seen in a long time. Fantastic work!
RecycledEle · over 1 year ago
This is excellent!

This is why I love Hacker News!
Arctic_fly · over 1 year ago
Curse you for being interesting enough to make me get on my desktop.
valdect · over 1 year ago
Super cool! It's always nice to look at something concrete.
athulsuresh123 · over 1 year ago
This should be in college textbooks
physPop · over 1 year ago
Honestly, reading the PyTorch implementation of minGPT is a lot more informative than an inscrutable 3D rendering. It's a well-commented and pedagogical implementation. I applaud the intention, and it looks slick, but I'm not sure it really conveys information in an efficient way.
smy20011 · over 1 year ago
Really cool!
Solvency · over 1 year ago
Wish it were mobile friendly.
Workaccount2 · over 1 year ago
Such an amazing tool
haltist · over 1 year ago
Very cool.
BSTRhino · over 1 year ago
bbycroft is the GOAT!
wdiamond · over 1 year ago
amazing router
reqo · over 1 year ago
I feel like visualizations like this are what is missing from university curricula. Now imagine a professor going through each animation, describing exactly what is happening; I am pretty sure students would get a much more in-depth understanding!