This is an excellent tool for understanding how an LLM actually works from the ground up!<p>For those going through each step: if you get stuck on why there are 48 elements in the first array, refer to model.py in minGPT [1].<p>It's an architectural decision that would be great to mention in the article, since readers without much context might get lost there.<p>[1] <a href="https://github.com/karpathy/minGPT/blob/master/mingpt/model.py">https://github.com/karpathy/minGPT/blob/master/mingpt/model....</a>
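For context, my reading of where the 48 comes from (a sketch of the kind of config minGPT's model.py consumes, not its exact code; the concrete numbers are what I believe the nano model uses, so double-check against the repo):

    from dataclasses import dataclass

    @dataclass
    class GPTConfig:
        n_layer: int = 3      # transformer blocks
        n_head: int = 3       # attention heads per block
        n_embd: int = 48      # embedding width: every token becomes a vector of 48 numbers
        vocab_size: int = 3   # the demo's tiny alphabet (A, B, C)
        block_size: int = 11  # max context length (my guess for the nano model)

    cfg = GPTConfig()
    print(cfg.n_embd)  # 48 -- hence the 48 elements in that first per-token array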
The visualization I've been looking for for months. I would have happily paid serious money for this... the fact that it's free is such a gift and I don't take it for granted.
My jaw dropped seeing algorithmic complexity laid out so clearly in a 3D space like that. I wish I were smart enough to know whether it's accurate or not.
Could just as well be titled 'dissecting magic into matmuls and dot products for dummies'. Great stuff. Went away even more amazed that LLMs work as well as they do.
I am looking at Brenden’s GitHub repo <a href="https://github.com/bbycroft/llm-viz">https://github.com/bbycroft/llm-viz</a><p>Really nice stuff.
Twitter thread by the author sharing some extra context on this work: <a href="https://twitter.com/BrendanBycroft/status/1731042957149827140" rel="nofollow noreferrer">https://twitter.com/BrendanBycroft/status/173104295714982714...</a>
Another visualization I would really love would be a clickable circular set of possible prediction branches, projected onto a Poincare disk (to handle the exponential branching component of it all). It would take forever to calculate except on smaller models, but being able to visualize branch probabilities angularly for the top n values or whatever, and to go forwards and backwards up and down different branches, would likely yield some important insights into how they work.<p>Good visualization precedes good discoveries in many branches of science, I think.<p>(see my profile for a longer, potentially more silly description ;) )
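For anyone tempted to try it, the data side is easy enough to sketch; the hard (and fun) part is the Poincare layout. Everything below is a placeholder (fake_probs stands in for a real model's softmaxed logits):

    import torch

    def expand(prefix_ids, next_token_probs, depth=3, top_n=4):
        """Enumerate the top-n continuation branches of a prefix, depth levels deep.

        next_token_probs(prefix_ids) -> 1D tensor of probabilities over the vocab.
        Returns a tree of (token_id, prob, children) to lay out on the disk,
        e.g. with angular width proportional to probability mass."""
        if depth == 0:
            return []
        top_p, top_i = torch.topk(next_token_probs(prefix_ids), top_n)
        return [(i, p, expand(prefix_ids + [i], next_token_probs, depth - 1, top_n))
                for p, i in zip(top_p.tolist(), top_i.tolist())]

    # toy stand-in for a model: a random distribution over a 10-token vocab
    fake_probs = lambda ids: torch.softmax(torch.randn(10), dim=0)
    tree = expand([0], fake_probs)  # 4 + 16 + 64 nodes already -- hence "forever" on big models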
Big kudos to the author of this.<p>It's not only a visualization: it's interactive, has explanations for each item, has excellent performance, and is open source: <a href="https://github.com/bbycroft/llm-viz/blob/main/src/llm">https://github.com/bbycroft/llm-viz/blob/main/src/llm</a><p>Another interesting visualization-related project: <a href="https://github.com/shap/shap">https://github.com/shap/shap</a>
This is really awesome, but I wish there were a few added sentences on how I'm supposed to intuitively think about why each piece is the way it is. For example, I see a T x C matrix of 6 x 48... but at this step, before it's fed into the net, what is it supposed to represent?
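For what it's worth, my mental model of that 6 x 48 block (a hedged PyTorch sketch, not the site's actual code): each of the 6 rows is one input token, and each row is that token's 48-dimensional embedding plus a 48-dimensional position embedding, i.e. "which token am I, and where do I sit in the sequence".

    import torch
    import torch.nn as nn

    T, C, vocab_size = 6, 48, 3
    wte = nn.Embedding(vocab_size, C)   # learned token embedding table
    wpe = nn.Embedding(T, C)            # learned position embedding table (block_size rows in a real model)

    idx = torch.tensor([[2, 1, 0, 1, 1, 2]])   # (1, T) token ids for the 6 inputs
    pos = torch.arange(T).unsqueeze(0)         # (1, T) positions 0..5
    x = wte(idx) + wpe(pos)                    # (1, T, C): the 6 x 48 matrix fed into the first block
    print(x.shape)                             # torch.Size([1, 6, 48])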
A lot of transformer explanations fail to mention what makes self-attention so powerful.<p>Unlike traditional neural networks with fixed weights, self-attention layers adaptively weight connections between inputs based on context. This allows transformers to accomplish in a single layer what would take traditional networks multiple layers.
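A minimal single-head sketch of that context-dependent weighting (names and shapes are mine, just to illustrate the point): the projection matrices are fixed after training, but the T x T mixing weights are recomputed from the inputs on every forward pass.

    import torch
    import torch.nn.functional as F

    T, C = 6, 48
    x = torch.randn(1, T, C)                         # stand-in for the token embeddings

    # fixed (learned) projections...
    Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # ...but the token-to-token weights depend on the inputs themselves:
    att = (q @ k.transpose(-2, -1)) / C ** 0.5       # (1, T, T) attention scores
    att = att.masked_fill(torch.tril(torch.ones(T, T)) == 0, float('-inf'))  # causal mask
    att = F.softmax(att, dim=-1)                     # each row sums to 1
    out = att @ v                                    # every output is a context-dependent mix of values
    print(att[0, -1])                                # how the last token weights all earlier tokens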
If folks want a lower dimensional version of this for their own models, I'm a big fan of the Netron library for model architecture visualization.<p>Wrote about it here: <a href="https://about.xethub.com/blog/visualizing-ml-models-github-netron" rel="nofollow noreferrer">https://about.xethub.com/blog/visualizing-ml-models-github-n...</a>
This looks pretty cool! Does anyone know of visualizations for simpler neural networks? I'm aware of TensorFlow Playground, but that's just a toy example; is there anything for visualizing a real example (e.g. handwriting recognition)?
Rather than looking at the visuals of this network, it is better to focus on the actual problem with these LLMs, which the author has already shown.<p>Within the transformer section:<p>> As is common in deep learning, it's hard to say exactly what each of these layers is doing, but we have some general ideas: the earlier layers tend to focus on learning lower-level features and patterns, while the later layers learn to recognize and understand higher-level abstractions and relationships.<p>That is the problem: these black boxes are about as explainable as a magic scroll.
This is a great visualization, because the original paper on transformers is not very clear or easy to understand; I tried to read it first, didn't follow it, and had to look for other explanations (for example, it was unclear to me how multiple tokens are handled).<p>Also, speaking of transformers: they usually append their output tokens to the input and process everything again. Can we optimize this so that we don't redo the same calculations for the same input tokens?
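To make the question concrete, I imagine something like caching each token's keys and values so that only the newest token gets pushed through (a rough single-head sketch of the idea, not any particular library's API):

    import torch
    import torch.nn.functional as F

    C = 48
    Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3))
    k_cache, v_cache = [], []       # keys/values of tokens processed so far

    def step(x_new):                # x_new: (1, C) embedding of just the newest token
        q = x_new @ Wq
        k_cache.append(x_new @ Wk)  # old keys/values are reused; only the new ones are added
        v_cache.append(x_new @ Wv)
        K, V = torch.cat(k_cache), torch.cat(v_cache)    # (T_so_far, C)
        att = F.softmax((q @ K.T) / C ** 0.5, dim=-1)    # attend over everything seen so far
        return att @ V                                   # attention output for the new token only

    for _ in range(6):              # tokens arrive one at a time; earlier work is never redone
        out = step(torch.randn(1, C))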
This is a phenomenal visualisation. I wish I saw this when I was trying to wrap my head around transformers a while ago. This would have made it so much easier.
Am I the only one getting "Application error: a client-side exception has occurred (see the browser console for more information)." messages?
This shows how the individual weights and vectors work, but unless I’m missing something it doesn’t quite illustrate how higher-order vectors are created at the sentence and paragraph level. That might be an emergent property of this system, though, so it’s hard to “illustrate”. How all of this ends up as a world simulation needs to be understood better, and I hope this advances further.
I've wondered for a while whether, as LLM usage matures, there will be an effort to optimize hotspots, like what happened with VMs, or to auto-index, as relational DBs do. I'm sure there are common data paths that get more usage and could somehow be prioritized, either through pre-processing or dynamically, to help speed up inference.
This does an amazing job of showing the difference in complexity between the different models. Click on GPT-3 and you should be able to see all 4 models side-by-side. GPT-3 is a monster compared to nano-gpt.
Very cool. The explanations of what each part is doing are really insightful.
And I especially like how the scale jumps when you move from e.g. Nano all the way to GPT-3 ....
Honestly, reading the PyTorch implementation of minGPT is a lot more informative than an inscrutable 3D rendering. It's a well-commented and pedagogical implementation. I applaud the intention, and it looks slick, but I'm not sure it really conveys information in an efficient way.
I feel like visualizations like this are what's missing from university curricula. Now imagine a professor going through each animation and describing exactly what is happening; I'm pretty sure students would get a much more in-depth understanding!