karpathy/build-nanogpt: Video + code lecture on building nanoGPT from scratch

9 points by codewiz, 11 months ago

2 comments

codewiz, 11 months ago
I love the way Andrej Karpathy explains things. The code for the forward pass of a transformer block looks like this:

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x

This is how Andrej describes it (starting at 19:00 into the video):

"This is the pre-normalization version, where you see that x first goes through the layer normalization [ln_1] and then the attention (attn), and then goes back out to go to the layer normalization number two and the multilayer perceptron [MLP], sometimes also referred to as feed-forward network, FFN, and then that goes into the residual stream again."

"And the one more thing that's kind of interesting to note is: recall that attention is a communication operation, it is where all the tokens - and there's 1024 tokens lined up in a sequence - this is where the tokens communicate, where they exchange information... so, attention is an aggregation function, it's a pooling function, it's a weighted sum function, it is a *reduce* operation, whereas this MLP [multilayer perceptron] happens every single token individually - there's no information being collected or exchanged between the tokens. So the attention is the reduce, and the MLP is the *map*."

"And the transformer ends up just being repeated application of map-reduce, if you wanna think about it that way."
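For context, here is a minimal sketch of the module that forward pass could live in, assuming nanoGPT-style attribute names (ln_1, attn, ln_2, mlp). nn.MultiheadAttention stands in for nanoGPT's CausalSelfAttention (the causal mask is omitted for brevity) and the sizes are illustrative, not the lecture's exact values:

    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        # Position-wise feed-forward network: applied to each token independently (the "map").
        def __init__(self, n_embd):
            super().__init__()
            self.c_fc = nn.Linear(n_embd, 4 * n_embd)
            self.gelu = nn.GELU()
            self.c_proj = nn.Linear(4 * n_embd, n_embd)

        def forward(self, x):
            return self.c_proj(self.gelu(self.c_fc(x)))

    class Block(nn.Module):
        # Pre-normalization transformer block: LayerNorm feeds each sublayer,
        # and each sublayer's output is added back onto the residual stream.
        def __init__(self, n_embd=768, n_head=12):
            super().__init__()
            self.ln_1 = nn.LayerNorm(n_embd)
            # Stand-in for nanoGPT's CausalSelfAttention; no causal mask applied here.
            self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
            self.ln_2 = nn.LayerNorm(n_embd)
            self.mlp = MLP(n_embd)

        def forward(self, x):
            # Attention: tokens communicate and aggregate information across the sequence (the "reduce").
            a = self.ln_1(x)
            x = x + self.attn(a, a, a, need_weights=False)[0]
            # MLP: per-token transformation, no exchange between tokens (the "map").
            x = x + self.mlp(self.ln_2(x))
            return x

    # Quick shape check: batch of 2 sequences, 1024 tokens each, 768-dim embeddings.
    x = torch.randn(2, 1024, 768)
    print(Block()(x).shape)  # torch.Size([2, 1024, 768])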
gnabgib, 11 months ago
H1: <i>build nanoGPT</i><p>Title: <i>Video+code lecture on building nanoGPT from scratch</i><p><i>Please don&#x27;t do things to make titles stand out (..) Otherwise please use the original title</i> <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;newsguidelines.html">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;newsguidelines.html</a>