
The Illustrated Transformer (2018)

162 points by debdut | 11 months ago

6 comments

xianshou · 11 months ago
Illustrated Transformer is amazing as a way of understanding the original transformer architecture step-by-step, but if you want to truly visualize how information flows through a decoder-only architecture - from nanoGPT all the way up to a fully represented GPT-3 - *nothing* beats this:

https://bbycroft.net/llm
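[Editor's note: to ground the "information flow" framing above, here is a minimal NumPy sketch of causal self-attention, the mechanism that restricts each position in a decoder-only model to attending over earlier positions. This is an editorial illustration, not code from the linked visualization; all names and shapes are assumptions.]

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len)
    # Causal mask: position i may attend only to positions j <= i,
    # so information flows strictly left-to-right through the sequence.
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)    # (5, 4)
```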
ryan-duve · 11 months ago
I gave a talk on using Google BERT for financial services problems at a machine learning conference in early 2019. During my preparation, this was the only resource on transformers I could find that was even remotely understandable to me.

I had a lot of trouble understanding what was going on from just the original publication[0].

[0] https://arxiv.org/abs/1706.03762
crystal_revenge · 11 months ago
While I absolutely love this illustration (and frankly everything Jay Alammar does), it is worth recognizing that there is a distinction between visualizing *how* a transformer (or really any model) works and *what* the transformer is doing.

My favorite article on the latter is Cosma Shalizi's excellent post showing that all "attention" is really doing is kernel smoothing [0]. Personally, having this 'click' was a bigger insight for me than walking through this post and implementing "attention is all you need".

In a very real sense, transformers are just performing compression and providing a soft lookup functionality on top of an unimaginably large dataset (basically the majority of human writing). This understanding of LLMs helps to better understand their limitations as well as their, imho untapped, usefulness.

0. http://bactra.org/notebooks/nn-attention-and-transformers.html
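[Editor's note: to make the kernel-smoothing point concrete, here is a minimal NumPy sketch of the idea — my own illustration, not code from Shalizi's post. Softmax attention over a set of key/value pairs is exactly a Nadaraya-Watson kernel smoother with an exponentiated scaled-dot-product kernel; all names below are illustrative.]

```python
import numpy as np

def exp_dot_kernel(q, k):
    # The "attention" kernel: exponentiated scaled dot product.
    return np.exp(q @ k / np.sqrt(q.shape[-1]))

def kernel_smoother(query, keys, values, kernel):
    # Nadaraya-Watson: a kernel-weighted average of the stored values.
    w = np.array([kernel(query, k) for k in keys])
    w /= w.sum()                              # normalize the kernel weights
    return w @ values

rng = np.random.default_rng(1)
keys = rng.normal(size=(6, 4))                # 6 stored key vectors
values = rng.normal(size=(6, 3))              # 6 stored value vectors
query = rng.normal(size=4)

smoothed = kernel_smoother(query, keys, values, exp_dot_kernel)

# The same computation phrased as scaled dot-product attention:
scores = keys @ query / np.sqrt(4)
attn = np.exp(scores) / np.exp(scores).sum()  # softmax over the keys
assert np.allclose(smoothed, attn @ values)
```

The assertion at the end checks that the smoother and the attention formulation produce the same output, which is the "soft lookup" reading of attention described above.]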
tomashm · 10 months ago
This is good, but what made me finally understand the transformer architecture [0] and attention [1] are 3Blue1Brown's videos.

0. https://www.youtube.com/watch?v=wjZofJX0v4M

1. https://www.youtube.com/watch?v=eMlx5fFNoYc
photon_lines · 11 months ago
Great post and write-up - I also wrote an in-depth exploration and did my best to use visuals - for anyone interested, you can find it here: https://photonlines.substack.com/p/intuitive-and-visual-guide-to-transformers
jerpint · 11 months ago
I go back religiously to this post whenever I need a quick visual refresher on how transformers work. I can't overstate how fantastic it is.