
Why the original transformer figure is wrong, and some other tidbits about LLMs

237 points by rasbt nearly 2 years ago

7 comments

YetAnotherNick nearly 2 years ago
This is the commit that changed it: https://github.com/tensorflow/tensor2tensor/commit/d5bdfcc85fa3e10a73902974f2c0944dc51f6a33
fn-mote nearly 2 years ago
This note contains four papers for "historical perspective"... which would usually mean "no longer directly relevant", although I'm not sure that's really what the author means.

You might be looking for the author's "Understanding Large Language Models" post [1] instead.

Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.

[1] https://magazine.sebastianraschka.com/p/understanding-large-language-models
amelius nearly 2 years ago
Are there any still-human-readable pictures where the entire transformer is shown in expanded form?
trivialmath nearly 2 years ago
I wonder whether a function is an example of a transformer. For instance, take the phrase "argument one is cat", argument two is dog, and the operation is join, so the result is the word catdog; the transformer would compute this as the function concat(cat, dog). Here the query is the function, the keys are the arguments to the function, and the value is a mapping from words to words.
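For comparison with the function analogy above, here is a minimal, illustrative sketch of the scaled dot-product attention that the paper's query/key/value terms refer to: the query is matched against every key, and the result is a weighted mixture of the values, i.e. a soft lookup rather than a single function call. This is a rough PyTorch sketch, not code from the linked post:

```python
# Minimal scaled dot-product attention, for comparison with the function
# analogy above: each query is scored against every key, and the output is
# a softmax-weighted blend of the values (a soft key-value lookup).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q: (..., n_queries, d_k), k: (..., n_keys, d_k), v: (..., n_keys, d_v)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)   # how strongly each key matches the query
    return weights @ v                    # values mixed according to those weights

q = torch.randn(1, 3, 8)    # 3 query tokens
k = torch.randn(1, 5, 8)    # 5 key tokens
v = torch.randn(1, 5, 16)   # one value per key
out = scaled_dot_product_attention(q, k, v)
print(out.shape)            # torch.Size([1, 3, 16])
```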
ijidak nearly 2 years ago
Has anyone bought his book, "Machine Learning Q and AI"?

Is it a helpful read as cliff notes for the latest in generative AI?
andreyk nearly 2 years ago
The actual title "Why the Original Transformer Figure Is Wrong, and Some Other Interesting Historical Tidbits About LLMs" is way more representative of what this post is about...

As to the figure being wrong, it's kind of a nit-pick: "While the original transformer figure above (from Attention Is All Your Need, https://arxiv.org/abs/1706.03762) is a helpful summary of the original encoder-decoder architecture, there is a slight discrepancy in this figure.

For instance, it places the layer normalization between the residual blocks, which doesn't match the official (updated) code implementation accompanying the original transformer paper. The variant shown in the Attention Is All Your Need figure is known as Post-LN Transformer."
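For readers wondering what that discrepancy looks like concretely, below is a minimal, hypothetical PyTorch sketch (not the tensor2tensor implementation) contrasting the Post-LN block drawn in the paper's figure with the Pre-LN placement used by the updated official code:

```python
# Illustrative sketch of the two layer-norm placements discussed in the quote.
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Variant shown in the paper's figure: residual addition first, then LayerNorm."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x)[0])  # LayerNorm applied after the residual
        x = self.norm2(x + self.ff(x))
        return x

class PreLNBlock(nn.Module):
    """Variant matching the updated code: LayerNorm inside each residual branch."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]        # residual wraps the normalized branch
        x = x + self.ff(self.norm2(x))
        return x
```

Pre-LN is generally reported to train more stably (e.g. with less sensitivity to warm-up schedules), which is part of why later implementations adopted it.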
canjobear nearly 2 years ago
The original Transformer wasn't used in an LLM.