Why the original transformer figure is wrong, and some other tidbits about LLMs

237 points by rasbt, almost 2 years ago

7 comments

YetAnotherNick, almost 2 years ago
This is the commit that changed it: https://github.com/tensorflow/tensor2tensor/commit/d5bdfcc85fa3e10a73902974f2c0944dc51f6a33
fn-mote, almost 2 years ago
This note contains four papers for "historical perspective"... which would usually mean "no longer directly relevant", although I'm not sure that's really what the author means.

You might be looking for the author's "Understanding Large Language Models" post [1] instead.

Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.

[1] https://magazine.sebastianraschka.com/p/understanding-large-language-models
amelius, almost 2 years ago
Are there any still-human-readable pictures where the entire transformer is shown in expanded form?
trivialmath, almost 2 years ago
I wonder if, for example, a function is an example of a transformer. So the phrase "argument one is cat, argument two is dog, and the operation is join, so the result is the word catdog" is computed by the transformer as the function concat(cat, dog). Here the query is the function, the keys are the arguments to the function, and the value is a mapping from words to words.
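For readers puzzling over the query/key/value analogy, here is a minimal sketch of scaled dot-product attention in PyTorch; the function name, tensor shapes, and the toy "cat"/"dog" embeddings are illustrative assumptions, not anything from the paper or the linked post.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(query, keys, values):
        # query: (n_q, d), keys: (n_k, d), values: (n_k, d_v)
        # Each query is compared against every key; the resulting softmax
        # weights decide how much of each value vector flows into the output.
        d = query.size(-1)
        scores = query @ keys.transpose(-2, -1) / d ** 0.5  # (n_q, n_k) similarities
        weights = F.softmax(scores, dim=-1)                  # each row sums to 1
        return weights @ values                              # (n_q, d_v) blended values

    # Toy usage with made-up 4-dimensional embeddings standing in for "cat" and "dog".
    keys = torch.randn(2, 4)     # one key per input token
    values = torch.randn(2, 4)   # one value per input token
    query = torch.randn(1, 4)    # what we are "asking" of the sequence
    out = scaled_dot_product_attention(query, keys, values)
    print(out.shape)             # torch.Size([1, 4])

In this framing the query plays the role of the thing being asked, the keys index the available arguments, and the values are what actually get combined, so the analogy in the comment above is loose: attention returns a weighted blend rather than a single discrete result like concat(cat, dog).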
ijidak, almost 2 years ago
Has anyone bought his book, "Machine Learning Q and AI"?

Is it a helpful read as cliff notes for the latest in generative AI?
andreyk, almost 2 years ago
The actual title, "Why the Original Transformer Figure Is Wrong, and Some Other Interesting Historical Tidbits About LLMs", is way more representative of what this post is about...

As to the figure being wrong, it's kind of a nit-pick: "While the original transformer figure above (from Attention Is All Your Need, https://arxiv.org/abs/1706.03762) is a helpful summary of the original encoder-decoder architecture, there is a slight discrepancy in this figure.

For instance, it places the layer normalization between the residual blocks, which doesn't match the official (updated) code implementation accompanying the original transformer paper. The variant shown in the Attention Is All Your Need figure is known as the Post-LN Transformer."
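To make the quoted discrepancy concrete, here is a minimal PyTorch sketch of the two sublayer orderings being contrasted; the class names and layer sizes are illustrative assumptions, not code from the paper or from the tensor2tensor repository.

    import torch.nn as nn

    class PostLNBlock(nn.Module):
        # Post-LN, as drawn in the "Attention Is All You Need" figure:
        # LayerNorm is applied after each residual addition.
        def __init__(self, d_model=512, n_heads=8, d_ff=2048):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            x = self.norm1(x + self.attn(x, x, x)[0])  # residual first, then norm
            x = self.norm2(x + self.ff(x))             # residual first, then norm
            return x

    class PreLNBlock(nn.Module):
        # Pre-LN, the ordering the post says matches the updated reference code:
        # LayerNorm is applied to each sublayer's input, inside the residual branch.
        def __init__(self, d_model=512, n_heads=8, d_ff=2048):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]              # norm first, then residual
            x = x + self.ff(self.norm2(x))             # norm first, then residual
            return x

The only difference between the two classes is where the LayerNorm sits relative to the residual addition, which is exactly the detail on which the figure and the updated implementation disagree.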
canjobear, almost 2 years ago
The original Transformer wasn't used in an LLM.