The actual title, "Why the Original Transformer Figure Is Wrong, and Some Other Interesting Historical Tidbits About LLMs", is way more representative of what this post is about...

As to the figure being wrong, it's kind of a nitpick:
"While the original transformer figure above (from Attention Is All Your Need, <a href="https://arxiv.org/abs/1706.03762" rel="nofollow">https://arxiv.org/abs/1706.03762</a>) is a helpful summary of the original encoder-decoder architecture, there is a slight discrepancy in this figure.<p>For instance, it places the layer normalization between the residual blocks, which doesn't match the official (updated) code implementation accompanying the original transformer paper. The variant shown in the Attention Is All Your Need figure is known as Post-LN Transformer."