This is the commit that changed it:
<a href="https://github.com/tensorflow/tensor2tensor/commit/d5bdfcc85fa3e10a73902974f2c0944dc51f6a33">https://github.com/tensorflow/tensor2tensor/commit/d5bdfcc85...</a>
This note contains four papers for "historical perspective"... which would usually mean "no longer directly relevant", although I'm not sure that's really what the author means.<p>You might be looking for the author's "Understanding Large Language Models" post [1] instead.<p>Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.<p>[1] <a href="https://magazine.sebastianraschka.com/p/understanding-large-language-models" rel="nofollow">https://magazine.sebastianraschka.com/p/understanding-large-...</a>
I wonder if a function is an example of a transformer. Take the phrase "argument one is cat, argument two is dog, and the operation is join, so the result is the word catdog": a transformer could handle this as the function concat(cat, dog). Here the query is the function, the keys are the arguments to the function, and the value is a mapping from words to words.
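For anyone wanting to pin down what the query/key/value roles actually do, here is a minimal NumPy sketch of the paper's scaled dot-product attention (the toy Q/K/V values are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # query-key similarity
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V                # weighted blend of values

# Toy example: one query attends over two key/value pairs.
Q = np.array([[1.0, 0.0]])           # the "question" being asked
K = np.array([[1.0, 0.0],            # key for value 0 (matches Q)
              [0.0, 1.0]])           # key for value 1
V = np.array([[10.0],
              [20.0]])               # values to mix
out = attention(Q, K, V)             # blend weighted toward 10.0
```

The output is a soft mixture of the values, pulled toward whichever value's key best matches the query, which is roughly the "function applied to arguments" intuition in the comment.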
The actual title "Why the Original Transformer Figure Is Wrong, and Some Other Interesting Historical Tidbits About LLMs" is way more representative of what this post is about...<p>As to the figure being wrong, it's kind of a nit-pick:
"While the original transformer figure above (from Attention Is All Your Need, <a href="https://arxiv.org/abs/1706.03762" rel="nofollow">https://arxiv.org/abs/1706.03762</a>) is a helpful summary of the original encoder-decoder architecture, there is a slight discrepancy in this figure.<p>For instance, it places the layer normalization between the residual blocks, which doesn't match the official (updated) code implementation accompanying the original transformer paper. The variant shown in the Attention Is All Your Need figure is known as Post-LN Transformer."
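To make the discrepancy concrete, here is a toy sketch (illustrative function names, LayerNorm without learned scale/shift for brevity) contrasting the two sublayer orderings:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension; learned gain/bias omitted
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_ln_block(x, sublayer):
    # Post-LN (as drawn in the figure): residual add first, then normalize
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN (as in the updated code): normalize the sublayer's input,
    # and leave the residual path untouched
    return x + sublayer(layer_norm(x))

# The two orderings give different outputs for the same sublayer.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
sub = lambda t: 0.5 * t  # stand-in for attention or the feed-forward net
a = post_ln_block(x, sub)
b = pre_ln_block(x, sub)
```

Note the Pre-LN variant keeps an identity path from input to output (`x + ...`), which is the usual argument for its more stable training.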