The actual title, "Why the Original Transformer Figure Is Wrong, and Some Other Interesting Historical Tidbits About LLMs", is way more representative of what this post is about...

As to the figure being wrong, it's kind of a nitpick:
"While the original transformer figure above (from Attention Is All Your Need, <a href="https://arxiv.org/abs/1706.03762" rel="nofollow">https://arxiv.org/abs/1706.03762</a>) is a helpful summary of the original encoder-decoder architecture, there is a slight discrepancy in this figure.<p>For instance, it places the layer normalization between the residual blocks, which doesn't match the official (updated) code implementation accompanying the original transformer paper. The variant shown in the Attention Is All Your Need figure is known as Post-LN Transformer."