A little while back, mothcamp published the "NLP Demystified" series: <a href="https://news.ycombinator.com/item?id=33815146" rel="nofollow">https://news.ycombinator.com/item?id=33815146</a><p>The video on transformers, at around 19:00, has a visual explanation of how sinusoids were used in the original architecture to add positional information. Before seeing that, the positional component was always just black magic to me. Thanks for that and the rest of the series, mothcamp!