
Why Are Sinusoidal Functions Used for Position Encoding?

5 points by mfn about 2 years ago

1 comment

mfn · about 2 years ago
Sinusoidal positional embeddings have always seemed a bit mysterious - even more so since papers don't tend to delve much into the intuition behind them. For example, from Vaswani et al., 2017:

> That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from 2π to 10000 · 2π. We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PE(pos+k) can be represented as a linear function of PE(pos).

Inspired largely by the RoFormer paper (https://arxiv.org/abs/2104.09864), I thought I'd write a post that dives a bit into how intuitive considerations around linearity and relative positions can lead to the idea of using sinusoidal functions to encode positions.

Would appreciate any thoughts or feedback!
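
To make the quoted linearity property concrete, here is a minimal numerical sketch. It assumes the standard interleaved sin/cos layout from the paper; the names `sinusoidal_pe`, `offset_matrix`, and the choice `D_MODEL = 64` are illustrative, not from the post. The key point: each frequency's (sin, cos) pair is rotated by a fixed angle k·wᵢ that depends only on the offset k, so a single matrix M(k) maps PE(pos) to PE(pos+k) for every pos.

```python
import numpy as np

D_MODEL = 64  # embedding dimension (an arbitrary choice for this demo)

def sinusoidal_pe(pos: int, d_model: int = D_MODEL) -> np.ndarray:
    """Sinusoidal positional encoding (Vaswani et al., 2017).

    Even dimensions hold sin(pos * w_i), odd dimensions cos(pos * w_i),
    with w_i = 10000**(-2i/d_model), so wavelengths run geometrically
    from 2*pi up to 10000 * 2*pi.
    """
    i = np.arange(d_model // 2)
    w = 10000.0 ** (-2.0 * i / d_model)  # per-pair angular frequencies
    pe = np.empty(d_model)
    pe[0::2] = np.sin(pos * w)
    pe[1::2] = np.cos(pos * w)
    return pe

def offset_matrix(k: int, d_model: int = D_MODEL) -> np.ndarray:
    """Matrix M(k) with PE(pos + k) == M(k) @ PE(pos) for every pos.

    Each (sin, cos) pair is rotated by the fixed angle k * w_i; the
    angle depends only on the offset k, never on pos -- which is the
    linearity property the paper hypothesizes is useful.
    """
    i = np.arange(d_model // 2)
    w = 10000.0 ** (-2.0 * i / d_model)
    M = np.zeros((d_model, d_model))
    for j, wj in enumerate(w):
        c, s = np.cos(k * wj), np.sin(k * wj)
        # sin((pos+k)w) =  cos(kw)*sin(pos*w) + sin(kw)*cos(pos*w)
        # cos((pos+k)w) = -sin(kw)*sin(pos*w) + cos(kw)*cos(pos*w)
        M[2 * j, 2 * j], M[2 * j, 2 * j + 1] = c, s
        M[2 * j + 1, 2 * j], M[2 * j + 1, 2 * j + 1] = -s, c
    return M

# The same matrix works at every position, as the quote claims.
k = 7
M = offset_matrix(k)
for pos in (0, 3, 50):
    assert np.allclose(M @ sinusoidal_pe(pos), sinusoidal_pe(pos + k))
print("PE(pos + k) is a pos-independent linear function of PE(pos)")
```

Note that M(k) is block-diagonal with 2×2 rotation blocks; that rotation structure is exactly what RoFormer's rotary embeddings apply directly to the query and key vectors, which is why the RoFormer paper is a natural lens for this intuition.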