
The Structure of Neural Embeddings

74 points by sean_pedersen 5 months ago

2 comments

jmward01 5 months ago
Current embeddings are badly trained and are massively holding back networks. A core issue is something I call 'token drag'. Low-frequency tokens, when they finally come up, drag the model back towards an earlier state, causing a lot of lost training. This leads to the first few layers of a model effectively being dedicated to just being a buffer to the bad embeddings feeding the model. Luckily, fixing this is actually really easy. Creating a sacrificial two-layer network to predict embeddings in training (and then just calculating the embeddings once for prod inference) gives a massive boost to training. To see this in action, check out the unified embeddings in this project: https://github.com/jmward01/lmplay
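To make the trick jmward01 describes concrete, here is a minimal PyTorch-style sketch of a "sacrificial" embedding network. This is not the lmplay unified-embeddings implementation; the class name, the bake() helper, and the layer sizes are all illustrative assumptions. The idea is only that training routes token lookups through a small two-layer MLP, while inference precomputes the MLP's output for every token once and keeps a plain table.

    # Minimal sketch, assuming PyTorch. Not the lmplay code; names are illustrative.
    import torch
    import torch.nn as nn

    class SacrificialEmbedding(nn.Module):
        def __init__(self, vocab_size: int, dim: int, hidden: int):
            super().__init__()
            self.base = nn.Embedding(vocab_size, dim)   # raw trainable table
            self.mlp = nn.Sequential(                   # sacrificial two-layer net
                nn.Linear(dim, hidden),
                nn.GELU(),
                nn.Linear(hidden, dim),
            )
            self.frozen = None                          # set by bake() for inference

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            if self.frozen is not None:
                return self.frozen(token_ids)           # inference: plain lookup
            return self.mlp(self.base(token_ids))       # training: table -> MLP

        @torch.no_grad()
        def bake(self) -> None:
            # Run the MLP once over the whole vocabulary, then freeze the result
            # as an ordinary embedding table; the MLP is no longer needed.
            table = self.mlp(self.base.weight)
            self.frozen = nn.Embedding.from_pretrained(table, freeze=True)

After training, a single call to bake() replaces the per-step MLP pass with an ordinary table lookup, so inference pays nothing for the extra network.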
tomrod 5 months ago
Oh wow, great set of reads. Thanks to @sean_pedersen for posting, looking forward to reviewing this in my closeout this year.