TechEcho

Rotary Embeddings: A Relative Revolution

6 points by asparagui about 4 years ago

1 comment

PaulHoule about 4 years ago

No wonder. There is wisdom in circular intervals.

https://www.amazon.com/Geometry-Biological-Interdisciplinary-Applied-Mathematics/dp/0387989927

Neural networks learn bad habits from the (-∞,∞) range, particularly like polynomials they like to make big coefficients that make terms that cancel out to get precise answers.

I see all the states on an LSTM creep up in magnitude as it scans a document and that is really just wrong.

It makes me think that FP16 is a joke, you get some regularization by not letting the numbers get too big.