
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Tesla Dojo Whitepaper

13 points | by edison112358 | over 3 years ago

2 comments

childintime | over 3 years ago
I'd like to mention a thought I had some time ago regarding the idea of using a byte FP format for ML training: instead of describing a byte in a sign/mantissa/exponent format, it might be advantageous to map the 256 possible byte values, using a lookup table, to ideally chosen FP values. The curve implemented could be a sigmoid curve, for example. This would reduce quantization effects, likely resulting not only in better convergence, but consistently so.

Maybe it would be necessary to adjust the curve to facilitate the reverse lookup, and to reduce the time and silicon needed.
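The comment's idea can be sketched in a few lines: place the 256 codes along a sigmoid-shaped curve and encode by reverse lookup. All names, the curve choice (a scaled logit, i.e. the inverse sigmoid), and the scale factor below are illustrative assumptions, not anything from the whitepaper.

```python
import numpy as np

def build_table(scale=8.0, n=256):
    # Hypothetical lookup table: code i maps to scale * logit(p_i), with
    # p_i spread uniformly in (0, 1). This concentrates codes near zero,
    # where ML weights and activations tend to cluster.
    p = (np.arange(n) + 0.5) / n
    return scale * np.log(p / (1.0 - p))

TABLE = build_table()

def encode(x):
    # Reverse lookup by nearest table entry. A real chip would need a
    # cheaper inverse, which is the adjustment the comment alludes to.
    return int(np.argmin(np.abs(TABLE - x)))

def decode(code):
    return TABLE[code]

# Round-trip a few magnitudes: small values get much finer resolution
# than large ones, which is the claimed quantization benefit.
for x in (0.01, 0.5, 5.0):
    c = encode(x)
    print(f"{x:6.3f} -> code {c:3d} -> {decode(c):9.4f}")
```

Near zero the adjacent table entries are about 0.125 apart (with this scale), while the outermost codes are separated by whole units, so quantization error scales roughly with magnitude, similar in spirit to floating point but with a freely chosen curve.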
francoisp | over 3 years ago
Interesting read. I wonder whether this is only a bandwidth optimization to throw more hardware at the problem, or an actual shift in perspective (cf. no NaN/Inf; out-of-range values instead clamp to maxval). Could this introduce artifacts, and will math libraries need to code around it, or will it enable some new insight?
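The behavior the comment asks about can be sketched quickly: a format with no NaN/Inf encodings must saturate out-of-range results to the largest finite value. The `MAXVAL` below and the mapping of NaN are illustrative assumptions, not figures from the whitepaper.

```python
import numpy as np

MAXVAL = 448.0  # hypothetical largest finite value of the byte format

def saturating_cast(x):
    # Overflow and infinity clamp to +/-MAXVAL instead of propagating Inf;
    # NaN has no encoding either, here mapped to MAXVAL by fiat.
    x = np.nan_to_num(x, nan=MAXVAL, posinf=MAXVAL, neginf=-MAXVAL)
    return float(np.clip(x, -MAXVAL, MAXVAL))

# IEEE float32 overflows to inf here; this format silently saturates,
# which is exactly the kind of artifact a math library might need to
# detect on its own rather than via the usual Inf/NaN checks.
overflow = np.float32(1e30) * np.float32(1e30)  # inf in float32
print(saturating_cast(overflow))
```

Silent saturation trades away error signaling (no Inf/NaN to test for) in exchange for never poisoning a long accumulation, which is one plausible reading of why a training-oriented format would make that choice.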