
Tesla Dojo Whitepaper

13 points, by edison112358, over 3 years ago

2 comments

childintime, over 3 years ago
I'd like to mention a thought I had some time ago regarding the idea of using a byte FP format for ML training: instead of describing a byte in a sign/mantissa/exponent format, it might be advantageous to map the 256 possible byte values, via a lookup table, to ideally chosen FP values. The curve implemented could be a sigmoid curve, for example. This would reduce quantization effects, likely not only resulting in better convergence, but doing so consistently.

Maybe it would be necessary to adjust the curve to facilitate the reverse lookup, and to reduce the time and silicon needed.
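The idea above can be sketched in a few lines. This is a hypothetical illustration of the commenter's proposal, not anything from the Dojo whitepaper: the table shape (a tanh curve), its scale, and the steepness constant are all assumptions chosen for demonstration.

```python
import numpy as np

# Assumed parameters, purely for illustration of the lookup-table idea.
SCALE = 3.0  # largest representable magnitude

# Decode table: byte code (0..255) -> float value, sigmoid-spaced so
# codes are dense near zero, where most ML weights/activations live.
codes = np.arange(256)
decode_table = SCALE * np.tanh((codes - 127.5) / 42.0)

def quantize(x: np.ndarray) -> np.ndarray:
    """Encode floats as byte codes by nearest table entry (reverse lookup)."""
    # Brute-force nearest neighbor; hardware would invert the curve
    # analytically (atanh) instead, as the comment's last paragraph hints.
    return np.abs(x[..., None] - decode_table).argmin(axis=-1).astype(np.uint8)

def dequantize(b: np.ndarray) -> np.ndarray:
    return decode_table[b]

x = np.array([-2.5, -0.1, 0.0, 0.05, 1.0])
roundtrip = dequantize(quantize(x))
```

With this spacing, round-trip error near zero is on the order of the local step (~0.07 here), while the tails trade precision for range, which is the quantization-noise shaping the comment is after.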
francoisp, over 3 years ago
Interesting read. I wonder if this is only a bandwidth optimization to throw more hardware at the problem, or an actual shift in perspective — ref. no NaN/Inf, clamping to maxval instead. Could this introduce artifacts, and will math libs need to code around it, or will this enable some new insight?
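For readers unfamiliar with the semantics the comment is asking about, here is a minimal sketch of saturating (clamp-to-maxval) arithmetic contrasted with IEEE overflow. The `MAXVAL` constant is a stand-in, not a value taken from the whitepaper:

```python
import numpy as np

# Assumed largest representable value of the saturating format (illustrative).
MAXVAL = np.float32(3.4e38)

def saturating_add(a, b):
    # Compute in wider precision, then clamp into the representable range
    # instead of letting the result escape to inf.
    return np.clip(np.float64(a) + np.float64(b), -MAXVAL, MAXVAL)

big = np.float32(3e38)
ieee = big + big                 # IEEE single precision overflows to inf
sat = saturating_add(big, big)   # saturating arithmetic clamps to MAXVAL
```

The practical difference is exactly the one the comment raises: downstream code that checks `isinf`/`isnan` to detect divergence silently sees a large finite number instead, so libraries relying on those sentinels would need to detect saturation some other way.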