IEEE 754 16-bit Floating Point Format

7 points by specular, over 9 years ago

1 comment

exDM69, over 9 years ago
When available, it's a good idea to use the F16C/CVT16 instruction set [0] to convert between single (32-bit) and half (16-bit) precision. I think ARM NEON has a comparable instruction set. These instructions operate on SIMD registers, so if you're already using SIMD, there may be substantial benefits to using them.

The actual arithmetic is still done in 32-bit precision; the conversions happen only when loading from or storing to memory. Some GPUs actually have 16-bit arithmetic internally, but most use 32-bit ALUs and just convert on load/store.

Alternatively, if you don't care about NaNs, denorms, and whatnot (e.g. the use cases for 3D model data mentioned in the README don't really involve NaNs or denorms), some simple bit shifting can do the conversion.

Here's a snippet of Python code that I've used:

    from struct import pack, unpack

    def float2half(float_val):
        # Reinterpret the 32-bit float as an unsigned integer.
        f = unpack('I', pack('f', float_val))[0]
        if f == 0: return 0
        if f == 0x80000000: return 0x8000
        # sign | exponent rebiased from 127 to 15 | top 10 mantissa bits (truncated)
        return ((f>>16)&0x8000) | ((((f&0x7f800000)-0x38000000)>>13)&0x7c00) | ((f>>13)&0x03ff)

    def half2float(h):
        if h == 0: return 0.0
        if h == 0x8000: return -0.0  # negative zero
        # sign | exponent rebiased from 15 to 127 | mantissa widened to 23 bits
        f = ((h&0x8000)<<16) | (((h&0x7c00)+0x1C000)<<13) | ((h&0x03FF)<<13)
        return unpack('f', pack('I', f))[0]

[0] https://en.wikipedia.org/wiki/F16C
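One way to sanity-check the bit-shifting conversion in the comment above is to compare it against the half-precision format code `'e'` in Python's `struct` module (available since Python 3.6), which rounds to nearest and handles NaNs, infinities, and denorms. The sketch below is only an illustration: the function names are hypothetical, and the truncating variant is a restatement of the shift-based idea, not the commenter's exact code.

```python
import struct

def float_to_half_bits(x):
    """Reference conversion via struct's 'e' (binary16) format.
    Rounds to nearest and handles NaN, infinities and denorms."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

def half_bits_to_float(h):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<H', h))[0]

def float_to_half_bits_truncating(x):
    """Shift-based conversion (hypothetical helper). Only meaningful for
    zero and for values in the normal half-precision range; the low
    13 mantissa bits are simply dropped."""
    f = struct.unpack('<I', struct.pack('<f', x))[0]
    if f & 0x7fffffff == 0:          # +0.0 or -0.0: keep only the sign bit
        return f >> 16
    sign = (f >> 16) & 0x8000
    exponent = (((f & 0x7f800000) - 0x38000000) >> 13) & 0x7c00  # rebias 127 -> 15
    mantissa = (f >> 13) & 0x03ff
    return sign | exponent | mantissa

# Values that are exactly representable in half precision agree.
for value in (1.0, -2.0, 65504.0):
    assert float_to_half_bits_truncating(value) == float_to_half_bits(value)

# A value between two half-precision neighbours shows the difference:
# truncation rounds toward zero, the reference rounds to nearest.
x = 1.0 + 0.75 * 2**-10
print(hex(float_to_half_bits_truncating(x)), hex(float_to_half_bits(x)))
```

For typical 3D model data the one-bit difference in the last place rarely matters, which is why the shift-only version can be an acceptable trade-off when no hardware conversion is available.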