TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Parsing short hexadecimal strings efficiently

26 点作者 fcambus大约 6 年前

5 条评论

nwallin大约 6 年前
You can use the &quot;fancy math&quot; version on all 8 bytes in the string simultaneously:<p><pre><code> #include &lt;cstdint&gt; uint32_t convert_hex(const char* s) { uint64_t a = *reinterpret_cast&lt;const uint64_t*&gt;(s); a = (a &amp; 0x0F0F0F0F0F0F0F0Fu) + 9 * ((a &amp; 0xc0c0c0c0c0c0c0c0) &gt;&gt; 6); a = (a &amp; 0x000F000F000F000Fu) | ((a &amp; 0x0F000F000F000F00) &gt;&gt; 4); a = (a &amp; 0x000000FF000000FFu) | ((a &amp; 0x00FF000000FF0000) &gt;&gt; 8); uint32_t b = (a &amp; 0x000000000000FFFFu) | ((a &amp; 0x0000FFFF00000000) &gt;&gt; 16); b = __builtin_bswap32(b); b = (b &amp; 0x0f0f0f0f) &lt;&lt; 4 | (b &amp; 0xf0f0f0f0) &gt;&gt; 4; return b; } </code></pre> Compiles to 30 instructions with clang, so 3.75 instructions per byte. (clang is one instruction cleverer than gcc) No branching, the only &quot;complicated&quot; instructions are bswap (__builtin_bswap32) lea, (the &quot;multiplication&quot; by nine) and one addition. Other than that it&#x27;s all bit manipulations. (moves, shifts, ands, ors) However, I doubt it will pipeline well, the data dependencies are quite linear.<p>It has <i>very</i> little tolerance for inputs that are anything other than 8 byte hex strings. Do error checking elsewhere.<p>I doubt this would be faster in any meaningful sense in a real world use case.
dmoldavanov大约 6 年前
Using of array lookup is a bad wayof optimization: digittoval[src[N]] can take up to 200 cycles if not in cache<p>Only synthetic tests that small enough (most of them do nothing than tested code) show good results.
评论 #19686254 未加载
评论 #19686175 未加载
timerol大约 6 年前
The current winner is a SIMD-ish solution using 32 bit math as 4 8-bit lanes. This has been added to the repo, and is discussed in the comments.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;lemire&#x2F;Code-used-on-Daniel-Lemire-s-blog&#x2F;tree&#x2F;master&#x2F;2019&#x2F;04&#x2F;17" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;lemire&#x2F;Code-used-on-Daniel-Lemire-s-blog&#x2F;...</a>
daeken大约 6 年前
I don&#x27;t have the spare cycles to test this out for myself right now, but I&#x27;m curious if a 64kb table would be substantially faster. You&#x27;d cut it down from 8 indirections, 3 shifts, and 3 ORs to 4 indirections, 1 shift, and 1 OR.<p><pre><code> uint32_t hex_to_u32_lookup(const uint8_t *src) { uint16_t *wsrc = (uint16_t *) src; uint32_t v1 = digittoval[wsrc[0]]; uint32_t v2 = digittoval[wsrc[1]]; return v2 &lt;&lt; 8 | v1; } </code></pre> (Also, small note: The function is named `hex_to_u32_lookup` but it only gives a 16-bit result. Might want to clarify in the post.)
评论 #19686076 未加载
csense大约 6 年前
With modern hardware, for this to be a meaningful percentage of your workload, you&#x27;d need to be working with super high data volumes (100 megabytes per second at the very least, maybe an order of magnitude more depending on the exact hardware).<p>A reasonable system designer would never create an interface that accepts only hexadecimal input for bulk data transfers at those rates. That would waste half the I&#x2F;O, which makes no sense!<p>In this case, the problem with your system isn&#x27;t the cycle count of your hex conversion function. It&#x27;s a poorly designed interface that&#x27;s forced to use hex for high data volumes. You get maybe a fraction of a percent optimizing the hex conversion function, but there&#x27;s a low-hanging 100% improvement if you just use all 8 bits of your bytes.<p>Or in Donald Knuth&#x27;s words, &quot;Premature optimization is the root of all evil.&quot;
评论 #19687143 未加载
评论 #19687076 未加载