TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Parsing 8-bit integers quickly

99 pointsby usdoguover 1 year ago

12 comments

kazinatorover 1 year ago
<p><pre><code> $ cat parse256.c #include &lt;stdio.h&gt; #include &lt;stdint.h&gt; #include &lt;string.h&gt; int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) { union { uint8_t as_str[4]; uint32_t as_int; } digits; memcpy(&amp;digits.as_int, str, sizeof(digits)); digits.as_int ^= 0x30303030lu; digits.as_int &lt;&lt;= (4 - (len &amp; 0x3)) * 8; uint32_t y = ((UINT64_C(0x640a0100) * digits.as_int)&gt;&gt;32)&amp;0xff; *num = (uint8_t)(y); return (digits.as_int &amp; 0xf0f0f0f0) == 0 &amp;&amp; y &lt; 256 &amp;&amp; len != 0 &amp;&amp; len &lt; 4; } int main(int argc, char **argv) { if (argv[1]) { uint8_t val = 0; if (parse_uint8_fastswar(argv[1], strlen(argv[1]), &amp;val)) { printf(&quot;value of %s is %d\n&quot;, argv[1], (int) val); } else { printf(&quot;%s is invalid, value stored %d\n&quot;, argv[1], (int) val); } } return 0; } $ .&#x2F;parse256 123 123 is invalid, value stored 225 $ uname -a Linux gcc1-power7.osuosl.org 3.10.0-862.14.4.el7.ppc64 #1 SMP Wed Sep 26 20:38:32 GMT 2018 ppc64 ppc64 ppc64 GNU&#x2F;Linux </code></pre> Oops!
评论 #38453572 未加载
评论 #38452398 未加载
评论 #38452736 未加载
评论 #38452202 未加载
评论 #38453119 未加载
vinkelhakeover 1 year ago
The SWAR algorithm accepts the 6 ASCII characters after &#x27;9&#x27;. It&#x27;ll parse &quot;:&gt;&quot; as 114.<p><pre><code> int res = parse_uint8_fastswar(&quot;:&gt;\0&quot;, 2, &amp;num); </code></pre> Returns true and num is 114.
评论 #38455386 未加载
评论 #38458896 未加载
jstanleyover 1 year ago
I tried this out but it parsed the string &quot;123\n&quot; as 32.<p>Also it parses &quot;400&quot; as 144, when the reference implementation considers it not-a-uint8, but I don&#x27;t mind so much about that.<p>EDIT: Ah, I think it assumes the string contains <i>only</i> a uint8, rather than trying to parse a uint8 from the start of a string. So you need to zero out the &quot;\n&quot; separately, and then it works.
评论 #38451703 未加载
firebazeover 1 year ago
Looking forward to &quot;parsing bit sequences in roman literals quickly&quot;<p><a href="https:&#x2F;&#x2F;hn.algolia.com&#x2F;?q=lemire+parsing" rel="nofollow noreferrer">https:&#x2F;&#x2F;hn.algolia.com&#x2F;?q=lemire+parsing</a>
zokierover 1 year ago
The use of union here is bit confusing (I think its unnecessary?), although I don&#x27;t imagine it making any difference in the generated code.
评论 #38452045 未加载
评论 #38452051 未加载
nasretdinovover 1 year ago
I imagine that you are not allowed to allocate a constant array that would contain a mapping between ASCII values of integers and the actual ints :)? The&#x27;re just 255 of them needed. Or woukd it be slower?
评论 #38451496 未加载
评论 #38451485 未加载
评论 #38452113 未加载
评论 #38451508 未加载
blacklionover 1 year ago
Guess the site by title :-) Good as usual.
IshKebabover 1 year ago
Presumably this also only works well if the data is 4-byte aligned.
nvartolomeiover 1 year ago
dlemire, you note that the read ”overflows”. Why can’t you copy just `len` bytes? Does it slow too much because of the branch&#x2F;more load&#x2F;store operations?
评论 #38456025 未加载
ameliusover 1 year ago
Fetching the data should be the bottleneck (by far), so why is the naive approach 2x slower than this smarter approach?<p>Sounds like the CPU should be designed in a smarter way, not the code.
评论 #38452939 未加载
doubloonover 1 year ago
i am so confused.<p>&quot;you are given a string and it&#x27;s length&quot;.. i dont understand.<p>is the string like &quot;1,1,22,2,189,3,12,2,120,3&quot; ???
评论 #38458841 未加载
aunwickover 1 year ago
Wow! 1st year digital logic question makes news on YC?