Decoding UTF8 with parallel extract

111 points by g0xA52A2A about 1 year ago

4 comments

nwellnhof about 1 year ago
In my experiments, anything using lookup tables was slower than a naive, branching decoder on real-world data. Reading from a lookup table in L1 cache has a latency of ~4 cycles, which is prohibitive for the simple case of mostly ASCII bytes. You can easily achieve more than 1.5 GB/s with a naive decoder, while all the "smarter" approaches are capped at ~800 MB/s.
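For readers who want to see what a "naive, branching decoder" might look like, here is a minimal sketch in C. It is not the commenter's code: the function name `utf8_decode_naive`, its signature, and the choice to return 0 on any malformed input are all assumptions made for illustration. The point is that the ASCII case costs one compare and one branch, with no table load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): decode one code point from
 * [p, end). Returns the number of bytes consumed, or 0 if the input is
 * malformed or truncated. Requires p < end. */
size_t utf8_decode_naive(const unsigned char *p, const unsigned char *end,
                         uint32_t *out)
{
    assert(p < end);
    unsigned char b0 = p[0];

    if (b0 < 0x80) {            /* ASCII fast path: no table, one branch */
        *out = b0;
        return 1;
    }
    if (b0 < 0xC2)              /* stray continuation byte, or C0/C1 (overlong) */
        return 0;
    if (b0 < 0xE0) {            /* 2-byte sequence */
        if (end - p < 2 || (p[1] & 0xC0) != 0x80)
            return 0;
        *out = ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
        return 2;
    }
    if (b0 < 0xF0) {            /* 3-byte sequence */
        if (end - p < 3 || (p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
            return 0;
        uint32_t cp = ((uint32_t)(b0 & 0x0F) << 12) |
                      ((uint32_t)(p[1] & 0x3F) << 6) |
                      (uint32_t)(p[2] & 0x3F);
        if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;           /* overlong encoding or UTF-16 surrogate */
        *out = cp;
        return 3;
    }
    if (b0 < 0xF5) {            /* 4-byte sequence */
        if (end - p < 4 || (p[1] & 0xC0) != 0x80 ||
            (p[2] & 0xC0) != 0x80 || (p[3] & 0xC0) != 0x80)
            return 0;
        uint32_t cp = ((uint32_t)(b0 & 0x07) << 18) |
                      ((uint32_t)(p[1] & 0x3F) << 12) |
                      ((uint32_t)(p[2] & 0x3F) << 6) |
                      (uint32_t)(p[3] & 0x3F);
        if (cp < 0x10000 || cp > 0x10FFFF)
            return 0;           /* overlong encoding or beyond Unicode range */
        *out = cp;
        return 4;
    }
    return 0;                   /* F5..FF never appear in valid UTF-8 */
}
```

Whether this beats a table-driven decoder depends on the data and the CPU; the throughput figures quoted above are the commenter's own measurements and are not reproduced by this sketch.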
pbsd about 1 year ago
The overlong lookup can also be written without a memory lookup as

    0x10000U >> ((0x1531U >> (i*5)) & 31);

On most current x86 chips this has a latency of 3 cycles -- LEA+SHR+SHR -- which is better than an L1 cache hit almost everywhere.
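To see why the shift expression works, it helps to compare it against the table it would replace. The sketch below assumes that `i` is the sequence length minus one (0 to 3) and that the table holds the smallest code point allowed for each length; neither is spelled out in the comment, and the names `overlong_min_shift` and `overlong_min_table` are made up here. The constant `0x1531` packs the shift amounts 17, 9, 5 and 0 into 5-bit fields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference table: smallest code point that legitimately needs
 * a sequence of length i + 1. Anything smaller is an overlong encoding. */
static const uint32_t overlong_min_table[4] = {0x0, 0x80, 0x800, 0x10000};

/* Table-free variant from the comment. 0x1531 holds four 5-bit fields:
 *   i = 0 -> 17, i = 1 -> 9, i = 2 -> 5, i = 3 -> 0
 * so 0x10000 >> field gives 0, 0x80, 0x800, 0x10000 respectively. */
uint32_t overlong_min_shift(unsigned i)
{
    assert(i < 4);
    return 0x10000U >> ((0x1531U >> (i * 5)) & 31);
}

/* Memory-lookup variant, for comparison only. */
uint32_t overlong_min_lookup(unsigned i)
{
    assert(i < 4);
    return overlong_min_table[i];
}
```

A decoded code point `cp` from a sequence of length `i + 1` would then be rejected as overlong when `cp < overlong_min_shift(i)`.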
clausecker about 1 year ago
This looks very similar to the approach we recently used to transcode UTF-8 into UTF-16 using AVX-512: https://arxiv.org/pdf/2212.05098

It's part of simdutf.
masfuerte about 1 year ago
The code is careful not to read past the end of the buffer, but it doesn't explicitly check that there are enough bytes available for the current multibyte character. However, this "end of buffer in middle of character" error is caught later by the check for valid continuation bytes. I thought that was quite neat.
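The trick described here can be sketched as follows. This is not the article's code: it assumes the loader copies at most four bytes into a zero-padded scratch buffer, and the name `utf8_seq_len_checked` and its return convention are hypothetical. Because a padding byte of `0x00` never matches the `10xxxxxx` continuation pattern, a character cut off by the end of the buffer fails the continuation check without a separate length comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the sequence length (1..4) if the lead byte
 * and all of its continuation bytes are well-formed, 0 otherwise.
 * Overlong and surrogate checks are deliberately left out of this sketch.
 * Requires avail >= 1. */
size_t utf8_seq_len_checked(const unsigned char *p, size_t avail)
{
    assert(avail > 0);

    /* Copy at most 4 bytes; anything past the end of input stays zero. */
    unsigned char buf[4] = {0, 0, 0, 0};
    size_t n = avail < 4 ? avail : 4;
    for (size_t i = 0; i < n; i++)
        buf[i] = p[i];

    size_t len;
    if (buf[0] < 0x80)                len = 1;
    else if ((buf[0] & 0xE0) == 0xC0) len = 2;
    else if ((buf[0] & 0xF0) == 0xE0) len = 3;
    else if ((buf[0] & 0xF8) == 0xF0) len = 4;
    else return 0;                    /* continuation byte or F8..FF as lead */

    /* Note: no "len <= avail" test. A missing byte is zero padding, and
     * 0x00 is not of the form 10xxxxxx, so truncation is rejected here. */
    for (size_t i = 1; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)
            return 0;

    return len;
}
```

For example, the three-byte sequence `E2 82 AC` is accepted when all three bytes are available, but the same pointer with only two bytes available is rejected by the continuation check.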