Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)

148 points by akarambir, over 6 years ago

5 comments

the_clarence, over 6 years ago
I see a lot of applications trying to take advantage of SIMD, but what happens when you try to run them on systems that don't support these instructions? My guess is that you need to write multiple implementations, each taking advantage of a different instruction set, and then figure out at runtime with cpuid which one to use. But isn't that cumbersome, and a way to inflate a codebase dramatically?
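A minimal sketch of the runtime dispatch described in this comment, assuming GCC or Clang. To keep it short, it uses a simpler stand-in task (checking that a buffer is pure ASCII) rather than full UTF-8 validation. The function names `is_ascii`, `is_ascii_scalar`, `is_ascii_avx2` and `select_is_ascii` are hypothetical; `__builtin_cpu_supports` is the compiler's wrapper around cpuid, and the `target("avx2")` attribute lets one function use AVX2 without compiling the whole file for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: returns 1 if every byte is ASCII (< 0x80). */
static int is_ascii_scalar(const uint8_t *buf, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) acc |= buf[i];
    return (acc & 0x80) == 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1

/* AVX2 variant: ORs 32 bytes at a time, then tests the top bit of each byte. */
__attribute__((target("avx2")))
static int is_ascii_avx2(const uint8_t *buf, size_t len) {
    size_t i = 0;
    __m256i acc = _mm256_setzero_si256();
    for (; i + 32 <= len; i += 32) {
        acc = _mm256_or_si256(acc, _mm256_loadu_si256((const __m256i *)(buf + i)));
    }
    /* movemask gathers the high bit of all 32 bytes into one int. */
    int high_bits = _mm256_movemask_epi8(acc);
    /* The remaining (< 32) bytes are handled by the scalar routine. */
    return high_bits == 0 && is_ascii_scalar(buf + i, len - i);
}
#endif

typedef int (*is_ascii_fn)(const uint8_t *, size_t);

/* Picks the best implementation the running CPU supports. */
static is_ascii_fn select_is_ascii(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return is_ascii_avx2;
#endif
    return is_ascii_scalar;
}

/* Public entry point. The lazy initialisation here is not thread-safe;
   a real library would resolve the pointer once at startup. */
int is_ascii(const uint8_t *buf, size_t len) {
    static is_ascii_fn impl = NULL;
    assert(buf != NULL);
    if (impl == NULL) impl = select_is_ascii();
    return impl(buf, len);
}
```

Callers only ever see `is_ascii`, so the per-instruction-set code stays confined to one file. The duplication is real, but it grows with the number of hot loops rather than with the size of the codebase.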
bradleyjg, over 6 years ago
Under the new string model in Java > 8, a fairly frequent workflow is:

1) get external string
2) figure out if it is UTF-8, UTF-16, or some other recognizable encoding
3) validate the byte stream
4) figure out if the code points in the incoming string can be represented in Latin-1
5) instantiate a Java string using either the Latin-1 encoder or the UTF-16 encoder

I know some or all of these steps are done using HotSpot intrinsics, and then the JIT/VM does inlining, folding and so on, but I wonder how fast a custom assembly function to do all these steps at once could be.
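To make the "all these steps at once" idea concrete, here is a scalar sketch in C that fuses steps 3 and 4 for UTF-8 input: it validates the byte stream and, in the same pass, records whether every code point fits in Latin-1 (U+0000 to U+00FF). The names `scan_utf8` and `enum scan_result` are hypothetical, and this is not how HotSpot implements it; a SIMD version would follow the same logic on wider registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum scan_result { SCAN_INVALID = 0, SCAN_LATIN1 = 1, SCAN_NEEDS_UTF16 = 2 };

/* Validates UTF-8 and, in the same pass, tracks whether all code
   points are <= U+00FF, i.e. representable in Latin-1. */
enum scan_result scan_utf8(const uint8_t *s, size_t len) {
    assert(s != NULL);
    int latin1 = 1;
    size_t i = 0;
    while (i < len) {
        uint8_t b = s[i];
        uint32_t cp;   /* decoded code point */
        uint32_t min;  /* smallest code point allowed for this length */
        size_t n;      /* number of continuation bytes expected */
        if (b < 0x80) { i++; continue; }  /* ASCII fast path */
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; n = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; n = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; n = 3; min = 0x10000; }
        else return SCAN_INVALID;  /* stray continuation byte or 0xF8..0xFF */

        if (len - i <= n) return SCAN_INVALID;  /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            uint8_t c = s[i + k];
            if ((c & 0xC0) != 0x80) return SCAN_INVALID;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return SCAN_INVALID;                     /* overlong encoding */
        if (cp > 0x10FFFF) return SCAN_INVALID;                /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return SCAN_INVALID; /* surrogate */
        if (cp > 0xFF) latin1 = 0;
        i += n + 1;
    }
    return latin1 ? SCAN_LATIN1 : SCAN_NEEDS_UTF16;
}
```

The return value tells the caller which encoder to use in step 5, so the input is read once instead of once for validation and again for the Latin-1 check.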
jwilk, over 6 years ago
Previous blog post on HN:

https://news.ycombinator.com/item?id=17081571
kissiel, over 6 years ago
I wonder about the Joules per byte. AFAIK AVX units are quite expensive energy-wise.
akarambir, over 6 years ago
What do Linux utilities like sed and awk use for text manipulation? They were very slow when I was changing a few table names in a SQL file.