Leveraging SIMD: Splitting CSV Files at 3Gb/S

89 points by __exit__, over 3 years ago

7 comments

zwegner, over 3 years ago
Pretty similar article from very recently: https://nullprogram.com/blog/2021/12/04/

Discussion: https://news.ycombinator.com/item?id=29439403

The article mentions in an addendum (and BeeOnRope also pointed it out in the HN thread) a nice CLMUL trick for dealing with quotes, originally discovered by Geoff Langdale. That should work here for a nice speedup.

But without the CLMUL trick, I'd guess that the unaligned loads that generally occur after a vector containing both quotes and newlines in this version (the "else" case on lines 34-40) would hamper the performance somewhat, since it would eat up twice as much L1 cache bandwidth. I'd suggest dealing with the masks using bitwise operations in a loop, and letting i stay divisible by 16. Or just use CLMUL :)
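For readers unfamiliar with the trick mentioned above, here is a minimal scalar sketch of the idea, not the article's code. A carry-less multiply of the quote bitmask by an all-ones constant yields a prefix XOR, which marks every byte that sits inside a quoted field. The sketch emulates that prefix XOR with shifts on a 64-bit mask; the names `prefix_xor`, `byte_mask`, and `record_ends` are hypothetical, and it assumes quotes are escaped by doubling (`""`).

```python
MASK64 = (1 << 64) - 1


def prefix_xor(bits):
    """Bit i of the result is the XOR of bits 0..i of the input.

    This is what a carry-less multiply by an all-ones constant computes;
    here it is emulated with log2(64) shift-and-XOR steps.
    """
    bits &= MASK64
    shift = 1
    while shift < 64:
        bits ^= (bits << shift) & MASK64
        shift <<= 1
    return bits


def byte_mask(chunk, byte):
    """Bitmask with bit i set where chunk[i] == byte (stand-in for a SIMD compare)."""
    mask = 0
    for i, b in enumerate(chunk):
        if b == byte:
            mask |= 1 << i
    return mask


def record_ends(chunk, in_quote=False):
    """Return (offsets of unquoted newlines, quote state after the chunk).

    chunk holds at most 64 bytes; in_quote carries the state from the
    previous chunk, so the read offset can stay aligned.
    """
    assert len(chunk) <= 64
    quotes = byte_mask(chunk, ord('"'))
    newlines = byte_mask(chunk, ord("\n"))
    inside = prefix_xor(quotes)
    if in_quote:
        inside ^= MASK64  # chunk started inside a quoted field: invert
    ends = newlines & ~inside & MASK64
    offsets = [i for i in range(len(chunk)) if (ends >> i) & 1]
    odd_quotes = bin(quotes).count("1") % 2 == 1
    return offsets, in_quote != odd_quotes


offsets, state = record_ends(b'a,"x\ny",b\nc,d\n')
print(offsets, state)
```

Because the quote state is carried between chunks as a single boolean, every load can start at a fixed multiple of the chunk size, which is the point of the "let i stay divisible by 16" suggestion.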
jagrsw, over 3 years ago
Not sure how the author of this entry on HN managed to change original title from

gigabytes per second

to

gigabits per siemens

:)
mattewong, over 3 years ago
Stay tuned for a SIMD-powered CSV parser library and standalone utility about to drop this weekend. It's alpha, but tests show it to be faster than anything else we could get our hands on.
liuliu, over 3 years ago
Splitting a CSV file into chunks and processing them independently won't necessarily be wrong (although some implementations out there, which I won't name, are, because they guess). The trick, however, requires scanning twice: https://liuliu.me/eyes/loading-csv-file-at-the-speed-limit-of-the-nvme-storage/

Nice article otherwise!
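One way to read "scan twice" can be sketched as follows. This is an illustrative interpretation under stated assumptions, not the code from the linked post: the first pass counts quotes per chunk (each chunk independently), a cheap sequential step turns those parities into a known starting quote state for every chunk, and the second pass then finds record boundaries per chunk without guessing. The function names are hypothetical, and quotes are assumed to be escaped by doubling.

```python
def quote_parities(data, chunk_size):
    """Pass 1: for each chunk, True if it contains an odd number of quotes.

    Every chunk is examined independently, so this pass could run in parallel.
    """
    return [
        data[start:start + chunk_size].count(b'"') % 2 == 1
        for start in range(0, len(data), chunk_size)
    ]


def starting_states(parities):
    """Sequential step: True if a chunk begins inside a quoted field."""
    states = []
    in_quote = False
    for odd in parities:
        states.append(in_quote)
        in_quote = in_quote != odd
    return states


def split_records(data, chunk_size):
    """Pass 2: offsets of newlines that end a record.

    Each chunk only needs its own bytes plus its starting state, so this
    pass could also run in parallel; it is a plain loop here for clarity.
    """
    states = starting_states(quote_parities(data, chunk_size))
    ends = []
    for k, start in enumerate(range(0, len(data), chunk_size)):
        in_quote = states[k]
        for i, b in enumerate(data[start:start + chunk_size]):
            if b == 0x22:  # double quote toggles the state
                in_quote = not in_quote
            elif b == 0x0A and not in_quote:  # unquoted newline
                ends.append(start + i)
    return ends


data = b'a,"x\ny",b\nc,d\n'
print(split_records(data, 4))
```

The result does not depend on where the chunk boundaries fall, which is what distinguishes this from implementations that guess whether a chunk starts inside a quoted field.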
michaelg7x, over 3 years ago
Presumably solving the same kind of delimiter-finding issues as Hyperscan? https://news.ycombinator.com/item?id=19270199
Tuna-Fish, over 3 years ago
Why is the unit expression in the topic messed up?
rwmj, over 3 years ago
Nice, but I'm afraid real-world CSVs are a lot more complicated than described, so don't use this code in production.