TechEcho

7 comments

zwegner over 3 years ago

Pretty similar article from very recently: <a href="https://nullprogram.com/blog/2021/12/04/" rel="nofollow">https://nullprogram.com/blog/2021/12/04/</a>

Discussion: <a href="https://news.ycombinator.com/item?id=29439403" rel="nofollow">https://news.ycombinator.com/item?id=29439403</a>

The article mentions in an addendum (and BeeOnRope also pointed it out in the HN thread) a nice CLMUL trick for dealing with quotes, originally discovered by Geoff Langdale. That should work here for a nice speedup.

But without the CLMUL trick, I'd guess that the unaligned loads that generally occur after a vector containing both quotes and newlines in this version (the "else" case on lines 34-40) would hamper the performance somewhat, since it would eat up twice as much L1 cache bandwidth. I'd suggest dealing with the masks using bitwise operations in a loop, and letting i stay divisible by 16. Or just use CLMUL :)
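To make the mask-based suggestion concrete, here is a minimal scalar sketch in Python rather than real SIMD code. It assumes a 64-byte block, and the names `byte_mask`, `prefix_xor`, and `row_ends` are hypothetical, not taken from the article. The prefix XOR is computed with shifts; a carry-less multiply of the quote mask by an all-ones word yields the same low 64 bits, which is the idea behind the CLMUL trick. The loop always advances by a whole block, so the read offset stays aligned.

```python
BLOCK = 64                      # assumed block width in bytes
MASK = (1 << BLOCK) - 1

def byte_mask(block: bytes, ch: int) -> int:
    """Bit i is set when block[i] == ch (the role of compare + movemask in SIMD code)."""
    m = 0
    for i, b in enumerate(block):
        if b == ch:
            m |= 1 << i
    return m

def prefix_xor(m: int) -> int:
    """Bit i of the result is the XOR of bits 0..i of m."""
    shift = 1
    while shift < BLOCK:
        m ^= (m << shift) & MASK
        shift <<= 1
    return m

def row_ends(data: bytes) -> list[int]:
    """Offsets of newlines that are not inside a quoted field."""
    ends = []
    carry = 0                   # 0, or MASK if the previous block ended inside quotes
    for base in range(0, len(data), BLOCK):
        block = data[base:base + BLOCK]
        quotes = byte_mask(block, ord('"'))
        newlines = byte_mask(block, ord("\n"))
        inside = prefix_xor(quotes) ^ carry   # bit set: byte lies in a quoted region
        hits = newlines & ~inside & MASK
        while hits:                           # walk the set bits, lowest first
            low = hits & -hits
            ends.append(base + low.bit_length() - 1)
            hits ^= low
        carry = MASK if (inside >> (BLOCK - 1)) & 1 else 0
    return ends
```

A doubled quote (`""`) inside a quoted field toggles the parity twice, so it needs no special handling in this sketch.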

jagrsw over 3 years ago

Not sure how the author of this entry on HN managed to change original title from "gigabytes per second" to "gigabits per siemens" :)

mattewong over 3 years ago

Stay tuned for a SIMD-powered CSV parser library and standalone utility about to drop this weekend. It's alpha, but tests show it to be faster than anything else we could get our hands on.

liuliu over 3 years ago

Splitting a CSV file into chunks and processing them independently won't necessarily be wrong (although some implementations out there, which I won't name, would be, because they guess). The trick, however, requires scanning twice: <a href="https://liuliu.me/eyes/loading-csv-file-at-the-speed-limit-of-the-nvme-storage/" rel="nofollow">https://liuliu.me/eyes/loading-csv-file-at-the-speed-limit-o...</a>

Nice article otherwise!
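One way a two-scan scheme could work is sketched below. This is an assumption for illustration, not necessarily what the linked post does, and the function names are hypothetical. The first scan counts quotes in each chunk independently; XOR-ing those parities in order tells every chunk whether it starts inside a quoted field, so the second scan can find row boundaries per chunk without guessing.

```python
def quote_parity(chunk: bytes) -> int:
    """Scan 1: 1 if the chunk contains an odd number of quote characters, else 0."""
    return chunk.count(b'"') & 1

def chunk_row_ends(chunk: bytes, base: int, starts_in_quote: bool) -> list[int]:
    """Scan 2: row-ending newline offsets, given the chunk's known starting quote state."""
    ends = []
    in_quote = starts_in_quote
    for i, b in enumerate(chunk):
        if b == 0x22:                       # '"' toggles the quoted state
            in_quote = not in_quote
        elif b == 0x0A and not in_quote:    # '\n' outside quotes ends a row
            ends.append(base + i)
    return ends

def split_rows(data: bytes, chunk_size: int) -> list[int]:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Scan 1: each parity depends only on its own chunk, so this could run in parallel.
    parities = [quote_parity(c) for c in chunks]
    ends = []
    state = 0                               # quote state at the start of the current chunk
    for k, chunk in enumerate(chunks):
        # Scan 2: also independent per chunk once its starting state is known.
        ends.extend(chunk_row_ends(chunk, k * chunk_size, bool(state)))
        state ^= parities[k]
    return ends
```

The result does not depend on where the chunk boundaries fall, which is the property a guessing implementation lacks.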

michaelg7x over 3 years ago

Presumably solving the same kind of delimiter-finding issues as Hyperscan? <a href="https://news.ycombinator.com/item?id=19270199" rel="nofollow">https://news.ycombinator.com/item?id=19270199</a>

Tuna-Fish over 3 years ago

Why is the unit expression in the title messed up?

rwmj over 3 years ago

Nice, but I'm afraid real-world CSVs are a lot more complicated than described, so don't use this code in production.

Leveraging SIMD: Splitting CSV Files at 3Gb/S
