Pretty similar article from very recently: <a href="https://nullprogram.com/blog/2021/12/04/" rel="nofollow">https://nullprogram.com/blog/2021/12/04/</a><p>Discussion: <a href="https://news.ycombinator.com/item?id=29439403" rel="nofollow">https://news.ycombinator.com/item?id=29439403</a><p>The article mentions in an addendum (and BeeOnRope also pointed it out in the HN thread) a nice CLMUL trick for dealing with quotes, originally discovered by Geoff Langdale. That should work here for a nice speedup.<p>Without the CLMUL trick, though, I'd guess that the unaligned loads that generally follow a vector containing both quotes and newlines in this version (the "else" case on lines 34-40) hamper performance somewhat, since they would eat up twice as much L1 cache bandwidth. I'd suggest handling the masks with bitwise operations in a loop and keeping i a multiple of 16. Or just use CLMUL :)
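<p>To make the mask idea concrete, here is a rough, portable sketch rather than anything taken from either article. It takes the 16-bit quote and newline masks for one aligned 16-byte block, computes an "inside quotes" mask with a prefix XOR, and carries the quote state into the next block, so i never has to leave a 16-byte boundary. The names prefix_xor16, unquoted_newlines and byte_mask are made up for illustration, and byte_mask is a scalar stand-in for what would normally be _mm_cmpeq_epi8 followed by _mm_movemask_epi8. As far as I understand the CLMUL trick, a carry-less multiply of the quote mask by an all-ones value (_mm_clmulepi64_si128) produces the same prefix XOR in a single instruction.

```c
#include <stdint.h>

/* Prefix XOR: bit k of the result is the XOR of bits 0..k of x.
 * Applied to a quote mask, it sets every bit from an opening quote
 * up to (but not including) the matching closing quote. */
static uint16_t prefix_xor16(uint16_t x)
{
    x ^= (uint16_t)(x << 1);
    x ^= (uint16_t)(x << 2);
    x ^= (uint16_t)(x << 4);
    x ^= (uint16_t)(x << 8);
    return x;
}

/* Hypothetical helper: returns the mask of newlines that lie outside
 * quotes in one 16-byte block. *in_quote is 0x0000 or 0xFFFF and holds
 * the quote state carried over from the previous block. */
static uint16_t unquoted_newlines(uint16_t quotes, uint16_t newlines,
                                  uint16_t *in_quote)
{
    uint16_t inside = (uint16_t)(prefix_xor16(quotes) ^ *in_quote);
    /* Broadcast the top bit: still inside a quoted field at block end? */
    *in_quote = (uint16_t)-(inside >> 15);
    return (uint16_t)(newlines & ~inside);
}

/* Scalar stand-in for the SIMD compare + movemask step (assumption:
 * p points at 16 readable bytes). Bit k is set when p[k] == c. */
static uint16_t byte_mask(const char *p, char c)
{
    uint16_t m = 0;
    for (int k = 0; k < 16; k++)
        m = (uint16_t)(m | ((p[k] == c) << k));
    return m;
}
```

The caller would start with a state of 0 and call unquoted_newlines once per aligned block, popping set bits from the result (for example with a count-trailing-zeros loop) to find the row boundaries.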