I experimented with the proposed data-parallel types extension to the C++ standard library. I got impressive performance gains when calculating APFS Fletcher checksums, without resorting to compiler intrinsics or inline assembly.<p>The gains were even larger once I added some simple loop unrolling: <a href="https://jtsylve.blog/post/2022/12/24/Blazingly-Fast-er-SIMD-Checksums" rel="nofollow">https://jtsylve.blog/post/2022/12/24/Blazingly-Fast-er-SIMD-...</a>
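<p>For readers unfamiliar with the checksum itself, here is a minimal scalar sketch of a generic Fletcher-64 over 32-bit words. It is not the code from the linked post and does not use the SIMD extension. The function names, the block size of 1024 words, and the omission of any APFS-specific finalization step are all assumptions made for illustration. The second function shows the usual trick that makes unrolling and vectorization practical: accumulate in 64-bit integers and reduce modulo 2^32 - 1 once per block instead of once per word.

```cpp
#include <cstddef>
#include <cstdint>

// Modulus used by Fletcher-64 over 32-bit words: 2^32 - 1.
constexpr std::uint64_t kMod = 0xFFFFFFFFu;

// Reference version: reduce both running sums after every word.
std::uint64_t fletcher64_naive(const std::uint32_t* words, std::size_t n) {
    std::uint64_t sum1 = 0, sum2 = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum1 = (sum1 + words[i]) % kMod;
        sum2 = (sum2 + sum1) % kMod;
    }
    return (sum2 << 32) | sum1;
}

// Deferred-modulo version (hypothetical name): the inner loop has no
// division, so a compiler or a hand-written SIMD loop can unroll it.
// With sums below 2^32 at block start, 1024 words per block keeps
// sum2 far below 2^64, so the 64-bit accumulators cannot overflow.
std::uint64_t fletcher64_deferred(const std::uint32_t* words, std::size_t n) {
    constexpr std::size_t kBlock = 1024;  // assumed block size, in words
    std::uint64_t sum1 = 0, sum2 = 0;
    while (n > 0) {
        const std::size_t len = n < kBlock ? n : kBlock;
        for (std::size_t i = 0; i < len; ++i) {
            sum1 += words[i];
            sum2 += sum1;
        }
        sum1 %= kMod;  // one reduction per block instead of per word
        sum2 %= kMod;
        words += len;
        n -= len;
    }
    return (sum2 << 32) | sum1;
}
```

Both functions return the same value for the same input. The deferred form is only a starting point for the kind of loop that SIMD types and unrolling can speed up.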