Very interesting! I posted a vector and quaternion lib here a few weeks ago and got great feedback on the state of SIMD for this sort of thing. I've since gone on a deep dive and implemented wrapper types in a similar way to this library, using macros to keep repetition down. It's split into three main sections:<p><pre><code> - Floating-point primitives, like this lib. Basically a copy of `core::simd`'s API; I'll delete this part once core::simd is stable. `f32x8`, `f64x4` types etc., with standard operator overloads and utility methods like `splat`, `to_array`, etc.
- Vec and Quaternion analogs. Same idea, similar API. Vec3x8, Quaternionx8 etc.
 - Code to convert slices of floating-point values, or non-SIMD vectors and quaternions, to SIMD ones, including (partial) handling of accessing only the valid lanes in the last chunk.
</code></pre>
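As a rough sketch of that first layer (the names `splat` and `to_array` come from the description above; everything else here, including backing the type with a plain array and relying on autovectorization rather than intrinsics, is my own assumption, not the library's actual implementation):<p><pre><code>```rust
use std::ops::Add;

/// Hypothetical 8-lane f32 wrapper, mirroring the shape of `core::simd`'s API.
/// A real implementation would back this with intrinsics or `std::simd`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8([f32; 8]);

impl F32x8 {
    /// Broadcast one value to all 8 lanes.
    pub fn splat(v: f32) -> Self {
        Self([v; 8])
    }

    pub fn from_array(a: [f32; 8]) -> Self {
        Self(a)
    }

    pub fn to_array(self) -> [f32; 8] {
        self.0
    }
}

impl Add for F32x8 {
    type Output = Self;
    // Lane-wise add; a fixed-size loop like this typically compiles
    // down to a single vector add on AVX-capable targets.
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.0[i] + rhs.0[i];
        }
        Self(out)
    }
}

fn main() {
    let a = F32x8::splat(1.0);
    let b = F32x8::from_array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]);
    println!("{:?}", (a + b).to_array());
}
```</code></pre>A macro can then stamp out `F64x4` and friends from the same body, which is presumably what the macros mentioned above are for.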
I've incorporated these `x8` types into a WIP molecular dynamics application; it was relatively painless after setting up the infrastructure. I'd love to try `Vec3x16` etc., but 512-bit types aren't stable yet. From GitHub activity on Rust, though, it sounds like that's right around the corner!<p>Of note, as others pointed out in the thread I mentioned, the other vector etc. libs use the AoS approach, where a single `f32x4` value represents one Vec3. With this SoA approach, a `Vec3x8` is for performing operations on 8 Vec3s at once.<p>The article had interesting and surprising points on AVX-512 (needed for `f32x16`, `Vec3x16`, etc.). I'm not sure what the implications of exposing this in a public library are; it might be a trap if the user isn't on one of the AMD Zen CPUs mentioned.<p>From a few examples, I see a 2-4x speedup from the x8 types over scalar (non-SIMD) operations.
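To make the SoA idea concrete (the type and field names below are my guesses for illustration, not necessarily what the library uses): a `Vec3x8` holds eight x components, eight y, eight z, so one dot-product call processes eight vectors, and packing a slice of scalar Vec3s has to track how many lanes of the final chunk are real data. A minimal sketch:<p><pre><code>```rust
/// Hypothetical scalar Vec3.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Vec3 { pub x: f32, pub y: f32, pub z: f32 }

/// SoA layout: one Vec3x8 is eight Vec3s, one array per component.
#[derive(Clone, Copy, Debug)]
pub struct Vec3x8 {
    pub x: [f32; 8],
    pub y: [f32; 8],
    pub z: [f32; 8],
}

impl Vec3x8 {
    /// Eight dot products at once; fixed-size lane loops like this
    /// are straightforward for the compiler to vectorize.
    pub fn dot(self, rhs: Self) -> [f32; 8] {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.x[i] * rhs.x[i]
                + self.y[i] * rhs.y[i]
                + self.z[i] * rhs.z[i];
        }
        out
    }
}

/// Pack a slice of Vec3s into chunks of 8, zero-padding the last chunk.
/// Returns the chunks plus the number of valid lanes in the final chunk,
/// so callers can ignore the padded lanes when reading results back.
pub fn pack(vecs: &[Vec3]) -> (Vec<Vec3x8>, usize) {
    let mut chunks = Vec::new();
    for chunk in vecs.chunks(8) {
        let mut c = Vec3x8 { x: [0.0; 8], y: [0.0; 8], z: [0.0; 8] };
        for (i, v) in chunk.iter().enumerate() {
            c.x[i] = v.x;
            c.y[i] = v.y;
            c.z[i] = v.z;
        }
        chunks.push(c);
    }
    let valid_last = if vecs.is_empty() { 0 } else { (vecs.len() - 1) % 8 + 1 };
    (chunks, valid_last)
}

fn main() {
    let vs: Vec<Vec3> = (0..10).map(|i| Vec3 { x: i as f32, y: 0.0, z: 0.0 }).collect();
    let (chunks, valid) = pack(&vs);
    // 10 vectors -> 2 chunks, with 2 valid lanes in the last one.
    println!("{} chunks, {} valid lanes in last", chunks.len(), valid);
}
```</code></pre>The contrast with AoS is that here each multiply touches the same component of eight different vectors, so nothing is wasted on a Vec3's "missing" fourth lane.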