I make a no-CGO Go SQLite driver by compiling the SQLite amalgamation to Wasm, then loading the result with wazero (a CGO-free Wasm runtime).<p>To compile SQLite, I use wasi-sdk, which uses wasi-libc, which is based on musl. It's been said that musl is slow(er than glibc), which is true, to a point.<p>musl uses SWAR on a size_t to implement various functions in string.h. This is fine, except size_t is just 32 bits on Wasm, so each step only covers 4 bytes.<p>I found that implementing a few of those functions with Wasm SIMD128 can make them go around 4x faster.<p>Other functions don't even use SWAR; redoing <i>those</i> can make them 16x faster.<p>Smoothsort (musl's qsort) also has trouble pulling its own weight; a Shell sort seems both simpler and faster, while similarly avoiding recursion, allocations and the addressable stack.<p>I found that using SIMD intrinsics (rather than SWAR) makes it easier to avoid UB, but the code would definitely benefit from more eyeballs.<p>See this for some benchmarks on both x86-64 and AArch64: <a href="https://github.com/ncruces/go-sqlite3/actions/runs/14516931864">https://github.com/ncruces/go-sqlite3/actions/runs/145169318...</a>
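<p>For anyone unfamiliar with SWAR, here is a minimal sketch of the idea in C. It is not musl's code and not what the driver ships: swar_strnlen is a made-up helper, and I assume a bounded length and memcpy loads so the example stays free of out-of-bounds reads and aliasing UB. The three macros follow the well-known "has a zero byte" bit trick.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ONES  ((size_t)-1 / UCHAR_MAX)      /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))  /* 0x8080...80 */
/* Nonzero if and only if some byte of w is zero. */
#define HASZERO(w) (((w) - ONES) & ~(w) & HIGHS)

/* Hypothetical helper (illustration only): like strnlen, it returns the
 * index of the first NUL in s[0..n), or n if there is none. */
static size_t swar_strnlen(const char *s, size_t n)
{
    assert(s != NULL);
    size_t i = 0;

    /* Word loop: test sizeof(size_t) bytes per iteration. */
    for (; n - i >= sizeof(size_t); i += sizeof(size_t)) {
        size_t w;
        memcpy(&w, s + i, sizeof w);  /* alignment- and aliasing-safe load */
        if (HASZERO(w))
            break;                    /* a NUL is somewhere in this word */
    }

    /* Byte loop: pin down the exact NUL, or finish the tail. */
    while (i < n && s[i] != '\0')
        i++;
    return i;
}
```

The word loop advances by sizeof(size_t) bytes per test: 8 on a typical 64-bit target, but only 4 where size_t is 32 bits. A 128-bit SIMD vector covers 16 bytes per test, which is where the extra headroom comes from.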