The other problem with SIMD is that in modern CPU-centric languages it often requires a rewrite for every vector width.<p>And in 80% of cases, by the time there is enough vectorizable data for a programmer to look into SIMD, a GPU can deliver 10x or more performance AND a certain level of portability.<p>So right now SIMD is a niche tool for very low-level things: certain decompression algorithms, bits of math here and there, solvers, etc.<p>And it also takes up a lot of space on your CPU die. Like, A LOT.
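<p>To make the "rewrite for every vector width" point concrete, here is a rough sketch in Rust using the `std::arch` x86_64 intrinsics. The function names (`add_scalar`, `add_sse`, `add_avx`, `add`) are made up for illustration, and the kernel (element-wise add of two `f32` slices) is about the simplest one possible. Even so, the 128-bit and 256-bit versions share no code: the intrinsics and the stride all change with the width.

```rust
// Illustrative sketch only: the same kernel written once per vector width.
// All function names here are hypothetical.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference; also handles the leftover tail elements.
fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// 128-bit version: 4 f32 lanes per step (SSE is baseline on x86_64).
#[cfg(target_arch = "x86_64")]
fn add_sse(a: &[f32], b: &[f32], out: &mut [f32]) {
    let n = out.len();
    assert!(a.len() == n && b.len() == n);
    let mut i = 0;
    while i + 4 <= n {
        // Unaligned loads/stores; bounds are guaranteed by the loop condition.
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr().add(i));
            let vb = _mm_loadu_ps(b.as_ptr().add(i));
            _mm_storeu_ps(out.as_mut_ptr().add(i), _mm_add_ps(va, vb));
        }
        i += 4;
    }
    add_scalar(&a[i..], &b[i..], &mut out[i..]);
}

/// 256-bit version: same logic, different intrinsics and stride (8 lanes).
/// Caller must ensure the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    let n = out.len();
    assert!(a.len() == n && b.len() == n);
    let mut i = 0;
    while i + 8 <= n {
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    add_scalar(&a[i..], &b[i..], &mut out[i..]);
}

/// Dispatcher: picks the widest version the running CPU supports.
#[cfg(target_arch = "x86_64")]
fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
    if is_x86_feature_detected!("avx") {
        // Safe to call: AVX support was just checked at runtime.
        unsafe { add_avx(a, b, out) }
    } else {
        add_sse(a, b, out)
    }
}

/// On other architectures this sketch just falls back to the scalar loop.
#[cfg(not(target_arch = "x86_64"))]
fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
    add_scalar(a, b, out)
}
```

<p>An AVX-512 or ARM NEON version would be yet another copy with its own intrinsics and stride. Portable SIMD abstractions aim to hide this, but that is outside what this sketch shows.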