> It is tempting to build a math library around SIMD hoping to get some performance gains. However, it often has no proven benefit ... For example, game play programmers often do a lot of piecemeal vector math. They are not chopping 8 carrots at once<p>Her point is well taken; however, we beat the odds on the PlayStation 3. I don't trust my memory to give a frame-time percentage, but switching our "one carrot at a time" libraries from scalar to AltiVec made a measurable impact for not a lot of work.<p>We originally ported it all to SSE2 so that misaligned accesses would trigger GPFs when testing on PC, but whenever I compare it with the scalar version it's marginally faster there too, so it has held up over time.<p>Conversely, we've recently found on the Nintendo Switch that NEON isn't a clear win. I suspect that, in addition to the shuffling overhead, you don't quite get "4 for the price of 1" the way you seem to elsewhere; i.e., if you're doing 3D vectors or matrices padded into 4-float registers, the unused calculations in the fourth component have a cost.<p>So she's right: chop 8 carrots at once if you can. But sometimes (not always) you can chop just 1 carrot faster with SIMD.
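To make the "one carrot at a time" case concrete, here is a minimal sketch, assuming x86 SSE2 intrinsics, with a plain scalar fallback so it still compiles elsewhere. The type `vec3p` and its functions are hypothetical illustrations, not our actual library. The component-wise add is a single instruction but still computes the padded fourth lane, and the dot product needs shuffles to sum across lanes:

```c
/* Hypothetical 3D vector padded to four floats to fit one 128-bit register.
 * Assumes x86 SSE2 when available; otherwise falls back to plain scalar code. */
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2; also brings in the SSE intrinsics used below */
#endif

typedef struct {
#if defined(__SSE2__)
    __m128 v;    /* lanes: x, y, z, w (w is unused padding, kept at 0) */
#else
    float v[4];  /* same layout in scalar form */
#endif
} vec3p;

static vec3p vec3p_make(float x, float y, float z) {
    vec3p r;
#if defined(__SSE2__)
    r.v = _mm_set_ps(0.0f, z, y, x); /* arguments go from lane 3 down to lane 0 */
#else
    r.v[0] = x; r.v[1] = y; r.v[2] = z; r.v[3] = 0.0f;
#endif
    return r;
}

/* Component-wise add: one instruction, but the w lane is wasted work. */
static vec3p vec3p_add(vec3p a, vec3p b) {
    vec3p r;
#if defined(__SSE2__)
    r.v = _mm_add_ps(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

/* Dot product: the multiply is one instruction, but summing across lanes
 * needs shuffles, which is the overhead that eats into the "4 for 1" win. */
static float vec3p_dot(vec3p a, vec3p b) {
#if defined(__SSE2__)
    __m128 m = _mm_mul_ps(a.v, b.v);               /* per-lane products; w product is 0 */
    __m128 s = _mm_add_ps(m, _mm_movehl_ps(m, m)); /* lane 0: x+z, lane 1: y+w */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1))); /* lane 0: full sum */
    return _mm_cvtss_f32(s);
#else
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2];
#endif
}
```

For example, `vec3p_dot(vec3p_make(1, 2, 3), vec3p_make(4, 5, 6))` gives 32. On AArch64 NEON the same wrapper would hold a `float32x4_t`, and `vaddvq_f32` can do the horizontal sum in one instruction, but the padded fourth lane is still computed either way.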
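"Chopping 8 carrots at once" usually means a structure-of-arrays layout. A minimal sketch, assuming a fixed batch of eight vectors; `vec3_batch`, `vec3_batch_dot` and `BATCH` are hypothetical names. Each array holds one component of all eight vectors, so every SIMD lane does useful work and no horizontal shuffles are needed. With optimization enabled, compilers can often auto-vectorize this loop (e.g. to 8-wide AVX on x86), though that isn't guaranteed:

```c
#include <stddef.h>

/* Hypothetical batch size: eight vectors, one per lane of an 8-wide register. */
#define BATCH 8

/* Structure-of-arrays: all x's together, then all y's, then all z's. */
typedef struct {
    float x[BATCH];
    float y[BATCH];
    float z[BATCH];
} vec3_batch;

/* Eight dot products at once: each iteration is the same operation on a
 * different lane, with no padding lane and no cross-lane shuffles. */
static void vec3_batch_dot(const vec3_batch *a, const vec3_batch *b, float out[BATCH]) {
    for (size_t i = 0; i < BATCH; ++i)
        out[i] = a->x[i] * b->x[i] + a->y[i] * b->y[i] + a->z[i] * b->z[i];
}
```

The catch, as the quote says, is that gameplay code rarely has eight independent vectors lined up like this, which is why the padded one-vector version can still be worth having.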