Nice!<p>I've found that using an iterator for this often generates quite a bit of extra code and generally prevents vectorization, which is why I switched to an inversion-of-control API for the `forEach` case (I don't have iterators at all).<p>Working on one item at a time, which is what the iterator forces, added enough overhead to partially defeat the gains from the more compact SoA memory layout: the code path gets more complex, and you can't use SIMD over multiple components at a time.<p>Have you observed this issue in this implementation, and what general design space does it target? If I remember correctly, the way it works now is partly how mach did it a year or more ago, back when they still had an ECS.
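To make the distinction concrete, here is a rough sketch in Rust of the two API shapes I mean. Everything in it is hypothetical (`Storage`, `pos_x`, `vel_x`, `for_each_columns` and the two `integrate_*` functions are names made up for illustration, not taken from this project or from mach). Whether the per-item version actually fails to vectorize depends on the language and compiler, so treat this as an illustration of the API difference, not a benchmark.

```rust
/// Hypothetical SoA storage: one contiguous column per component.
pub struct Storage {
    pub pos_x: Vec<f32>,
    pub vel_x: Vec<f32>,
}

impl Storage {
    /// Iterator style: the caller pulls one (position, velocity) pair at a time.
    pub fn iter_mut(&mut self) -> impl Iterator<Item = (&mut f32, &f32)> {
        self.pos_x.iter_mut().zip(self.vel_x.iter())
    }

    /// Inversion of control: the storage calls the closure and hands it
    /// whole column slices, so the closure owns the inner loop.
    pub fn for_each_columns<F>(&mut self, mut f: F)
    where
        F: FnMut(&mut [f32], &[f32]),
    {
        f(&mut self.pos_x, &self.vel_x);
    }
}

/// Per-item update driven by the iterator.
pub fn integrate_per_item(s: &mut Storage, dt: f32) {
    for (p, v) in s.iter_mut() {
        *p += *v * dt;
    }
}

/// Column-wise update: a tight loop over contiguous slices that the
/// compiler is free to turn into SIMD.
pub fn integrate_columns(s: &mut Storage, dt: f32) {
    s.for_each_columns(|pos, vel| {
        for (p, v) in pos.iter_mut().zip(vel) {
            *p += *v * dt;
        }
    });
}
```

Both functions compute the same result. The point of the second shape is that the callback sees contiguous slices instead of single items, which leaves room for batching or explicit SIMD inside the closure.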