This keeps coming up, and I see confusion in the comments. There is a standard: IEEE 754-2008. There are <i>additional</i> things people add, like approximate reciprocal and approximate sqrt instructions. But if you don't use those, and you don't change the order of association, you get consistent results.<p>The question with association for summation is what you want to match. OP chose to match the <i>scalar</i> for-loop equivalent. You can just as easily define an 8-wide or 16-wide "virtual vector" and match that instead.<p>I suspect that an 8-wide virtual vector is the right <i>default</i> for most people currently: Intel systems since Haswell and all recent AMD systems support it, and if you're using vectorization at all, you can afford to pay some overhead on Arm for a double-width virtual vector. You don't often gain enough from AVX512 to make 16-wide the default, but if you wanted to focus on Skylake+ (really Cascade Lake+) or Genoa+ systems, it would be a fine choice.
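<p>To make the "virtual vector" idea concrete, here is a minimal sketch in plain Python (whose floats are IEEE 754 doubles). The names <i>scalar_sum</i> and <i>virtual_vector_sum</i>, the round-robin lane assignment, and the halving reduction order are my own assumptions for illustration, not OP's code; a real implementation would just need to fix some such order and stick to it on every target.

```python
def scalar_sum(xs):
    """Left-to-right sum: the association a plain scalar for-loop produces."""
    total = 0.0
    for x in xs:
        total += x
    return total


def virtual_vector_sum(xs, lanes=8):
    """Sum with the association of a `lanes`-wide vector accumulator.

    Element i is added into lane i % lanes, then the lanes are combined
    by repeated halving. The order depends only on `lanes`, not on the
    width of whatever hardware ends up running it.
    """
    if lanes < 1 or lanes & (lanes - 1):
        raise ValueError("lanes must be a power of two")

    # One running partial sum per virtual lane. A short tail simply
    # leaves the remaining lanes at 0.0, as if padded with zeros.
    acc = [0.0] * lanes
    for i, x in enumerate(xs):
        acc[i % lanes] += x

    # Fixed horizontal reduction: fold the upper half onto the lower half.
    width = lanes
    while width > 1:
        width //= 2
        for j in range(width):
            acc[j] += acc[j + width]
    return acc[0]


if __name__ == "__main__":
    xs = [1e16, 1.0, -1e16, 1.0]
    print(scalar_sum(xs))             # scalar association
    print(virtual_vector_sum(xs, 8))  # 8-wide virtual vector association
```

On this input the two associations give different answers (the scalar loop loses the first 1.0 when it is added to 1e16, the 8-lane version does not), which is exactly why you have to pick which one you are matching. With lanes=1 the function degenerates to the scalar loop.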