I honestly didn't realize how performant the decade-old 2013 Haswell architecture is on vector workloads.<p>250 GFLOP/s per core is no joke. He also compared it against an M1 Pro, which, when not using the secret matrix coprocessor, achieves effectively the same vector throughput nearly a decade later...