Great article. I especially liked the comparison of float32 to float64. I had no idea the difference in elapsed time could be as large as roughly 10x in a simple case like the one demonstrated.<p>* float64: 4.7 secs elapsed, 1.6 G CPU instructions, 30.4 M cache misses<p>* float32: 0.4 secs elapsed, 1.2 G CPU instructions, 16.7 M cache misses