It isn't just badly written math loops that can cripple performance. Years ago, some users were complaining that it took forever to load their data into their analysis program. It turned out they were reading thousands of structs, <i>one element at a time</i>, with the Unix read(2) <i>system call</i>! I taught them about buffering, and the read time went down by a factor of ten or more; I forget the exact numbers.
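
To make the difference concrete, here is a minimal C sketch of the two approaches. The original code isn't shown, so the record layout (`struct sample`), the packed field-by-field file format, and both function names are made up for illustration. The first function issues one read(2) call per struct member; the second makes the same per-member requests through stdio's `fread`, which fills a user-space buffer with a few large reads and serves the small requests from memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical record; the real structs are not described in the text. */
struct sample {
    int    id;
    double x, y, z;
};

/* Slow path: every member costs one read(2) system call.
 * Assumes a regular file, where read() returns the full count
 * unless it hits end-of-file. Returns the number of records loaded. */
size_t load_unbuffered(int fd, struct sample *out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    while (n < max) {
        struct sample s;
        if (read(fd, &s.id, sizeof s.id) != (ssize_t)sizeof s.id ||
            read(fd, &s.x,  sizeof s.x)  != (ssize_t)sizeof s.x  ||
            read(fd, &s.y,  sizeof s.y)  != (ssize_t)sizeof s.y  ||
            read(fd, &s.z,  sizeof s.z)  != (ssize_t)sizeof s.z)
            break;              /* end-of-file or error: stop loading */
        out[n++] = s;
    }
    return n;
}

/* Buffered path: same file format and same per-member requests, but
 * fread() pulls from stdio's buffer, so the kernel is only entered
 * when that buffer runs dry. Returns the number of records loaded. */
size_t load_buffered(FILE *fp, struct sample *out, size_t max)
{
    assert(fp != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        struct sample s;
        if (fread(&s.id, sizeof s.id, 1, fp) != 1 ||
            fread(&s.x,  sizeof s.x,  1, fp) != 1 ||
            fread(&s.y,  sizeof s.y,  1, fp) != 1 ||
            fread(&s.z,  sizeof s.z,  1, fp) != 1)
            break;              /* end-of-file or error: stop loading */
        out[n++] = s;
    }
    return n;
}
```

Both functions return the same records for the same file; only the number of system calls differs. On Linux, running each version under `strace -c` is one way to compare the read counts. How much time buffering saves will depend on the record size and the system.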