I haven't gone through the code, but the measurement methodology seems wrong to me.<p>> As you can see, the disk I/O in the simple Go version takes only 14% of the running time. In the optimized version, we’ve sped up both reading and processing, and the disk I/O takes only 7% of the total.<p>1. If I/O wasn't a bottleneck, shouldn't we optimize only reading to have comparable benchmarks?<p>2. Imagine the program ran for 100 seconds with 14% I/O, so 14 seconds are spent on I/O. Now we optimize processing and the total time drops to 70 seconds. If I/O wasn't a bottleneck and we haven't optimized I/O, it still takes 14 seconds, so disk I/O should become 14/70 = 20% of total execution time, not 7%.<p>Disk I/O:<p>> Go simple (0.499), Go optimized (0.154)<p>Clearly, I/O access was sped up about 3.2x (0.499 / 0.154) while total execution was sped up 1.6x. This is not a sound way of measuring if the claim is that I/O is not a bottleneck.<p>I agree, though, that things are getting faster.
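<p>To make the arithmetic in point 2 concrete, here is a minimal sketch. The 100 s and 70 s totals are my hypothetical numbers from above, not measurements from the article; only the 0.499 and 0.154 disk I/O timings are quoted from it. The helper names `ioShare` and `speedup` are made up for illustration.

```go
package main

import "fmt"

// ioShare returns the fraction of total run time spent on disk I/O.
func ioShare(ioSeconds, totalSeconds float64) float64 {
	return ioSeconds / totalSeconds
}

// speedup returns how many times faster the "after" run is than the "before" run.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	// Hypothetical run: 100 s total, of which 14 s (14%) is disk I/O.
	const ioSeconds = 14.0

	// Only processing is optimized, so I/O still takes 14 s
	// while the total drops to a hypothetical 70 s.
	before := ioShare(ioSeconds, 100)
	after := ioShare(ioSeconds, 70)
	fmt.Printf("I/O share before: %.0f%%, after: %.0f%%\n", before*100, after*100)

	// Disk I/O timings quoted from the article: Go simple vs. Go optimized.
	fmt.Printf("I/O speedup: %.1fx\n", speedup(0.499, 0.154))
}
```

<p>With untouched I/O the share goes up from 14% to 20%. A share that drops to 7% instead means the I/O itself got faster, which is what the roughly 3.2x ratio of the quoted timings shows.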