I'm not on Mastodon, so I'll share here: I inherited some numerical software that was used primarily to prototype new algorithms and check errors for a hardware product that solved the same problem. It was known that different versions of the software produced slightly different answers, for seemingly no reason. The hardware engineer who handed it off to me didn't seem to be bothered by it. He wasn't using version control, so I couldn't dig into it immediately, but I couldn't stop thinking about it.<p>Soon enough I had two consecutive releases in hand, which produced different results, and which had <i>identical numerical code</i>. The only code I had changed that ran during the numerical calculations was code that ran <i>between</i> iterations of the numerical parts of the code. IIRC, it printed out some status information like how long it had been running, how many calculations it had done, the percent completed, and the predicted time remaining.<p>How could that be affecting the numerical calculations??? My first thought was a memory bug (the code was in C-flavored C++, with manual memory management) but I got nowhere looking for one. Unfortunately, I don't remember the process by which I figured out the answer, but at some point I wondered what instructions were used to do the floating-point calculations. The Makefile didn't specify any architecture at all, and for that compiler, on that architecture, that meant using x87 floating-point instructions.<p>The x87 instruction set was originally created for floating point coprocessors that were designed to work in tandem with Intel CPUs. The 8087 coprocessor worked with the 8086, the 287 with the 286, the 387 with the 386. Starting with the 486 generation, the implementation was moved into the CPU.<p>Crucially, the x87 instruction set includes a stack of eight 80-bit registers. 
Your C code may specify 64-bit floating-point numbers, but since the compiled code has to copy those values into the x87 registers to execute floating-point instructions, the calculations are done with 80-bit precision. Then the values are copied back into 64-bit registers. If you are doing multiple calculations, a smart compiler will keep intermediate values in the 80-bit registers, saving cycles and gaining a little bit of precision as a bonus.<p>Of course, the number of registers is limited, so intermediate values may need to be copied to a 64-bit register temporarily to make room for another calculation to happen, rounding them in the process. And that's how code interleaved with numerical calculations can affect the results even if it semantically doesn't change any of the values. Calculating percent completed, printing a progress bar -- the compiler may need to move values out of the 80-bit registers to make room for these calculations, and when the code changes (say, you decide to also print an estimated time remaining) the compiler might change which intermediate values are bumped out of the 80-bit registers and rounded to 64 bits.<p>It was silly that we were executing these ancient instructions in 2004 on Opteron workstations, which supported SSE2, so I added a compiler flag to enable SSE2 instructions, and voila, the numerical results matched exactly from build to build. We also got a considerable speedup. I later found out that there's a setting in the x87 control word you can flip to force x87 arithmetic to always round results to double precision, probably to solve exactly the problem I encountered, but I never circled back to try it.