Consistency: How to defeat the purpose of IEEE floating point (2008)

38 points by aw1621107, about 5 years ago

8 comments

acqq · about 5 years ago

> 99.99% of the code snippets in that realm work great with 64b floating point, without the author having invested any thought at all into "numerical analysis"

In the event that they "work great" (really, *great*?), it's only because:

- The code depends on properties of IEEE FP which were designed exactly so that it's harder for a casual user to shoot himself in the foot -- and these properties were intentionally designed into IEEE FP by people who DID invest a lot in "numerical analysis" and in the practical consequences of potential bad decisions.

- The code depends on libraries that were designed with much more effort than the author of the above statement can imagine.

In short, yes, we do need *all* the features of IEEE FP. And to produce anything non-trivial one should indeed learn more about all of that, and care.

> Summary: use SSE2 or SSE, and if you can't, configure the FP CSR to use 64b intermediates and avoid 32b floats. Even the latter solution works passably in practice, as long as everybody is aware of it.

That was, and I guess still is, the default with Microsoft's compilers on Windows for decades already, and it is probably a sensible default for non-Microsoft scenarios too, especially where "consistency" across compilers is needed, which matches the title of the article. Oh, and make sure that the compiler doesn't do any optimization that produces unstable results.

That's the "production" default. However, I still believe that during the development of anything non-trivial, evaluating the results using different numbers of bits is worth doing.

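To make the "configure the FP CSR" advice concrete, here is a minimal sketch, assuming x86 with Linux/glibc (the macros come from glibc's <fpu_control.h>). Note that SSE math (e.g. gcc -msse2 -mfpmath=sse) ignores the x87 control word, so this only matters for x87 code paths:

    /* Sketch: limit x87 intermediates to 53-bit (double) precision,
       per the "use 64b intermediates" advice above. Assumes x86 with
       glibc; unnecessary under SSE math, which ignores this setting. */
    #include <fpu_control.h>

    static void set_x87_double_precision(void) {
        fpu_control_t cw;
        _FPU_GETCW(cw);                            /* read the control word */
        cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* 53-bit significand    */
        _FPU_SETCW(cw);                            /* write it back         */
    }
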
radford-neal · about 5 years ago

One additional problem is that IEEE floating point fails to require that addition and multiplication be commutative.

"WHAT?", you say? Surely it has to be commutative!

Well, it is, except in cases where both operands are "NaN" (Not a Number). You see, there's not just one NaN, but many, with different "payloads", intended to indicate the source of the error leading to a NaN. The payload gets propagated through arithmetic. But what happens when both operands are NaN, with different payloads? The standard says that the result is one or the other of these NaNs, but leaves unspecified which.

The old Intel FPU chose the NaN with the larger payload, which gives results independent of the operand order. But SSE uses the payload from the first operand. And so we get non-commutative addition and multiplication.

The compilers, of course, assume these operations are commutative, so the results are completely arbitrary.

One practical effect: in R, missing data - NA - is implemented as a NaN with a particular payload. So in R, if you write something like NA+sqrt(-1), you arbitrarily get either NA or NaN as the result, and you probably get the opposite for sqrt(-1)+NA. And both might vary depending on the context in which the computation occurs (e.g., in vector arithmetic or not).

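For readers who want to try this, a minimal sketch: it builds two NaNs with different payloads and adds them in both orders. It assumes IEEE 754 binary64; whether the two sums actually print different bit patterns depends on the hardware and on how the compiler schedules the operands (the comment says SSE takes the first operand's payload):

    /* Sketch: add two NaNs with different payloads in both orders
       and compare the resulting bit patterns. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static double nan_with_payload(uint64_t p) {
        /* all-ones exponent + quiet bit, then the chosen payload bits */
        uint64_t bits = 0x7FF8000000000000ULL | (p & 0x0007FFFFFFFFFFFFULL);
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;
    }

    static uint64_t bits_of(double d) {
        uint64_t b;
        memcpy(&b, &d, sizeof b);
        return b;
    }

    int main(void) {
        volatile double a = nan_with_payload(1);  /* volatile defeats */
        volatile double b = nan_with_payload(2);  /* constant folding */
        printf("a+b: %016llx\n", (unsigned long long)bits_of(a + b));
        printf("b+a: %016llx\n", (unsigned long long)bits_of(b + a));
        return 0;
    }
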
onetoo · about 5 years ago

This is also an issue in video game programming, where this lack of consistency causes problems in the implementation of replays or lockstep networking. The core idea of both is to store/share the inputs for each frame, such that the game's state can be derived from them. Even small inconsistencies every frame can explode in size due to the sheer number of frames.

If you think this article is interesting, you may also be interested in learning about posits.

They are an alternative to floats with better precision near 0, which, the authors claim, makes them superior for things like machine learning. Relevant to this article is the fact that they are defined to be consistent, so if they become popular this will never be an issue again.

Here is an article from the authors of posits which explains their advantages: http://www.johngustafson.net/pdfs/BeatingFloatingPoint.pdf

Here is a more nuanced look at posits, which explains their disadvantages: https://hal.inria.fr/hal-01959581v3/document

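Returning to the replay/lockstep point above, a minimal sketch of the input-log idea, where GameState, Input, and step() are hypothetical stand-ins for a real engine's types. The whole scheme rests on step() being bit-for-bit deterministic, which is exactly where floating-point inconsistency across machines hurts:

    /* Sketch: a replay is just the recorded per-frame inputs; the
       state is re-derived by stepping the simulation from them. */
    #include <stddef.h>

    typedef struct { float x, y; } GameState;
    typedef struct { int buttons; } Input;

    /* Must produce identical bits on every machine for replays and
       lockstep peers to stay in sync. */
    static GameState step(GameState s, Input in) {
        s.x += (in.buttons & 1) ? 0.1f : 0.0f;
        return s;
    }

    static GameState replay(GameState initial, const Input *log, size_t n) {
        for (size_t i = 0; i < n; i++)
            initial = step(initial, log[i]);
        return initial;
    }
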
saagarjha · about 5 years ago

> Compilers, or more specifically buggy optimization passes, assume that floating point numbers can be treated as a field – you know, associativity, distributivity, the works.

Of course, this largely depends on how "YOLO" your compiler is. I believe GCC and Clang try reasonably hard to follow IEEE 754, while ICC is much more lax.

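To see the missing associativity concretely, a minimal example (compiled without -ffast-math or similar, so the compiler is not allowed to reassociate):

    /* Sketch: floating-point addition is not associative, so any
       optimization pass that reassociates can change results. */
    #include <stdio.h>

    int main(void) {
        volatile double a = 1e16, b = -1e16, c = 1.0;
        printf("(a + b) + c = %g\n", (a + b) + c);   /* prints 1 */
        printf("a + (b + c) = %g\n", a + (b + c));   /* prints 0 */
        return 0;
    }
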
jstewartmobile · about 5 years ago

Many of the older, business-driven mainframe designs have hardware BCD instructions--for faithful/performant implementation of grade-school-style dollars-and-cents base-10 arithmetic.

On the other hand, a great deal of PC evolution has been driven by games--where performance is king. Hard to beat IEEE floating point on performance & storage efficiency!

Then there are the rusty sharp edges of x86, but that is life...

I wonder if `-O0` would solve the inconsistency? I don't particularly trust many compiler optimizations--too much temptation for a compiler writer to go performance-crazy, and start treating this computer voodoo like it was actual algebra.

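For contrast with binary floats, a minimal sketch of that grade-school dollars-and-cents arithmetic using integer cents, a common software substitute for hardware BCD:

    /* Sketch: exact base-10 money arithmetic via 64-bit integer
       cents, a software stand-in for the mainframes' BCD units. */
    #include <inttypes.h>
    #include <stdio.h>

    typedef int64_t cents_t;

    int main(void) {
        cents_t price = 1999;   /* $19.99, held exactly */
        cents_t total = 0;
        for (int i = 0; i < 3; i++)
            total += price;     /* no 0.1-style binary rounding */
        printf("$%" PRId64 ".%02" PRId64 "\n", total / 100, total % 100);
        return 0;
    }
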
评论 #22493086 未加载
seanalltogether · about 5 years ago

How well does floating point work for 3D games/programs and GPUs? That seems to be a very large category of floating-point usage, but I have no knowledge of whether it works well in that space. Would GPUs be x% faster if they didn't have to do floating point? Would games have more or fewer rendering problems without floating point?

PaulHoule · about 5 years ago

Numeric pros are not that happy w/ IEEE numbers. The main intellectual effort involved was that Intel had some freshers make a floating point coprocessor, and then the standard just documented what the chip did.

kstenerud · about 5 years ago

The problem is that instructions for IEEE 754 values use the full precision of those values (or greater), when you almost never need that much. And if you leave them as-is, you build up bias.

As your calculations progress, your results slowly build up significant-digit bias (which will be different depending on the architecture and libraries). To get around this, you'd have to round regularly, but that also slows things down (and is difficult to do in binary float).

If you're taking the results of calculations at their full precision, you're just asking for trouble. 32-bit binary IEEE 754 may be able to represent 7 digits of precision, but I sure as hell wouldn't take the results of 32-bit float operations to more than 6!

The alternative is to get a contract from the compiler that everything will be done in the same precision with the same bias for the specified type, and just accept the buildup (which we're currently doing without that guarantee, and getting burned by it).
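
One crude way to do the "round regularly" step from this comment, sketched here in decimal even though, as the comment notes, decimal rounding is slow and awkward for binary floats:

    /* Sketch: round x to n significant decimal digits, to discard
       precision a calculation can't actually support. Slow, and
       itself subject to binary/decimal rounding at the edges. */
    #include <math.h>
    #include <stdio.h>

    static double round_sig(double x, int n) {
        if (x == 0.0 || !isfinite(x)) return x;
        double scale = pow(10.0, n - 1 - (int)floor(log10(fabs(x))));
        return round(x * scale) / scale;
    }

    int main(void) {
        printf("%.9g\n", round_sig(3.14159265358979, 6));  /* 3.14159 */
        return 0;
    }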