The title is actually wrong - the -O0 version is correct, the -O3 version is not (despite giving the output the author expected).<p>Casting the value to double ends up converting the long value 0x7fffffffffffffff to the nearest double value: 0x8000000000000000. As the -O0 version CORRECTLY reports, this does not round-trip back to the same value in the "long" type. Many other values, though not all, down to about 1/1024 of that value (1 / 2^(63-53)) will also fail to round-trip for similar reasons.<p>Unless my coffee-deficient brain is missing something at the moment, it should be the case that any integer with 53 bits or fewer between the first and last 1 bit (inclusive) will round-trip cleanly. Any other integer will not.<p>Edit: fixed a typo above, coded up the idea I expected to work, and ran it through QuickCheck for a few minutes; this version seems to be correct (the 'int' return rather than bool is just because Haskell's FFI doesn't natively support C99 bool):<p><pre><code> #include <limits.h>
int fits(long x) {
    /* LONG_MIN is -2^63, which a double holds exactly; special-casing
       it up front also avoids the undefined negation -x below. */
    if (x == LONG_MIN) return 1;
    unsigned long ux = x < 0 ? -x : x;
    /* Strip trailing zero bits until at most 53 significant bits remain. */
    while (ux > 0x1fffffffffffffUL && !(ux & 1)) {
        ux /= 2;
    }
    return ux <= 0x1fffffffffffffUL;
}</code></pre>
Hoo boy. This is what happens the first time C programmers start to work with floating point and don't know the fundamentals.<p>When you work with floating point, you need to remember to compare with a tolerance (an epsilon), because you are rounding to finite precision and different floating point units perform the conversion in different ways.<p>You must abandon the idea of '==' for floats.<p>This is why his code is unpredictable: you cannot guarantee the conversion of any integer to and from float is the same number. Period. The LSBs of the mantissa can and do change, which is why we mask to a precision or use signal-to-noise comparisons when evaluating bit drift between FP computations.<p>He has the first part correct, < and > are your friends with FP. But to get past the '==' hurdle, he needs to define his tolerance; the code should be something like:<p>if (fabs(f1 - f2) < TOLERANCE) ... fits = true.<p>I was irked by his arrogance when he asks, "Intel CPUs have a history of bugs. Did I hit one of those?" First, learn about floating point; then, work on an FPU team for 10 years; and even then, don't assume you're smarter than a team of floating point architects. You're not.
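A minimal sketch of that tolerance test (TOLERANCE is a placeholder, not a value from the article; choosing it is application-specific):<p><pre><code> #include <math.h>

#define TOLERANCE 1e-9   /* placeholder; must be chosen per application */

/* "Equal within tolerance" comparison for doubles. */
int nearly_equal(double f1, double f2) {
    return fabs(f1 - f2) < TOLERANCE;
}</code></pre>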
This may be an instance of "you should really know the gcc/clang sanitizers and use them to test your code":<p>clang test.c -O0 -fsanitize=undefined<p>./a.out<p>[...]<p>test.c:17:12: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'<p>Interestingly gcc doesn't throw that warning.
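For readers without the article open, a minimal stand-in for the kind of code that trips this check (not the article's exact source):<p><pre><code> #include <stdio.h>
#include <limits.h>

int main(void) {
    long l = LONG_MAX;
    double d = (double)l;   /* rounds up to 2^63, the nearest double */
    long back = (long)d;    /* 2^63 is out of long's range: undefined
                               behaviour, which is what UBSan reports */
    printf("%s\n", l == back ? "fits" : "doesn't fit");
    return 0;
}</code></pre>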
A precise integer value is only guaranteed to be representable losslessly in a double if it fits in the significand: `64 - 1 (sign) - 11 (exponent) = 52` explicitly stored bits, plus one implicit leading bit, i.e. 53 bits in magnitude.<p>This should be fairly obvious with knowledge about how floating point numbers are represented internally IMO.<p>Edit: Be more precise about what can be represented.
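A quick way to see that boundary (assuming a 64-bit long):<p><pre><code> #include <stdio.h>

int main(void) {
    long a = 1L << 53;   /* 9007199254740992 */
    long b = a + 1;
    /* a round-trips exactly; b rounds back down to a */
    printf("%ld -> %ld\n", a, (long)(double)a);
    printf("%ld -> %ld\n", b, (long)(double)b);
    return 0;
}</code></pre>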
Clear your floating point exception register by calling feclearexcept(FE_ALL_EXCEPT). Convert to long by calling lrint(rint(x)). Then check your exception register using fetestexcept(). FE_INEXACT will indicate that the input wasn't an integer, and FE_INVALID will indicate that the result doesn't fit in a long.<p>Edit: check for me whether just calling lrint(x) works. The manpage doesn't specify that lrint() will set FE_INEXACT, but it seems weird to me that it wouldn't.
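Putting that recipe together might look like this (an untested sketch; the function name is mine, and per C99 a #pragma STDC FENV_ACCESS ON may also be needed):<p><pre><code> #include <fenv.h>
#include <math.h>

/* rint() raises FE_INEXACT if d has a fractional part; lrint() then
   raises FE_INVALID if the (now integral) value is out of long's range. */
int converts_cleanly(double d) {
    feclearexcept(FE_ALL_EXCEPT);
    (void)lrint(rint(d));
    return !fetestexcept(FE_INVALID | FE_INEXACT);
}</code></pre>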
To save some time for new readers: the author is unfamiliar with floating point representation and thinks that a double precision number, since it is 64 bits, can hold any 64-bit integer; they are also somewhat confused about what an xmm register can hold (they believe it has 128 bits of precision instead of holding 2 64-bit doubles or 4 32-bit singles). They attempt to find the issue a few ways. The correct solution is not to convert any integer larger than 2^53 in absolute value, since only integers up to that magnitude are guaranteed to convert to double and back (aside from a few larger ones that exist sparsely).
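A guard along those lines might look like this (a conservative sketch, assuming a 64-bit long; it also rejects the sparse larger values that happen to be exactly representable):<p><pre><code> /* Accept only longs whose magnitude is at most 2^53; all of
   those round-trip through double exactly. */
int safe_to_convert(long x) {
    const long limit = 1L << 53;
    return x >= -limit && x <= limit;
}</code></pre>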
> When something very basic goes wrong, I have this hierarchy of potential culprits:<p>I don't know if this is supposed to be a joke or part of the setup for an explanatory post about undefined behaviour, but that list is in exactly the wrong order.
SSE xmm registers might be 128 bits wide, but the precision is still 64 bits. The additional (high) bits are zeroed out.<p>What you're seeing is not excess precision due to wide registers but excess precision due to optimization and constant propagation, which means GCC calculates a fast path for (argc == 1) that doesn't round correctly and ends up with "it fits".<p>Interestingly it does optimize to the correct "doesn't fit" with -mfpmath=387 -fexcess-precision=standard, so I guess this is a bug in how GCC treats SSE math. The sanitizer (-fsanitize=float-cast-overflow) also notices the problem.
With the help of your comments, I could now write the conclusion to my article. In a nutshell this is the solution:<p><pre><code> #include <math.h>
#include <fenv.h>
int fits_long(double d)
{
    long   l_val;
    double d_val;

    // may be needed?
    // #pragma STDC FENV_ACCESS ON
    feclearexcept(FE_INVALID);
    l_val = lrint(d);           // raises FE_INVALID if d is outside long's range
    d_val = (double) l_val;
    if (fetestexcept(FE_INVALID))
        return 0;
    return d_val == d;          // false if d had a fractional part
}
</code></pre>
The article explains it in more detail. Thanks for the help.
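For instance (my example, not from the article), exercising it with the value from the post:<p><pre><code> #include <stdio.h>
#include <limits.h>

/* assumes fits_long() from the snippet above */
int main(void) {
    printf("%d\n", fits_long((double)LONG_MAX));  /* 0: doesn't fit */
    printf("%d\n", fits_long(42.0));              /* 1: fits */
    return 0;
}</code></pre>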
The first rule of floating point comparison is that you do not compare them for equality; instead you calculate the difference and check if it is less than an epsilon.
>I am still looking for a better way to check, if a double will convert cleanly to an integer of the same size or not.<p>I'd say the cleanest would be to decode exponent and mantissa, check if the exponent is within the 64-bit limit of long, then check whether any bits are set below the binary point (plus some extra care for two's complement negative numbers).<p>The problem with this is of course that this would be platform dependent.
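A rough sketch of that decoding (assumes IEEE-754 binary64 doubles and a 64-bit long; the function name is mine):<p><pre><code> #include <stdint.h>
#include <string.h>

int converts_to_long(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                /* portable type-pun */
    int exp = (int)((bits >> 52) & 0x7ff) - 1023;  /* unbiased exponent */
    uint64_t mant = bits & 0xfffffffffffffULL;     /* 52 stored bits */

    if (exp < 0)  return d == 0.0;  /* |d| < 1 (incl. subnormals): only +/-0 */
    if (exp > 63) return 0;         /* too large; also inf/nan (exp == 1024) */
    if (exp == 63)                  /* the two's complement edge case: */
        return bits == 0xC3E0000000000000ULL;      /* exactly -2^63 == LONG_MIN */
    if (exp < 52 && (mant & ((1ULL << (52 - exp)) - 1)))
        return 0;                   /* fraction bits set below the binary point */
    return 1;
}</code></pre>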
The most surprising thing for me out of this is that casting a high positive integer to double yields the <i>nearest</i> double, which could be higher, rather than the largest double smaller than or equal to the integer value.<p>Is there a way to get the largest double smaller than or equal to some positive integer?
> When something very basic goes wrong, I have this hierarchy of potential culprits:
> the compiler
> buggy hardware
> OS vendor
> last and least me, because I don’t make mistakes :)<p>I really dislike the arrogant programmer trope. Can we all stop?
How did you decide that "the method works for LONG_MIN"? Did the method return the expected output of false? Because it really seems like the code is working correctly on `-O0` and incorrectly on `-O3`...