NaN-boxing is morally equivalent to using tagged unions to represent values of any type. It optimizes for fast floating-point operations, and for allowing a larger amount of meaningful values to be represented in 64 bits.<p>Most tagged-union approaches use some sort of punning so that at least one type of value is directly meaningful, without any bit manipulation. For example, some Common Lisp implementations will choose their tag bits for Cons cells such that NIL is represented with all zeros. Since NIL is the only falsey value and everything else is truthy, this means that any bit-pattern can be used directly for testing conditionals, with the correct semantics.<p>NaN-boxing is choosing a different type of punning for the tags, to optimize a different use case. First, all bit-patterns can be used directly as 64-bit floats with technically correct semantics--anything that's NaN-boxed is actually not a floating-point number. I'm given to understand that modern architectures are slow to mix float and bit ops, so it's nice that you don't need to mask your tags off of your float before operating on them.<p>Second, (double-precision) floats are usually the only type that require the full 64 bits to have meaning. In many applications, 50ish bits is 'good enough' for ints and pointers (with some extra ops to handle overflow), but floats are mandated by a standard and don't scale down gracefully. A tagged union wouldn't be able to contain a 64-bit float directly without spilling into an extra machine word. Unless, of course, the tags are punned to be part of the float value itself. And that's exactly what NaN-boxing is.
In Racket, they use the alignment of the memory blocks:<p>* The even integer numbers are pointers.<p>* The odd integers represent fixnums (small integers) (n <---> 2*n+1).<p>With this representation the calculations with fixnums are quite fast because they don't need unboxing.
<i>I'm not really sure I understood the utility of this technique, that I see as an hack (it relies on the hardware not caring on the value of the mantissa in a NaN) but coming from a Java background I'm not used to the roughness of C.</i><p>The use of "signalling NaNs" is supported by IEEE 754. It may be a hack, but it's a hack that's in the standard.
luajit was the first implementation I've seen, but it's author Mike Pall said soemewhere that this was an old technique, or more like that it could've been known before. I remember reading aome lisp paper from th 90s where sifferent encoding techniques/boxings were mentioned, and one of them was it (I thnink).
v8 doesn't use this technique - the use SMI.<p>Read also <a href="http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations" rel="nofollow">http://wingolog.org/archives/2011/05/18/value-representation...</a>