Let's represent the number 42,643,192, or 10100010101010111011111000₂, in different "floating point" representations.<p>Scientific notation with 5 significant figures:<p><pre><code> 4.2643 × 10⁷
</code></pre>
Scientific notation in base 2 with 17 significant binary figures:<p><pre><code> 1.0100010101010111₂ × 2²⁵
</code></pre>
Let's pack this in a fixed-length datatype. Note that 011001₂ is the binary encoding of 25.<p><pre><code> 1 0100010101010111 011001
1 mantissa exp.
</code></pre>
This doesn't suffice because<p>a. We're wasting a bit on the leading 1.<p>b. We want to support negative values.<p>c. We want to support negative exponents.<p>d. It would be nice if values of the same sign sorted by their representation.<p>The leading 1 can be dropped and replaced with a sign bit (0 for "+", 1 for "-"). The exponent can have 100000₂ subtracted from it, so 011001₂ represents 25-32, or -7, and 111001₂ represents 25. Sorting can be handled by putting the exponent before the mantissa.<p>Thus we get to a traditional floating point representation.<p><pre><code> 0 111001 0100010101010111
± exp. mantissa
</code></pre>
Real floating point has a little more on top (infinities, standardised field sizes, etc.) but is fundamentally the same.
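To make the decoding concrete, here is a minimal Python sketch of this toy format (1 sign bit, a 6-bit excess-32 exponent, a 16-bit mantissa with an implicit leading 1); the field widths come from the example above, not from IEEE 754.<p><pre><code> def decode_toy(bits):
     # bits is a 23-bit integer laid out as sign | exponent | mantissa
     sign = -1 if (bits >> 22) & 1 else 1
     exponent = ((bits >> 16) & 0x3F) - 32   # undo the excess-32 bias
     mantissa = 1 + (bits & 0xFFFF) / 2**16  # restore the implicit leading 1
     return sign * mantissa * 2**exponent

 print(decode_toy(0b0_111001_0100010101010111))
 # 42642944.0, i.e. 42,643,192 with everything past the 17th significant bit dropped
</code></pre>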
Best simple explanation I've seen so far.<p>I highly recommend Fabien's Game Engine Black Book. I'm halfway through it, and it's really fun. I've only been a software dev for 6 years, so looking at how things could be hacked around in the 90s to squeeze every drop of performance out of very constrained devices is fascinating.
> People who really wanted an hardware floating point unit in 1991 could buy one. The only people who could possibly want one back then would have been scientists (as per Intel understanding of the market). They were marketed as "Math CoProcessor". Performance were average and price was outrageous (200 USD in 1993 equivalent to 350 USD in 2016.). As a result, sales were mediocre.<p>Actually, that's only <i>partly</i> true. My father owned a company that outfitted large manufacturing shops (a Michigan company; you can imagine who his customers were). As a result, he used AutoCAD. The version of AutoCAD he used had a hard requirement on the so-called "Math Co-processor", so he ended up having to purchase one and install it himself. That was my first taste of taking a computer apart and upgrading it, and I credit that small move with my becoming interested in building PCs, which led to my dad and me starting a business in the 90s doing that for individuals and businesses. There were definitely more reasons for that kind of add-on than just scientific fields; anyone in the computer-aided drafting world at that time needed one as well.
Okay, fine, I agree that sometimes mathematical notation is bad and we are all computer people here, not math people, so we get really scared of mathematical notation.<p>But is (-1)^S 1.M 2^(E-127) so bad that it required a whole blog post to explain it? Except for the "1.M" pseudo-notation to explain the mantissa with the implicit leading one bit, all of those symbols are found in most programming languages we use.<p>I don't think the value of the blog post was explaining the notation. We all knew what operations to perform when we saw it. The value seems to lie more in thinking of the exponent as a window on the real line and the mantissa as an offset inside that window.<p>Personally, though, this still doesn't seem like a huge, deep insight to me, but maybe I'm just way too used to floating point and have forgotten how hard it was to learn this. I did learn about mantissas and exponents, and even learned how to use a log table in high school, but maybe I'm just old and had an unusual high school experience.
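For what it's worth, the whole formula is only a few lines of Python (this sketch ignores zeros, subnormals, infinities and NaN; struct is just used to get at the bit pattern):<p><pre><code> import struct

 def single_from_fields(x):
     bits = struct.unpack('=I', struct.pack('=f', x))[0]
     S = bits >> 31            # sign bit
     E = (bits >> 23) & 0xFF   # biased exponent
     M = bits & 0x7FFFFF       # fraction field
     return (-1) ** S * (1 + M / 2**23) * 2 ** (E - 127)

 print(single_from_fields(6.5))   # 6.5
</code></pre>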
Here's everything you need to know about Floating Point in as short a form as I can write it.<p>1. Floating points are simply "binary scientific notation". The speed of light is roughly 3.0E8... which in "normal form" is written 300,000,000. An IEEE 754 Single has 8 bits for the exponent (the E8 in the speed of light) and 24 bits of mantissa precision (the 3.0 part): 23 stored bits plus an implicit leading 1. There's some complicated stuff like offset shifting here, but this is the "core idea" of floating point.<p>2. "Rounding" is forced to happen in Floating Point whenever "information drops off" the far side of the mantissa. The mantissa is only 24 bits long, and many numbers (such as .1) require an infinite number of bits to represent! As such, this "rounding error" can build up the more operations you perform.<p>3. Subtraction (cancellation error) is the biggest single source of error and the one that needs to be most studied. "Subtraction" can occur when a positive and a negative number are added together.<p>4. Because of this error (and all errors!), floating point operations are NOT associative. (A + B) + C can give a different value than A + (B + C). The commutative property remains for multiplication and addition (A+B == B+A). If you require "bit-perfect" and consistent floating-point simulations, you MUST take into account the order of all operations, even simple addition and multiplication.<p>For example: Try "0.1 + 0.7 + 1" vs "1 + 0.1 + .7" in Python, and you'll see that these two orderings lead to different results.<p>---------------<p>Once you fully know and understand these 4 facts, then everything else is just icing on the cake. For example, to keep the accumulated rounding error (#2) down when summing many values, you can sort the numbers by magnitude and then add them up from smallest magnitude to largest magnitude.
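Quick Python sanity checks of points 2 through 4 (these use doubles rather than singles, but the behaviour is the same; the 1e8/0.1 values are just ones I picked):<p><pre><code> print(0.1 + 0.7 + 1)   # 1.7999999999999998
 print(1 + 0.1 + 0.7)   # 1.8: same operands, different grouping, different result

 # Summing from smallest to largest magnitude keeps the accumulated
 # rounding error smaller than adding each tiny term to a huge total.
 values = [1e8] + [0.1] * 1000
 print(sum(values))                    # drifts noticeably below 100000100.0
 print(sum(sorted(values, key=abs)))   # much closer to 100000100.0
</code></pre>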
I like the "window/offset" concept. I wrote an extended blog article with yet different visual aids: <a href="http://blog.reverberate.org/2014/09/what-every-computer-programmer-should.html" rel="nofollow">http://blog.reverberate.org/2014/09/what-every-computer-prog...</a>
I really wish he didn't make [0,1] one of the windows, because in floating point arithmetic the range [0,1] contains approximately as many floating point numbers (a billion or so in Float32) as the range [1,∞). There are "windows" [2^k,2^(k+1)] for positive <i>as well as negative</i> k. Just creates unnecessary scope for further confusion.
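For a quick check of the counts: the bit patterns of non-negative floats order the same way as the values, so Python's struct module gives the counts directly.<p><pre><code> import struct

 def f32_bits(x):
     # bit pattern of x as an IEEE 754 single
     return struct.unpack('=I', struct.pack('=f', x))[0]

 print(f32_bits(1.0) - f32_bits(0.0))            # float32 values in [0, 1): 1065353216
 print(f32_bits(float('inf')) - f32_bits(1.0))   # float32 values in [1, inf): 1073741824
</code></pre>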
IMO, the best way to explain floating point is to play with a tiny float. With an 8-bit float (1 bit sign, 4 bits exponent, 3 bits mantissa, exponent bias 7), there are only 256 possible values. One can write by hand a table with the corresponding value for each of the 256 possibilities, and get a feel for how it really works.<p>(I got the 1+4+3 from <a href="http://www.toves.org/books/float/" rel="nofollow">http://www.toves.org/books/float/</a>; I don't know if it's the best allocation for the bits, but for didactic purposes, it works.)
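If writing the table by hand gets tedious, here is a rough Python sketch of that 1-4-3 layout (bias 7, with IEEE-style subnormals and specials added for illustration; toves.org may handle those differently) that prints all 256 values:<p><pre><code> def decode_minifloat(byte):
     sign = -1.0 if (byte >> 7) & 1 else 1.0
     exp = (byte >> 3) & 0xF
     mant = byte & 0x7
     if exp == 0xF:                    # all-ones exponent: infinity or NaN
         return sign * float('inf') if mant == 0 else float('nan')
     if exp == 0:                      # subnormals: no implicit leading 1
         return sign * (mant / 8) * 2 ** -6
     return sign * (1 + mant / 8) * 2 ** (exp - 7)

 for b in range(256):                  # the full table of all 256 encodings
     print(format(b, '08b'), decode_minifloat(b))
</code></pre>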
> Since floating point units were so slow, why did the C language end up with float and double types ? After all, the machine used to invent the language (PDP-11) did not have a floating point unit! The manufacturer (DEC) had promised to Dennis Ritchie and Ken Thompson the next model would have one. Being astronomy enthusiasts they decided to add those two types to their language.<p>Wait, what was the alternative? No floats? How the heck would people calculate things with only integers?<p>edit: AFAIK bignums are even slower, and fixed-point accumulates error like crazy
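(For reference, fixed point is just integers with an implied binary point, e.g. a 16.16 layout, which is roughly what games of that era used. A rough sketch, with illustrative names:)<p><pre><code> FRAC_BITS = 16
 ONE = 2 ** FRAC_BITS              # 16.16 fixed point: integers scaled by 2**16

 def to_fixed(x):
     return int(round(x * ONE))

 def from_fixed(f):
     return f / ONE

 def fixed_mul(a, b):
     return (a * b) >> FRAC_BITS   # rescale after multiplying two fixed values

 print(from_fixed(fixed_mul(to_fixed(3.25), to_fixed(0.5))))   # 1.625
</code></pre>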
Imagine a ruler with all floating point values on it. Each time the mantissa reaches its maximum, the exponent increases, so the spacing between consecutive float values doubles.<p>The number of mantissa values being constant for each exponent value, the exponent describes some kind of "zoom level".<p>Float values on a ruler would sort of look like this:<p><pre><code> ...x x x x x x x x  x  x  x  x    x    x    x    x        x...
                    ^ exponent increases, spacing doubles</code></pre>
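You can see this "zoom level" directly in Python 3.9+ via math.ulp, which gives the gap between a value and the next representable double:<p><pre><code> import math

 for x in [1.0, 2.0, 4.0, 8.0, 1e8]:
     print(x, math.ulp(x))   # the gap doubles with each power of two
</code></pre>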
I see how some people just get the math, but I don't see why programmers here say they find it difficult to understand the window / offset explanation the article gives.<p>A "window" is a common programming term for a range between two values.<p>An "offset" is a common term for where a value falls after a starting point.<p>In simpler decimal and equidistant terms, the idea is to split a range of values into windows, divide each window into N values, and store an FP number by storing which window it falls in and which index inside the window (0 to N) it lands on.<p>The FP scheme actually uses powers of 2 instead of equally sized windows (so the granularity becomes coarser as the numbers become bigger), but the principle is the same.
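A rough Python illustration of that view, using math.frexp to split a value into mantissa and exponent (the function name is just for illustration):<p><pre><code> import math

 def window_and_offset(x):
     m, e = math.frexp(x)              # x == m * 2**e, with m in [0.5, 1)
     lo, hi = 2 ** (e - 1), 2 ** e     # the power-of-two window containing x
     return (lo, hi), (x - lo) / (hi - lo)   # window, and where x falls inside it

 print(window_and_offset(6.0))   # ((4, 8), 0.5): 6 sits halfway through the [4, 8) window
</code></pre>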
I think the example values at <a href="https://en.wikipedia.org/wiki/Minifloat" rel="nofollow">https://en.wikipedia.org/wiki/Minifloat</a> are most useful for intuitively understanding how floating point works --- especially the "all values" table, which shows how the numbers are spaced by 1s, then 2s, then 4s, etc. meaning the same number of values can represent a larger range of magnitudes, but sacrificing precision in the process.
Obligatory: What Every Computer Scientist Should Know About Floating-Point Arithmetic [1] (pdf)<p>[1]: <a href="http://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf" rel="nofollow">http://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf</a>
See also: An interactive floating point visualization: <a href="https://evanw.github.io/float-toy/" rel="nofollow">https://evanw.github.io/float-toy/</a>
As a complete layman with only a cursory knowledge of programming, as well as a complete lack of math skills above Algebra 2 (I didn't even complete that, tbh, once they threw graphing into the equation. I did get slope-intercept form down, but that's it.), I ended up finding this easier to understand than I expected, and a great read.<p>I love explanations like these, with a visual breakdown. It really helps it "click," as long as I glossed over the math formulas and didn't let the numbers overwhelm me.<p>This is what I took away: the exponent "reaches" out to the max value of the [0,1] [2,4] etc, and the number represented tends to be like 51-53% of the way down the line of the mantissa.<p>It "clicked" a bit for me, see? Am I way off?<p>This is the way I always learned math the best in school: an alternate explanation that helps it "click."<p>Very good explanation, from my point of view, of how floating point numbers work and what they even are.<p>That's a nice feeling for someone like me who is pretty bad at math and finds formulas like the one shown in the article to be, frankly, indecipherable.<p>But now I (sort of) understand how floating point numbers work, (sort of) what they are, why they are important, and what role they play.<p>Could I program anything using one? No. But I could learn someday, and explanations like these give me some hope that I just might be able to learn a programming language if I put the effort in. That I could learn the math required of me, even!
Why did they fix the bit-width for the mantissa and exponent? It would be nice to have more bits for the mantissa when you are near 1, and then ignore the mantissa entirely when you're dealing with enormous exponents, and very far from one. Granted, there would be some overhead (e.g. a 3-bit field describing the exponent length, or something) but it would be a useful data-structure.
> Instead of Exponent, think of a Window between two consecutive power of two integers.<p>I know what an exponent is, or if you want "order of magnitude". Sorry, but "A window between two consecutive power of two integers" doesn't make it easier to think about.
The last bits of trivia are very nice.<p>The x87 coprocessor makes me wonder about the days when each chip changed your system. It was such a different mindset that videogame consoles had parallel routes between the board and the cartridge itself to allow hardware extensions per game.
> I wanted to vividly demonstrate how much of a handicap it was to work without floating points.<p>So, did he manage to demonstrate that in the book? Because the page linked here, while explaining how floating points are represented in memory, does not explain how computers perform operations on them, or what purpose an FPU serves (how it differs from an ALU).
Off topic:
The response of dragontamer is one example of why downvotes alone are not enough. It was downvoted to the dead level, so now nobody can reply to it, but nobody gave a reason why what he was saying is incorrect.
I was hoping for something akin to an xkcd or SMBC comic; this isn't really much better than what my asm prof said when he explained it for MIPS programming. Maybe it's because I don't get what he means by offset and window, but this wasn't really that helpful.