31 comments
tialaramex over 4 years ago
Many years ago now we built something that used a 60-bit truncated hash of URIs. That's far too small to be comfortable that you won't run into collisions, but it's big enough that for modest sizes (hundreds of millions up to billions) a collision is unlikely. So you can have a slow path for the collision case, so long as that slow path is correct. For unit testing we needed at least one collision so we could see that everything behaves as designed before some real user hits one. We had test data sets with over a billion URIs in them, but none that (as far as we could tell) had a collision.

So I just wrote code to mint nonsense URIs for our system based on a small integer seed (like https://foo.example/123456789), then spun up a system with a bunch of RAM (maybe 8-12 GB) to iterate through each seed, storing the hash and the seed it was generated from, until it hit a collision after a few gigabytes of RAM, then it spat out the two URIs from the colliding seeds. Bingo, now we had data for unit testing to show that stuff works even in the unlikely case that two distinct URIs have the same hash.

Probably a day's work to build that, and then I'd guess several hours every month saved debugging weird collision bugs, because our unit tests would now tell us if we'd screwed up and collisions were slipping past.
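A minimal sketch of that kind of collision hunt, assuming a hypothetical 60-bit hash made by truncating FNV-1a (the comment doesn't say which hash the real system used):

```cpp
// Sketch of the collision hunt described above. The 60-bit hash here is a
// truncated FNV-1a, purely illustrative -- substitute the real system's hash.
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

static uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 1469598103934665603ull;                          // FNV offset basis
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }  // FNV prime
    return h;
}

int main() {
    std::unordered_map<uint64_t, uint64_t> seen;  // truncated hash -> seed
    for (uint64_t seed = 0;; ++seed) {
        std::string uri = "https://foo.example/" + std::to_string(seed);
        uint64_t h = fnv1a64(uri) & ((uint64_t(1) << 60) - 1);    // keep 60 bits
        auto [it, inserted] = seen.emplace(h, seed);
        if (!inserted) {                           // two seeds, same 60-bit hash
            std::cout << "collision: https://foo.example/" << it->second
                      << " vs " << uri << "\n";
            return 0;
        }
    }
}
```

With a 60-bit hash the birthday bound puts the first collision around 2^30 inserts, which lines up with the "few gigabytes of RAM" mentioned above.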
dragontamer over 4 years ago
Things have gotten even faster since then.

* Target: 64-bit space (2^64).

* GPUs are more flexible and faster. A Vega64 (a few years old now) has 4096 shaders ("CudaThreads") at 1.6 GHz. It can search the entire 32-bit floating-point (or integer) space in less than a second. 1.6 GHz x 4096 shaders == 2^42.5 bitspace per second.

* People exhaustively searching a space are relatively patient, and willing to wait months for the results. 60 seconds x 60 minutes x 24 hours x 30 days == 2^21.3.

1.6 GHz x 4096 shaders x 60 seconds x 60 minutes x 24 hours x 30 days is roughly equal to 2^63.8. We've done it: a month of GPU effort is all you need to exhaustively search the 64-bit space.

-------

We've entered a world (~3 years ago) where it is reasonable to brute force the 64-bit space, so long as you're willing to wait about a month for the results.

If you're not willing to wait a month, then use 2 GPUs, or 4 GPUs, or a cluster of GPUs like the DGX A100 (or multiple clusters).
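The exponent arithmetic above can be checked in a couple of lines (clock speed and shader count taken from the comment as given):

```cpp
// Back-of-the-envelope check of the exponents quoted above.
#include <cmath>
#include <cstdio>

int main() {
    double per_second = 1.6e9 * 4096;                    // 1.6 GHz x 4096 shaders
    double per_month  = per_second * 60 * 60 * 24 * 30;  // one month of patience
    std::printf("per second: 2^%.1f\n", std::log2(per_second));  // ~2^42.6
    std::printf("per month:  2^%.1f\n", std::log2(per_month));   // just shy of 2^64
}
```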
ChuckMcM over 4 years ago
I love this.

This is another in the "Now that we have computers with gazillions of bits of RAM and many cores and gazillions of cycles, what conventional wisdom is no longer wise?" series :-).

I suggested to a client that they use Sqlite3 for their company database rather than a disk-based database, and they looked at me shocked, as if I had spoken heresy. I pointed out how many records they could store on a 64GB machine, and that is a *small* data-center-class machine, and it sort of boggled their mind. Adding read-only copies shared from a NAS device gave them perfect scalability in terms of queries per second (their sales folks had their own database that they were updating). It was fun to watch their expression go from horror to wonder to excitement when they finally implemented it.
ascar over 4 years ago
Previous submission 2017:

https://news.ycombinator.com/item?id=15252426

Can't believe this was already 3 years ago. Interesting read!

Previous submission 2014:

https://news.ycombinator.com/item?id=7135261
xchaotic over 4 years ago
Articles like this make me wonder how everything even works, if basic math functions can be so wrong and a simple concept like floor can have different definitions/implementations.
jlebar over 4 years ago
We did this in XLA. It runs as part of CI, which I always found pretty amazing.
https://github.com/tensorflow/tensorflow/blob/bd2608178bca1be95007328fb9e01408a94ba082/tensorflow/compiler/xla/tests/exhaustive_op_test_utils.cc

Found a lot of bugs this way.
sega_sai over 4 years ago
It is all very nice, but if your function has two floating point parameters you have 1.6e19 tests to do...
hinkley over 4 years ago
I decided to build a scatter plot to demonstrate what I was 95% sure was a data loss bug (preproduction, thankfully) in the initializer for a CSPRNG.

I figured plotting a few bytes of output (256x256) would do nicely. I plotted all possible two-byte inputs just to make sure it was working, and that ran instantly, so I tried 3 bytes and that was still pretty fast, so I did 4 while I was at lunch or in a meeting.

Sure enough, big black splotches from dropping some of the seed data on the floor. Please to be fixing.

Old farts know a lot of things have been tried before and how those worked out. But we also think doing a complex operation 4 billion times is going to be irretrievably slow. Always test your assumptions.
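A rough sketch of that kind of coverage plot, with a hypothetical `csprng_first_two_bytes()` standing in for "seed the generator, take its first two output bytes"; dark regions in the resulting image are output pairs no seed ever produced:

```cpp
// Coverage plot sketch. csprng_first_two_bytes() is a hypothetical stand-in
// for the initializer under test -- substitute and link in the real one.
#include <array>
#include <cstdint>
#include <fstream>
#include <vector>

std::array<uint8_t, 2> csprng_first_two_bytes(uint32_t seed);  // hypothetical

int main() {
    std::vector<uint32_t> hits(256 * 256, 0);
    for (uint32_t seed = 0; seed < (1u << 16); ++seed) {   // all two-byte seeds
        auto b = csprng_first_two_bytes(seed);
        ++hits[b[0] * 256u + b[1]];
    }
    // Write a 256x256 PGM "scatter plot": black pixels are output pairs that
    // no seed ever produced -- big black splotches suggest lost seed entropy.
    std::ofstream out("coverage.pgm", std::ios::binary);
    out << "P5 256 256 255\n";
    for (uint32_t h : hits) out.put(h ? char(255) : char(0));
}
```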
zaroth over 4 years ago
> For very small numbers (less than about FLT_EPSILON * 0.25) adding 0.5 gives 0.5 exactly, and this then rounds to zero. Since about 40% of the positive floating-point numbers are smaller than FLT_EPSILON * 0.25 this results in a lot of errors – over 850 million of them!

I thought Float Epsilon was the smallest possible number which could be represented by a float? This statement doesn’t seem to make sense...
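For reference, FLT_EPSILON is the gap between 1.0f and the next representable float (2^-23 ≈ 1.2e-7), not the smallest representable float (that's FLT_MIN ≈ 1.18e-38, smaller still for denormals), which is why so many positive floats sit below FLT_EPSILON * 0.25. A quick check of the absorption the article describes:

```cpp
// FLT_EPSILON is the spacing between 1.0f and the next float (2^-23), not the
// smallest float -- that's FLT_MIN, or smaller still for denormals.
#include <cfloat>
#include <cstdio>

int main() {
    std::printf("FLT_EPSILON = %g, FLT_MIN = %g\n", FLT_EPSILON, FLT_MIN);
    float tiny = FLT_EPSILON * 0.125f;   // well below the FLT_EPSILON * 0.25 cutoff
    std::printf("%d\n", (tiny + 0.5f) == 0.5f);   // prints 1: tiny is absorbed
    // So "add 0.5, round to nearest" turns ceil(tiny) into 0 instead of 1.
}
```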
uyt over 4 years ago
I was reading about an implementation of Barrett reduction recently:

https://en.wikipedia.org/wiki/Barrett_reduction

Specifically one that uses 80-bit long double for calculating `(__int128_t)a * b % M`, from this GitHub repo:

https://github.com/kth-competitive-programming/kactl/blob/master/content/number-theory/ModMulLL.h

But how do you test the correctness of such code? You can't just stress test all possible products of 64-bit numbers!

In the end you still gotta go back to good old-fashioned proofs:

https://github.com/kth-competitive-programming/kactl/blob/master/doc/modmul-proof.pdf
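The trick in the linked file is roughly the following (a paraphrase, not a verbatim copy; the linked PDF is where the actual error bound is proven, and it only holds for moduli below about 2^63):

```cpp
// Paraphrase of the long-double modmul trick. a*b and M*quotient both wrap
// modulo 2^64; the 80-bit long double estimate of the quotient is off by at
// most 1, and the last line corrects for that. See the linked proof.
#include <cstdint>

using u64 = uint64_t;
using i64 = int64_t;

u64 modmul(u64 a, u64 b, u64 M) {
    i64 ret = (i64)(a * b - M * (u64)((long double)a * b / M));
    return ret + M * (ret < 0) - M * (ret >= (i64)M);
}
```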
ViralBShah over 4 years ago
We've certainly tested all Float32 inputs to some functions in Julia's math packages. I'm unable to find the relevant issues this very moment, otherwise would certainly post them.
fallingmeat over 4 years ago
Maybe I don't understand, but is this article only trying to suggest exhaustive testing for single-precision, single-input functions? Obviously this approach is not practical as soon as there are multiple inputs or wider precision.
Const-me over 4 years ago
In addition to many correctness issues, that code is relatively slow.

SSE4.1 introduced the roundps instruction, which is both fast and correct. Unlike AVX2 or FMA3, SSE4.1 support is almost universal by now. In most of my current projects, I check support on startup and refuse to run if SSE4.1 ain't supported. Just don't forget to set the corresponding macros for DirectXMath, Eigen, etc.
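For what it's worth, a minimal example of using that instruction directly through intrinsics (compile with -msse4.1 or equivalent; nothing here is specific to DirectXMath or Eigen):

```cpp
// ceil and round-to-nearest via the SSE4.1 roundps instruction: one
// instruction per vector, correct even for the tiny inputs that break the
// "add 0.5" trick.
#include <smmintrin.h>
#include <cstdio>

int main() {
    __m128 x = _mm_set_ps(2.5f, -1.5f, 1e-38f, 0.49999997f);
    __m128 up = _mm_ceil_ps(x);                              // round toward +inf
    __m128 nearest = _mm_round_ps(x, _MM_FROUND_TO_NEAREST_INT |
                                     _MM_FROUND_NO_EXC);     // ties to even
    float xi[4], ui[4], ni[4];
    _mm_storeu_ps(xi, x);
    _mm_storeu_ps(ui, up);
    _mm_storeu_ps(ni, nearest);
    for (int i = 0; i < 4; ++i)
        std::printf("x=%g  ceil=%g  nearest=%g\n", xi[i], ui[i], ni[i]);
}
```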
elvis70 over 4 years ago
"When in doubt, use brute force"
-- Ken Thompson
JoeAltmaier over 4 years ago
My god this issue has been around for too long. In the '80s testing folk learned to shotgun the argument space to test functions, and they've not gone an inch forward since! Or so it seems.

A version of the Pentium was released that couldn't divide by 10. And even back then, testing all the mantissas on a simulator would have taken only hours. Instead, they masked, marketed and released a tragically flawed processor.

Just test the input space exhaustively, folks! It's a computer; it feels no pain. Work it hard until it fails or works perfectly.
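The article's approach for a one-float-argument function, as a sketch; `my_ceilf` is a hypothetical implementation under test, with the standard library as the reference:

```cpp
// Exhaustive test of a one-argument float function: all 2^32 bit patterns.
// my_ceilf() is a hypothetical implementation under test; std::ceil is the
// reference. Results are compared bitwise, with any NaN matching any NaN.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

float my_ceilf(float x);   // hypothetical function under test

int main() {
    uint64_t mismatches = 0;
    for (uint64_t i = 0; i <= 0xFFFFFFFFull; ++i) {
        uint32_t bits = (uint32_t)i;
        float x;
        std::memcpy(&x, &bits, sizeof x);          // reinterpret bits as a float
        float got = my_ceilf(x), want = std::ceil(x);
        uint32_t gb, wb;
        std::memcpy(&gb, &got, sizeof gb);
        std::memcpy(&wb, &want, sizeof wb);
        if (gb != wb && !(std::isnan(got) && std::isnan(want)))
            ++mismatches;
    }
    std::printf("%llu mismatches\n", (unsigned long long)mismatches);
}
```

On a modern core this loop finishes in seconds to a couple of minutes per function, which is the article's whole point.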
wtn over 4 years ago
In the early 1980s, truncated floating-point calculations caused the Vancouver Stock Exchange index value to drift down for about 23 months, until it was shown as less than half the correct value: https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price
throwaway7281 over 4 years ago
I was once in the position to test 3M possible inputs - that took only a few minutes. It's a good strategy to keep in mind.
vortico over 4 years ago
Does anyone have code to reduce the search space by a few orders of magnitude, to test functions with a float64 argument, or two or three float32 arguments, in a few seconds? Perhaps by skipping "boring" floats far from any boundary, which are unlikely to produce the errors that more interesting floats do?
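One common approach, sketched below with ad hoc choices: take every exponent with a few mantissa patterns each, plus the immediate neighbors of a handful of special values. For a float64 argument, or several float32 arguments, you would then test this set (or its cross product with itself) instead of the full space:

```cpp
// A coarse "interesting floats" set: every exponent with a few mantissa
// patterns, plus neighbors of special values. Choices here are ad hoc.
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

std::vector<float> interesting_floats() {
    std::vector<float> out;
    auto with_neighbors = [&](float v) {
        out.push_back(std::nextafterf(v, -INFINITY));
        out.push_back(v);
        out.push_back(std::nextafterf(v, INFINITY));
    };
    for (float v : {0.0f, 0.5f, 1.0f, 2.0f, FLT_EPSILON, FLT_MIN, FLT_MAX})
        for (float s : {1.0f, -1.0f})
            with_neighbors(s * v);
    out.insert(out.end(), {INFINITY, -INFINITY, NAN});
    for (uint32_t sign : {0u, 0x80000000u})                        // both signs,
        for (uint32_t exp = 0; exp < 256; ++exp)                   // every exponent,
            for (uint32_t man : {0u, 1u, 0x400000u, 0x7FFFFFu}) {  // a few mantissas
                uint32_t bits = sign | (exp << 23) | man;
                float f;
                std::memcpy(&f, &bits, sizeof f);
                out.push_back(f);
            }
    return out;   // ~2100 values instead of 2^32
}
```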
jansan over 4 years ago
Javascript guy here, can you help me set up that Beowulf cluster so I can start testing my 2^64 cases?
peter303 over 4 years ago
I recall hackers building standard password hashes of all eight-character ASCII combinations, then creating a reverse lookup table. But that's 400 trillion entries for the standard 68-character ASCII password subset.

400 trillion hashes is not a big computation in the blockchain era.
lentil_soup over 4 years ago
"Round to nearest even is the default IEEE rounding mode. This means that 5.5 rounds to 6, and 6.5 also rounds to 6"<p>Does anyone know the reason behind this? why the nearest even and not a hard cut like "always round up with >= 0.5"?
tagh over 4 years ago
Makes sense when you're refactoring an existing function that takes one numeric argument and evaluates rapidly.
zzzeek over 4 years ago
Crap now I have 67 million failures to fix.
IncRnd over 4 years ago
> I’ve written in great detail about how to compare floating-point values using an epsilon, but there are times when it is just not appropriate. Sometimes there really is an answer that is correct, and in those cases anything less than perfection is just sloppy.

Certain applications will choose fixed point to avoid what appear to be the imprecisions of floating point.
MeteorMarc over 4 years ago
For Python you cannot use the stdlib for this, but it would work for NumPy float32.
CGamesPlay over 4 years ago
1. Given the thoroughness of the write-up, I'm going to assume that the author did not find that any of the discrepancies in the implementations were actually bugs in the reference implementation. He goes as far as to say that he reimplemented the reference implementation, so this seems like a safe assumption. Still, it would be nice to confirm. Given that other implementations seem so shoddy, it wouldn't be surprising to find a bug in the reference.

2. "Conventional wisdom Nazis" Maybe the author was just bored of writing at this point, but some justification for why bitwise comparing floats was a correct decision would have been nice. Is it because there's no math involving unrepresentable intermediates? Because there's no integration happening, so no way for errors to accumulate?
zelphirkalt over 4 years ago
Test them all and waste a huge amount of energy and computational resources!

I hope people test like that for really critical stuff, but not for your everyday library for your personal startup or e-business. Not even the big players need this.
jart over 4 years ago
"Seymour Cray didn't care that 81.0/3.0 did not give exactly 27.0 on the CDC 6000 class machines; and he was universally respected for making the fastest machines around." ──Linus Torvalds
NelsonMinar over 4 years ago
"Several of the ceil functions were implemented by adding 0.5 to the input value and rounding to nearest."<p>What sort of clown would make that mistake and yet think they are capable of writing a floating point library?
known over 4 years ago
Very difficult due to rounding errors.
alkonaut over 4 years ago
How do you do this for a function with more than one or two inputs? If the domain is 4B to the power five, you are out of luck. Very few of us need to implement "floor" or any other function that takes 1 float input.