Nearly all binary search and merge sort implementations are broken (2006)

148 pointsby artoover 13 years ago

16 comments

DarkShikariover 13 years ago

It is not sufficient merely to prove a program correct; you have to test it too. Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.This statement is tantamount to saying "you don't merely need to prove Fermat's Last Theorem, you also have to test it for all possible input values". By this line of reasoning, most of mathematics should be thrown out.If you've proven a program correct, it is correct for all input values, whether you have tested them or not. If your proof ignores complexities such as the range of data values, it isn't a correct proof to begin with. And yes, proving programs correct is often possible -- in fact, the highest levels of software verification actually require explicit proofs of correctness for much of the code; see <a href="http://en.wikipedia.org/wiki/Evaluation_Assurance_Level#EAL7:_Formally_Verified_Design_and_Tested" rel="nofollow">http://en.wikipedia.org/wiki/Evaluation_Assurance_Level#EAL7...</a>

评论 #3530476 未加载

评论 #3530461 未加载

评论 #3530531 未加载

评论 #3530440 未加载

评论 #3531171 未加载

ColinWrightover 13 years ago

Previous, extensive discussion: <a href="http://news.ycombinator.com/item?id=1130463" rel="nofollow">http://news.ycombinator.com/item?id=1130463</a>I once ran a programming challenge based on this, but I discontinued when I started to get threats of physical violence from people whose code didn't pass the tests.The Dictionary of Algorithms and Data Structures on the NIST site has a correction I submitted on exactly this point:<a href="http://xlinux.nist.gov/dads/HTML/binarySearch.html" rel="nofollow">http://xlinux.nist.gov/dads/HTML/binarySearch.html</a>And my blog post: <a href="http://www.solipsys.co.uk/new/BinarySearchReconsidered.html?HN" rel="nofollow">http://www.solipsys.co.uk/new/BinarySearchReconsidered.html?...</a>

jbarhamover 13 years ago

Nice to see that the Go gets it right:<pre><code> func Search(n int, f func(int) bool) int { // Define f(-1) == false and f(n) == true. // Invariant: f(i-1) == false, f(j) == true. i, j := 0, n for i < j { h := i + (j-i)/2 // avoid overflow when computing h // i ≤ h < j if !f(h) { i = h + 1 // preserves f(i-1) == false } else { j = h // preserves f(j) == true } } // i == j, f(i-1) == false, and f(j) (= f(i)) == true => answer is i. return i } </code></pre> <a href="http://code.google.com/p/go/source/browse/src/pkg/sort/search.go?name=weekly.2012-01-27#57" rel="nofollow">http://code.google.com/p/go/source/browse/src/pkg/sort/searc...</a>

评论 #3531857 未加载

Wilduckover 13 years ago

An interesting article that links back to this one:<a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/" rel="nofollow">http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10...</a>The claim is that even ignoring overflow, only 10% of programmers can correctly implement a binary search. When I tried it, I thought I got it working, but it was later pointed out that I didn't handle empty lists correctly.Programming correctly is hard.

评论 #3531770 未加载

lutormover 13 years ago

While true, it seems that if your array is that close to the inherent size limit in indexing by ints, this is at most a temporary fix. How fast does your sorting requirements go from 2^31 to 2^32? It seems that you should be using size_t here, not int.

评论 #3531288 未加载

评论 #3531535 未加载

pasbesoinover 13 years ago

Bypass the borked formatting (original is JS dependent; link below is to Google text-only cache):<a href="http://webcache.googleusercontent.com/search?strip=1&q=cache:http%3A%2F%2Fgoogleresearch.blogspot.com%2F2006%2F06%2Fextra-extra-read-all-about-it-nearly.html" rel="nofollow">http://webcache.googleusercontent.com/search?strip=1&q=c...</a>IIRC, this made the rounds a few years ago, in case it sounds familiar.

评论 #3530542 未加载

ChuckMcMover 13 years ago

I'm tempted to solve it with:<pre><code> low/2 + high/2 + (low & high & 1); </code></pre> That side steps the overflow condition completely.

评论 #3531125 未加载

pantaloonsover 13 years ago

I'm tempted to suggest that the fixed C/C++ version is still not correct with respect to this issue, namely on architectures where INT_MAX == UINT_MAX. (Or really INT_MAX <= UINT_MAX < 2 * INT_MAX, which is valid in C99 for INT_MAX >= 65535.)

评论 #3531141 未加载

yasonover 13 years ago

I don't like the index variables and splitting in half part. I like to write my binary searches with bit patterns:1) figure out the number of bits required to cover the largest index (= a little bit of cheap bit-twiddling)3) initialize your index to 02) flip a bit on in index, starting from the highest bit available from step 1)3) if data[index] is smaller than what you're looking for, leave the bit on4) try with the next-lowest bit and loop to 2) until at bit #0It's pretty easy to write it out in a way that's just obvious, instead of requiring the reader to wrap his mind about which variable is the lowest and highest bound and whether the indexes are inclusive/exclusive in which ends, etc.

评论 #3532705 未加载

评论 #3532847 未加载

tbrownawover 13 years ago

This bug requires either (1) low/mid/high are pointers [edit - I don't think this is valid; IIRC pointer arithmetic is only sane for add/subtract an int from a pointer, not divide a pointer or add two pointers]; or (2) item_count > INT_MAX / 2, which means memory_size >= sizeof(your_array) == sizeof(item) * item_count > INT_MAX / 2, which means either 1-byte or 2-byte array elements, or that your ints are smaller than your pointers. So is it really "nearly all"?

评论 #3530458 未加载

评论 #3530454 未加载

MarkSweepover 13 years ago

I don't understand why I need Javscript enabled to merely read a blog post on blogspots.

评论 #3533452 未加载

jakubwover 13 years ago

The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.It surely does if everyone has to write their own binary search implementation when most standard libraries have one and several of them have one that is flexible enough (either due to the language's features or the binary search routine itself) to not just search through a container but through a solution space in an optimization problem. So I'm surprised the author hasn't concluded with the lesson that would be somewhat more practical: It is hard to write even the smallest piece of code correctly, so don't do it and use whatever's in your stack unless you have a good reason not to.

VMGover 13 years ago

Why are overflows silent?Shouldn't use cases where you want the overflow behavior be more explicit?

tonygover 13 years ago

So C, C++, and Java don't have good support for arithmetic. What else is new?

评论 #3531797 未加载

finnwover 13 years ago

Almost the same bug: <a href="http://code.google.com/p/guava-libraries/issues/detail?id=616" rel="nofollow">http://code.google.com/p/guava-libraries/issues/detail?id=61...</a>

willvarfarover 13 years ago

an old story but still well worth the re-reading