<i>It is not sufficient merely to prove a program correct; you have to test it too. Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.</i><p>This statement is tantamount to saying "you don't merely need to prove Fermat's Last Theorem, you also have to test it for all possible input values". By this line of reasoning, most of mathematics should be thrown out.<p>If you've proven a program correct, <i>it is correct for all input values</i>, whether you have tested them or not. If your proof ignores complexities such as the range of data values, it isn't a correct proof to begin with. And yes, proving programs correct is often possible -- in fact, the highest levels of software verification actually require explicit proofs of correctness for much of the code; see <a href="http://en.wikipedia.org/wiki/Evaluation_Assurance_Level#EAL7:_Formally_Verified_Design_and_Tested" rel="nofollow">http://en.wikipedia.org/wiki/Evaluation_Assurance_Level#EAL7...</a>
Previous, extensive discussion: <a href="http://news.ycombinator.com/item?id=1130463" rel="nofollow">http://news.ycombinator.com/item?id=1130463</a><p>I once ran a programming challenge based on this, but I discontinued when I started to get threats of physical violence from people whose code didn't pass the tests.<p>The Dictionary of Algorithms and Data Structures on the NIST site has a correction I submitted on exactly this point:<p><a href="http://xlinux.nist.gov/dads/HTML/binarySearch.html" rel="nofollow">http://xlinux.nist.gov/dads/HTML/binarySearch.html</a><p>And my blog post: <a href="http://www.solipsys.co.uk/new/BinarySearchReconsidered.html?HN" rel="nofollow">http://www.solipsys.co.uk/new/BinarySearchReconsidered.html?...</a>
Nice to see that the Go gets it right:<p><pre><code> func Search(n int, f func(int) bool) int {
// Define f(-1) == false and f(n) == true.
// Invariant: f(i-1) == false, f(j) == true.
i, j := 0, n
for i < j {
h := i + (j-i)/2 // avoid overflow when computing h
// i ≤ h < j
if !f(h) {
i = h + 1 // preserves f(i-1) == false
} else {
j = h // preserves f(j) == true
}
}
// i == j, f(i-1) == false, and f(j) (= f(i)) == true => answer is i.
return i
}
</code></pre>
<a href="http://code.google.com/p/go/source/browse/src/pkg/sort/search.go?name=weekly.2012-01-27#57" rel="nofollow">http://code.google.com/p/go/source/browse/src/pkg/sort/searc...</a>
An interesting article that links back to this one:<p><a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/" rel="nofollow">http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10...</a><p>The claim is that even ignoring overflow, only 10% of programmers can correctly implement a binary search. When I tried it, I thought I got it working, but it was later pointed out that I didn't handle empty lists correctly.<p>Programming correctly is hard.
While true, it seems that if your array is <i>that</i> close to the inherent size limit in indexing by ints, this is at most a temporary fix. How fast does your sorting requirements go from 2^31 to 2^32? It seems that you should be using size_t here, not int.
Bypass the borked formatting (original is JS dependent; link below is to Google text-only cache):<p><a href="http://webcache.googleusercontent.com/search?strip=1&q=cache:http%3A%2F%2Fgoogleresearch.blogspot.com%2F2006%2F06%2Fextra-extra-read-all-about-it-nearly.html" rel="nofollow">http://webcache.googleusercontent.com/search?strip=1&q=c...</a><p>IIRC, this made the rounds a few years ago, in case it sounds familiar.
I'm tempted to solve it with:<p><pre><code> low/2 + high/2 + (low & high & 1);
</code></pre>
That side steps the overflow condition completely.
I'm tempted to suggest that the fixed C/C++ version is still not correct with respect to this issue, namely on architectures where INT_MAX == UINT_MAX. (Or really INT_MAX <= UINT_MAX < 2 * INT_MAX, which is valid in C99 for INT_MAX >= 65535.)
I don't like the index variables and splitting in half part. I like to write my binary searches with bit patterns:<p>1) figure out the number of bits required to cover the largest index (= a little bit of cheap bit-twiddling)<p>3) initialize your index to 0<p>2) flip a bit on in index, starting from the highest bit available from step 1)<p>3) if data[index] is smaller than what you're looking for, leave the bit on<p>4) try with the next-lowest bit and loop to 2) until at bit #0<p>It's pretty easy to write it out in a way that's just obvious, instead of requiring the reader to wrap his mind about which variable is the lowest and highest bound and whether the indexes are inclusive/exclusive in which ends, etc.
This bug requires either (1) low/mid/high are pointers [edit - I don't think this is valid; IIRC pointer arithmetic is only sane for add/subtract an int from a pointer, not divide a pointer or add two pointers]; or (2) item_count > INT_MAX / 2, which means memory_size >= sizeof(your_array) == sizeof(item) * item_count > INT_MAX / 2, which means either 1-byte or 2-byte array elements, or that your ints are smaller than your pointers. So is it really "nearly all"?
<i>The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.</i><p>It surely does if everyone has to write their own binary search implementation when most standard libraries have one and several of them have one that is flexible enough (either due to the language's features or the binary search routine itself) to not just search through a container but through a solution space in an optimization problem. So I'm surprised the author hasn't concluded with the lesson that would be somewhat more practical: <i>It is hard to write even the smallest piece of code correctly</i>, so don't do it and use whatever's in your stack unless you have a good reason not to.
Almost the same bug: <a href="http://code.google.com/p/guava-libraries/issues/detail?id=616" rel="nofollow">http://code.google.com/p/guava-libraries/issues/detail?id=61...</a>