Proving that Android’s, Java’s and Python’s sorting algorithm is broken

608 点作者 amund超过 10 年前

19 条评论

bsdetector超过 10 年前

Their corrected version:<pre><code> if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1] || n-1 > 0 && runLen[n-2] <= runLen[n] + runLen[n-1]) </code></pre> In the first clause they add earlier to later runLen elements, but in the second they add later to earlier elements. Switching the order just makes the expression harder to understand, like reusing variables within a scope. Addition is commutative and there's nothing technically wrong with it, but this construction makes it appear like there may be something special about element n when there's not.The programmer also has to do mental arithmetic to check the bounds. Bounds checks can be written so that the largest index subtracted is also the value tested:<pre><code> if (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1] || n >= 2 && runLen[n-2] <= runLen[n-1] + runLen[n]) </code></pre> This makes it easier to see that the bounds are correct. Formal methods found an important bug that would not have been found otherwise, but a lot of lesser bugs can be prevented just by writing clear and consistent code.

评论 #9101333 未加载

mkesper超过 10 年前

From the article: The reaction of the Java developer community to our report is somewhat disappointing: instead of using our fixed (and verified!) version of mergeCollapse(), they opted to increase the allocated runLen “sufficiently”. As we showed, this is not necessary. In consequence, whoever uses java.utils.Collection.sort() is forced to over allocate space. Given the astronomical number of program runs that such a central routine is used in, this leads to a considerable waste of energy.

评论 #9100549 未加载

评论 #9102798 未加载

评论 #9100416 未加载

评论 #9101879 未加载

评论 #9100323 未加载

thomasahle超过 10 年前

This is actually really cool. They tried to verify Timsort as implemented in Python and Java using formal methods. When it didn't seem to work, they discovered that there was actually a missing case in both implementations, which could lead to array out of bounds exceptions.Really shines as an example of how important proof is in computer science.

评论 #9100376 未加载

bglazer超过 10 年前

Has anyone used the KeY project in industry?I'd certainly prefer proofs over unit tests. However, I don't understand formal proof systems sufficiently well to know whether these they would work in terms of your typical "app" that makes RPC calls, DB changes, and generally has lots of moving parts and statefulness.

评论 #9102290 未加载

jordigh超过 10 年前

Oh, crap. I suppose this also means we have the bug in our GNU Octave implementation:<a href="http://hg.savannah.gnu.org/hgweb/octave/file/0486a29d780f/liboctave/util/oct-sort.cc" rel="nofollow">http://hg.savannah.gnu.org/hgweb/octave/file/0486a29d780f/li...</a>Well, time to patch it there too.

Animats超过 10 年前

Nice. As usual, entry and exit conditions aren't that hard to write; it's loop invariants that are hard.<pre><code> /*@ loop_invariant @ (\forall int i; 0<=i && i<stackSize-4; @ runLen[i] > runLen[i+1] + runLen[i+2]) @ && runLen[stackSize-4] > runLen[stackSize-3]) @*/ </code></pre> It's surprising how close their notation is to our Pascal-F verifier from 30 years ago.[1] Formal verification went away in the 1980s because of the dominance of C, where the language doesn't know how big anything is. There were also a lot of diversions into exotic logic systems (I used to refer to this as the "logic of the month club"). The Key system is back to plain old first-order predicate calculus, which is where program verification started in the 1970s.For the invariant, you have to prove three theorems: 1) that the invariant is true the first time the loop is executed, given the entry conditions, 2) that the invariant is true for each iteration after the first if it was true on the previous iteration, and 3) that the exit condition is true given that the invariant is true on the last iteration. You also have to prove loop termination, which you do by showing that a nonnegative integer gets smaller on each iteration. (That, by the way, is how the halting problem is dealt with in practice.) #2 is usually the hardest, because it requires an inductive proof. The others can usually be handled by a simple prover. There's a complete decision procedure by Oppen and Nelson for theorems which contain only integer (really rational) addition, subtraction, multiplication by constants, inequalities, subscripts, and structures. For those, you're guaranteed a proof or a counterexample. But when you have an internal quantifier (the "forall int i") above, proof gets harder. Provers are better now, though.A big practical problem with verification systems is that they usually require a lot of annotation. Somebody has to write all those entry and exit conditions, and it's usually not the original programmer. A practical system has to automate as much of that as possible. In the example shown, someone had to tell the system that a function was "pure" (no side effects, no inputs other than the function arguments). That could be detected automatically. The tools have to make the process much, much easier. Most verification is done by people into theory, not shipping products.[1] <a href="http://www.animats.com/papers/verifier/verifiermanual.pdf" rel="nofollow">http://www.animats.com/papers/verifier/verifiermanual.pdf</a>

inglor超过 10 年前

This is an excellent use case of formal verification. I can definitely see the merit in using tools like this in my code.They mention KeY <a href="http://www.key-project.org/" rel="nofollow">http://www.key-project.org/</a> . Is anyone using this here? Are there any good resources on it except for the official site (and this blog post)?

评论 #9104807 未加载

评论 #9100436 未加载

评论 #9100490 未加载

ezyang超过 10 年前

Everyone here is talking about Timsort, but you should also check out the materials they've published about the KeY project, which they used to carry out this verification. <a href="http://www.key-project.org/~key/eclipse/SED/index.html" rel="nofollow">http://www.key-project.org/~key/eclipse/SED/index.html</a> describes their "symbolic execution debugger", which lets you debug any Java code even if you don't know all the inputs (by simply taking the input as a symbolic value). The screencast is very accessible.

ujjwal_wadhawan超过 10 年前

Spark 1.1 switched default sorting algorithm from quicksort to TimSort as well in both the map and reduce phases- <a href="https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html" rel="nofollow">https://databricks.com/blog/2014/10/10/spark-petabyte-sort.h...</a>

ericfrederich超过 10 年前

Where can I get that PS1 from the shell in the video? It appears to have a green check mark if the previous command returned 0 otherwise some red symbol. Looks cool

评论 #9103761 未加载

评论 #9103964 未加载

tenfingers超过 10 年前

This is a good example of why formal verification is incredibly useful. I tried to invest some time in learning some of the proof verification languages and tools, but so far I wasn't too successful.It looks like that to formulate a proof, I always have to rewrite the algorithm/problem first in the tool's language, which is often not easy. I could see myself making mistakes in writing the proof just as well as I do when I'm programming.Proof validation is also tricky. Coq isn't fully automatic as I initially was expecting. I actually used "prover9" which is first-order only, but does automatic validation. I guess Coq is really useful when you need to understand the proof and interactive validation can guide you, whereas prover9 could help with automation.The thing is, it's still too much work, even for seemingly simple algorithms, to write a proof in either system in order to improve on the current situation of unit testing (that is: if I wanted to get something with more intrinsic value than a test case).Formally verified languages are nice, but for a gazillion of reasons you still need to verify what's running currently.

评论 #9106520 未加载

amund大约 10 年前

The authors have posted a follow-up posting: about KeY - the tool used to prove the bug in TimSort - <a href="http://envisage-project.eu/key-deductive-verification-of-software/" rel="nofollow">http://envisage-project.eu/key-deductive-verification-of-sof...</a>

imaginenore超过 10 年前

And I always thought sort functions are always tested with millions of randomly generated sequences and the results are compared with the results of other implementations known to be good.

评论 #9100408 未加载

评论 #9100379 未加载

评论 #9100292 未加载

评论 #9100317 未加载

nichochar超过 10 年前

This is bad-ass. Thanks for taking the time and investigating something that so many overlook yet use every day (myself included)

maxerickson大约 10 年前

Fixed in python:<a href="http://bugs.python.org/issue23515" rel="nofollow">http://bugs.python.org/issue23515</a>

RubyPinch超过 10 年前

anyone know the list of bugs for this?<a href="https://bugs.openjdk.java.net/browse/JDK-8072909" rel="nofollow">https://bugs.openjdk.java.net/browse/JDK-8072909</a>It seems they havn't submitted to any other trackers, which is a bit unfortunate

评论 #9100666 未加载

andrewstuart2超过 10 年前

I'm surprised that modern implementations don't use something like quicksort, since it has linear space requirements (sorts in-place) and good constants on average for its n*log2(n) running time.

评论 #9100409 未加载

评论 #9100945 未加载

评论 #9101426 未加载

评论 #9100778 未加载

评论 #9100769 未加载

评论 #9101964 未加载

评论 #9100333 未加载

wheaties超过 10 年前

Wish they'd put that up on Github or Bitbucket. There are so many things to be learned from that. I don't want to just download things or just use a tool (although it's pretty awesome that they made the tool available.)

评论 #9100844 未加载

noahl超过 10 年前

Just a nitpick, but the article makes a mathematical mistake. In the last paragraph of section 1.2, it says<pre><code> For performance reasons, it is crucial to allocate as little memory as possible for runLen, but still enough to store all the runs. *If the invariant is satisfied by all runs, the length of each run grows exponentially (even faster than fibonacci: the length of the current run must be strictly bigger than the sum of the next two runs lengths).* </code></pre> However, fibonacci growth is strictly faster than exponential. In fact, this is why n * log(n) is the lower bound on the number of comparisons a comparison-based sorting algorithm must use: because n! is approximately n * log(n).

评论 #9100895 未加载

评论 #9100588 未加载

评论 #9100607 未加载

评论 #9100867 未加载

评论 #9100643 未加载