Wow, the language here is even more optimistic than the rosiest descriptions you see from young researchers, which prompted me to check whether the author has had much experience deductively verifying interesting "deep" functional properties of non-trivial programs. The answer seems to be no.<p>Like a newcomer to the field, he focuses on "first-day" problems such as language convenience, but the answer to his question of why this hasn't been done before is that language convenience isn't the hard problem, something he'll learn once he gains more experience.<p>One of the biggest issues (indeed, the very problem separation logic tries to address, though it succeeds only for relatively simple properties) is that "correctness" does not feasibly (affordably) compose. Proving the correctness of a program made out of components, each of which has already been proven correct for <i>any</i> desired property, is no easier than proving the correctness of the program from scratch, without concern for its decomposition. I.e., proving the correctness of a program made of ten provably-correct 500-line components is no easier than proving the correctness of all 5000 lines at once. This has been shown to be the case not only in the theoretical worst case, but also in practice.<p>Here's an example to make things concrete. Suppose we have the following (Java) method, calling some unknown, pure `bar` method:<p><pre><code> long foo(long x) {
if (x <= 2 || (x & 1) != 0)
return 0;
for (long i = x; i > 0; i--)
if (bar(i) && bar(x - i))
return i;
throw new Error();
}
</code></pre>
It is very easy to describe exactly under what conditions an error would be thrown. Similarly, it is easy to describe the operation of the following method, and prove that it performs its function correctly:<p><pre><code> boolean bar(long x) {
     if (x < 2) // reject 0 and 1, which the loop below passes vacuously
         return false;
     for (long i = x - 1; i >= 2; i--)
for (long s = x; s >= 0; s -= i)
if (s == 0)
return false;
return true;
}
</code></pre>
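To see what "easy" means here, this is roughly how the two specifications read in a JML-style notation (a loose sketch, not something a real JML checker would accept as written; assume bar is declared pure):<p><pre><code> //@ ensures \result == (x >= 2 &&
 //@     (\forall long d; 2 <= d && d < x; x % d != 0));
 boolean bar(long x) { /* as above */ }

 //@ signals (Error e) x > 2 && (x & 1) == 0 &&
 //@     (\forall long i; 0 < i && i <= x; !(bar(i) && bar(x - i)));
 long foo(long x) { /* as above */ }
 </code></pre>
Each clause can be read off the code almost mechanically, and each can be proven correct in isolation.<p>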
However, it is not easy to determine which inputs, if any, would cause foo to throw an error using that particular bar. In fact, we only happen to know that this particular question is extremely hard because it is one that has interested mathematicians for nearly 300 years and remains unanswered: it is Goldbach's conjecture, i.e. whether every even number greater than 2 is the sum of two primes.<p>While most verification tasks don't neatly correspond to well-known mathematical problems, and most require far less than 300 years to figure out, this kind of difficulty is encountered by anyone who tries to deductively verify non-trivial programs for anything but very specific properties (such as "there's no memory corruption", which separation logic does help with). Various kinds of non-determinism, such as those introduced by concurrency or by any kind of interaction, only make the possible compositions more complex.<p>In short, the effort it takes to verify a program does not scale nicely with its size, even when it is neatly decomposed, and it is this practical unaffordability (which is not a consequence of any inelegance in the tools) that makes this subject so challenging (and interesting), and that demands some humility and a lowering of expectations even where the approach is useful (and it can certainly be useful when wielded properly and at the right scope).<p>Another problem is an incorrect model of how programs are constructed. One might think that if a programmer has written a program, then they must have some informal (but deductive) model of it in their mind, and all that's missing is "just" formally specifying it. But that is not how programs are constructed over time when many people are involved. In practice, programmers often depend on <i>inductive</i> properties in their assumptions, such as "if the software has survived for many years, then local changes are unlikely to have global effects that aren't caught by existing tests." Those assumptions are good enough to provide the (often sufficient) level of correctness we already reach, but insufficient for constructing software that can be formally specified, let alone deductively proven correct.<p>That is why much of the contemporary research focuses on less sound approaches that aren't fully deductive, such as concolic testing (e.g., KLEE), which scale better for both specification and verification at the cost of "perfection" (see the sketch at the end of this comment).<p>The reason neither research nor industry does what is proposed here is that they know that's not where the real problems are. There are bigger issues to tackle before making the languages more beginner-friendly.
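<p>The promised sketch: the crudest version of that trade-off is plain bounded testing, which checks the example exhaustively up to some bound instead of proving anything (mine, not the author's; tools like KLEE explore paths symbolically rather than enumerating inputs, but the bargain is the same):<p><pre><code> // Evidence, not proof: foo throws no Error for any input up to LIMIT.
 class BoundedCheck {
     static final long LIMIT = 1_000; // arbitrary bound

     public static void main(String[] args) {
         // foo returns 0 for odd x and for x <= 2, so only even x >= 4 matter
         for (long x = 4; x <= LIMIT; x += 2)
             foo(x); // throws Error on the first counterexample
         System.out.println("no Error for any x <= " + LIMIT);
     }

     // foo and bar exactly as defined earlier
     static long foo(long x) {
         if (x <= 2 || (x & 1) != 0)
             return 0;
         for (long i = x; i > 0; i--)
             if (bar(i) && bar(x - i))
                 return i;
         throw new Error();
     }

     static boolean bar(long x) {
         if (x < 2) // reject 0 and 1, which the loop below passes vacuously
             return false;
         for (long i = x - 1; i >= 2; i--)
             for (long s = x; s >= 0; s -= i)
                 if (s == 0)
                     return false;
         return true;
     }
 }
 </code></pre>
No specification language, no proof, no universal claim: just a budget-sized answer, which is very often the right trade.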