The article proposes a computable approximation of Kolmogorov complexity which minimises the length of a program multiplied by its running time.<p>There's actually a much nicer alternative, sometimes called Levin Complexity, which minimises the length of a program (in bits) plus the logarithm of its running time (base 2). What's really cool about Levin Complexity is that the time required to calculate it only differs from the running time of the minimal program by a constant factor.<p>For example, let's say we have some data x and we want to find its Levin Complexity, using some pre-defined language L (e.g. the tape of a universal Turing machine, or binary lambda calculus). For simplicity, we'll assume L is 'prefix-free' (e.g. it has an EOF (end of file) symbol):<p><pre><code> def levinComplexity(x):
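    # Search outputs in order of increasing complexity, maintaining the
    # invariant: complexity = length(program) + log2(allowed steps).
    # (enumeratePrograms and run are assumed helpers; a hypothetical
    # sketch of them follows this code block.)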
    # Start by trying complexity 1
    currentComplexity = 1
    # Keep trying larger complexities until we return
    while True:
        # Try different lengths, to trade-off length and time
        for length in range(1, currentComplexity + 1):
            for program in enumeratePrograms(length):
                result = run(program, steps = 2**(currentComplexity - length))
                if result == x:
                    # Found a program which outputs x
                    return (currentComplexity, program)
        # x wasn't found, try a larger complexity
        currentComplexity += 1
</code></pre>
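Here enumeratePrograms and run are assumptions about the language L rather than part of the algorithm: enumeratePrograms(length) should yield every valid program of the given length, and run(program, steps) should execute a program for at most the given number of steps, returning its output (or None if it hasn't halted in time). A hypothetical sketch, with the interpreter left abstract since it depends on the choice of L:<p><pre><code> from itertools import product

 def enumeratePrograms(length):
     # Yield every bit-string of the given length; a real implementation
     # would only yield valid (prefix-free) programs of L.
     for bits in product('01', repeat=length):
         yield ''.join(bits)

 def run(program, steps):
     # Execute `program` for at most `steps` steps of L's interpreter,
     # returning its output if it halts in time, or None otherwise.
     raise NotImplementedError('needs a step-limited interpreter for L')
</code></pre>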
This will start by enumerating all data of complexity 1 (the output of programs of length 1 bit, run for 2^0 = 1 step), checking if any is equal to the input x. If not, the 'while' loop iterates and it checks all data of complexity 2 (outputs of programs of length 1 bit, run for 2^1 = 2 steps; and programs of length 2 bits, run for 2^0 = 1 step); the next iteration checks complexity 3 (length 1 bit, run for 2^2 = 4 steps; length 2 bits, run for 2^1 = 2 steps; and length 3 bits, run for 2^0 = 1 step); and so on until we find x and return.<p>The iteration which checks complexity N needs to re-run all the same programs as for N-1, but for twice as long, doubling that part of the work; it must also run each of the 2^N programs of length N for one step. In total, iteration N runs 2^L programs for 2^(N-L) steps at each length L, i.e. 2^N steps per length, or about N*2^N steps overall, so each iteration of the 'while' loop takes between 2 and 4 times as long as the previous (approaching twice for large N). Hence the overall algorithm takes around O(2^C) time (up to a factor polynomial in C), where C is the input's Levin Complexity; this grows <i>so quickly</i> that the final iteration dominates the running time. (The first snippet at the end of this comment checks the per-iteration count.)<p>If we focus on <i>any</i> particular program P we find something interesting: our search will eventually start executing P; once currentComplexity = length(P), it will be run for 1 step. The next iteration will run P again, but for 2 steps; the next iteration will run P for 4 steps; then 8 steps; and so on, doubling each time. Hence iteration N will run program P for 2^(N - length(P)) steps, which is O(2^N) for any fixed P. Since iteration N itself takes around O(2^N) time, we're executing O(2^N) steps of P per O(2^N) steps of levinComplexity. In other words, this algorithm is running <i>every</i> program at O(1) speed!<p>What's the catch? There are two things going on:<p>- Restarting at exponentially-spaced intervals lets us switch between programs without having to save/restore any state, and surprisingly it wastes at most two-thirds of our time (a slowdown of at most 3x). It's easiest to work this out backwards: the first time we execute a program's Tth step, we must have taken T steps <i>plus</i> all of the previous executions which got restarted. The previous execution must have restarted <i>before</i> reaching the Tth step, so it took fewer than T steps; the execution before that had half the step budget, so fewer than T/2 steps; the one before that fewer than T/4 steps; and so on. Hence we will reach the Tth step after at most T + T + T/2 + T/4 + T/8 + ... = T(2 + 1/2 + 1/4 + 1/8 + ...) = T(2 + 1) = 3T steps. (The second snippet at the end of this comment checks this bound.)<p>- Each program is getting a constant fraction of the execution time, but longer programs get exponentially smaller fractions. Programs of length N+1 get half as much execution time as those of length N, so the overall execution time is divided up between program lengths like 1/2 + 1/4 + 1/8 + ... Since we're using a fixed language, the number of programs of each length is fixed in advance, so each individual program gets a constant fraction: constant with respect to the input, though exponentially small in the program's own length. (Note: the prefix-free property is useful for making this work.)<p>These tiny fractions of time are the main reason this is an impractical algorithm. Although they're exponentially small, big-O only cares about growth as a function of the input size, whilst these exponentials depend only on the program lengths; so they count as 'constant factors'.<p>Note that our search finishes when we find the minimal program for x. Since that program (whatever it is) is being executed at O(1) speed, our search will finish within a constant factor of the time that program takes to output x.
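<p>For concreteness, here are two small sanity checks of the arithmetic above (assuming the schedule exactly as in the loop: 2^L programs of length L, each given 2^(N-L) steps at complexity N). First, the per-iteration work:<p><pre><code> # Iteration N runs 2^L programs for 2^(N-L) steps at each length L:
 # 2^N steps per length, N*2^N steps in total.
 def iterationWork(n):
     return sum(2**L * 2**(n - L) for L in range(1, n + 1))

 for n in range(1, 8):
     print(n, iterationWork(n), iterationWork(n) == n * 2**n)
</code></pre><p>And second, the restart overhead: with step budgets doubling between restarts, the total work spent by the time a program first reaches its Tth step stays below 3T:<p><pre><code> def workToReachStep(t):
     work, budget = 0, 1
     while budget < t:
         work += budget  # a restarted run, which used its whole budget
         budget *= 2
     return work + t     # the final run reaches step t directly

 for t in [1, 7, 100, 10**6]:
     print(t, workToReachStep(t), workToReachStep(t) / t)  # ratio stays below 3
</code></pre>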