I think this is another instance of the transition of efficient serial computations opposed to runtime efficient parallel computations, at least the changes look very similar to the adaptions we did to algorithms in our parallel computation lectures.<p>Just compare this with pushing a computation through an operation-annotated DAG: A good serial program computes a topological order and annotates each node with the result so far. This is efficient, because each node is considered as little as possible (once for the computation, once for each successor), but requires pre-computations and allows only a single thread to compute things.<p>The program with the least possible parallel runtime just assigns a computation unit to each and every node and during each parallel step, each unit computes the value if it is possible. This requires syncronization and does unnecessary steps, but overall, this scales pretty good with more units.<p>I think that is a neat coincidence :)