One point this article skips over is that Dijkstra's algorithm can be used more generally to find a shortest paths tree to all destinations, whereas A* is really just a single-source, single-destination pathfinding algorithm. Adding the heuristic obviously gets you to termination faster, but the work done by Dijkstra's algorithm isn't wasted if you're actually going to use it later.
Another useful way to see A* is as the synthesis of two different search algorithms. To compute which untraversed node to choose next, A* uses the function<p><pre><code> f(n) = g(n) + h(n)
</code></pre>
where g(n) is the actual traversed distance from the start to n, and h(n) is an (optimistic) approximation of the distance from n to the goal. Now, without making any other modifications to the algorithm, if we substitute<p><pre><code> g(n) = 0
</code></pre>
what we get is the greedy "best-first" search [1]. If, on the other hand, we select<p><pre><code> h(n) = 0
</code></pre>
we get a single-goal variant of Dijkstra's algorithm [2].<p>[1] <a href="http://en.wikipedia.org/wiki/Best-first_search" rel="nofollow">http://en.wikipedia.org/wiki/Best-first_search</a><p>[2] <a href="http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm</a>
When I was taught A*, I was just given the mathematical definition of admissible and left to run with that. The intuition the author gave transforms some vague notion of a weight function with certain properties into an almost obvious mental picture. I wish more things were taught this way at school.
These slides summarize pretty much all the recent developments in shortest path algorithms:<p><a href="http://research.microsoft.com/en-us/people/goldberg/erice.pdf" rel="nofollow">http://research.microsoft.com/en-us/people/goldberg/erice.pd...</a><p>There are example with maps overlaid with the nodes visited by each algorithm before finding the solution. The difference between Dijkstra, A* and landmarks-based algorithms is impressive!