This doesn't seem to cover infinite recursion, which is one of the ugliest parts of doing this type of query with CTEs. You effectively need to concatenate all of the primary keys you've seen into a string, and check it each recursive iteration.<p>The approach I came up with was to instead use MSSQL hierarchy IDs (there's a paper describing them, so they can be ported to any database). This specifically optimizes for ancestor-of/descendant-of queries. The gist of it is:<p>1. Find all cycles/strong components in the graph. Store those in a separate table, identifying each cycle with an ID. Remove all the nodes in the cycle, and insert a single node with their cycle ID instead (turning the graph into a DAG). The reason we do this is because we will always visit every node in a cycle when doing an ancestor-of/descendant-of query.<p><pre><code> +---+
+-------> B +------+
+-+-+ +-^-+ +-v-+
| A | | | C |
+---+ | +-+-+
+-+-+ |
| D <------+
+---+
</code></pre>
Becomes:<p><pre><code> +---+ +---+
| A +--> 1 |
+---+ +---+
+---+---+
| 1 | B |
| 1 | C |
| 1 | D |
+---+---+
</code></pre>
2. You effectively want to decompose the DAG into a forest of trees. Establish a worklist and place all the nodes into it. The order in this worklist may be something that can affect performance (you want the deepest/largest tree first). Grab nodes out of this worklist and perform a depth-first search, removing nodes from the worklist as you traverse them. These nodes can then be inserted into the hierarchy ID table. You should insert all child nodes of each node, even if it has already been inserted before, but only continue traversing downwards if the node hasn't yet been inserted.<p><pre><code> +---+
+----> B +----+
+-+-+ +---+ +-v-+ +---+
| A | | D +--> E |
+-+-+ +-^-+ +---+
| +---+ |
+----> C +----+
+---+
</code></pre>
Becomes:<p><pre><code> +---+ +---+ +---+
+---> B +--> D +--> E |
+-+-+ +---+ +---+ +---+
| A |
+-+-+ +---+ +---+
+---> C +--> D |
+---+ +---+
</code></pre>
Or, as hierarchy IDs<p><pre><code> A /1
B /1/1
D /1/1/1
E /1/1/1/1
C /1/2
D /1/2/1
</code></pre>
Now, you do still need a recursive CTE - but it's guaranteed to terminate. The "interior" of the CTE would be a range operation over the hierarchy IDs. Basically other.hierarchy >= origin.hierarchy && other.hierarchy < next-sibling(origin.hierarchy) (for descendant-of queries). You need to recurse to handle situations involves nodes like D in the example above (if we started at C, we'd get D in the first iteration, but would need to recurse to the first D node in order to find E).<p>This was two orders of magnitude faster than a recursive CTE using strings to prevent infinite recursion.<p>The major disadvantage of this representation is that you can't modify it. It has to be calculated from scratch each time a node in the graph is changed. The dataset I was dealing with was 100,000s of nodes for each customer, took under a second, so that wasn't a problem. You could probably also identify changes that don't require a full rebuild (probably any change that doesn't involve a back edge or cross edge), but I never had to bother so didn't solve it.