Representing Graphs in PostgreSQL

193 points by gsky 3 months ago

16 comments

It is always good to know at what point "Postgres as X" breaks down. For instance, I know from experience that Postgres as a timeseries DB (without add-ons) starts to break down in the low billions of rows. It would be great to know that for graph DBs as well. I think a lot of people would prefer just to use Postgres if they can get away with it.

eatonphil 3 months ago

I thought this was going to be about querying a Postgres table with a graph query language (SQL/PGQ).

Indeed this is coming to Postgres eventually.

https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567%40eisentraut.org

https://ashutoshpg.blogspot.com/2024/04/dbaag-with-sqlpgq.html

Epa095 3 months ago

There is also https://age.apache.org/, an extension to facilitate graph queries in PostgreSQL.

yonisto 3 months ago

This time it must be me... Bilbo is not Frodo's father (Frodo is actually Bilbo's first and second cousin, once removed)

pella 3 months ago

If you need some "Routing Graph", check the pgRouting library: https://pgrouting.org/

"pgRouting library contains following features:

    All Pairs Shortest Path, Johnson’s Algorithm
    All Pairs Shortest Path, Floyd-Warshall Algorithm
    Shortest Path A*
    Bi-directional Dijkstra Shortest Path
    Bi-directional A* Shortest Path
    Shortest Path Dijkstra
    Driving Distance
    K-Shortest Path, Multiple Alternative Paths
    K-Dijkstra, One to Many Shortest Path
    Traveling Sales Person
    Turn Restriction Shortest Path (TRSP)"

see more: https://docs.pgrouting.org/latest/en/pgRouting-concepts.html#graphs

t43562 3 months ago

ltree is the one I ended up using. I don't like it much. The way it works allows you to set up a tree, but you can't easily move things around within the tree once they're in.

I forgot to say that it's also obviously only a tree, not a graph. It's still useful for some cases.

It has a very good way of querying, which is probably fast if you have very deep trees, but in the end we didn't need the deep trees, so it was chosen for the wrong reasons.

https://www.postgresql.org/docs/current/ltree.html

You essentially have a path field in your record which lists the ids of all the records that are ancestors of the current one, plus the current one.

e.g. if your records are regions of the world and the id is the country name, then the path might look like this:

    Earth.Europe.Poland

There is support for queries like "what are all the descendants of Europe":

    SELECT name FROM regions WHERE path <@ 'Earth.Europe';

Or you can use matching:

    SELECT name FROM regions WHERE path ~ '*.Europe.*';
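
Both points in this comment (descendant queries are cheap, moving things is awkward) follow from storing the full ancestor path in every row. The sketch below illustrates that in plain Python rather than with ltree itself. The helpers `is_descendant` and `move_subtree` are hypothetical, and every sample row other than Earth.Europe.Poland is made up for the example.

```python
# Sketch of the materialized-path idea behind ltree, in plain Python.
# Nothing here calls PostgreSQL; the helper names and most of the sample
# rows are hypothetical and exist only for illustration.

def is_descendant(path: str, ancestor: str) -> bool:
    """Mimic `path <@ ancestor`: true if `ancestor` is a label-wise prefix of `path`."""
    labels, prefix = path.split("."), ancestor.split(".")
    return labels[:len(prefix)] == prefix


def move_subtree(paths: list[str], old_root: str, new_parent: str) -> list[str]:
    """Re-parent `old_root`; every row under it needs its stored path rewritten."""
    old_prefix = old_root.split(".")
    leaf = old_prefix[-1:]  # label of the node being moved
    moved = []
    for path in paths:
        labels = path.split(".")
        if labels[:len(old_prefix)] == old_prefix:
            labels = new_parent.split(".") + leaf + labels[len(old_prefix):]
        moved.append(".".join(labels))
    return moved


regions = [
    "Earth",
    "Earth.Europe",
    "Earth.Europe.Poland",
    "Earth.Europe.Poland.Warsaw",
    "Earth.Asia",
]

# "What are all the descendants of Europe?" (like <@, this includes Europe itself)
in_europe = [p for p in regions if is_descendant(p, "Earth.Europe")]

# Re-parenting one node rewrites the path of that node and everything below it.
after_move = move_subtree(regions, "Earth.Europe.Poland", "Earth.Asia")
```

The descendant check only looks at the row's own path, which is why it stays cheap for deep trees. The move has to rewrite one row per node in the subtree, which is one way to read the "can't easily move things around" complaint.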

simpaticoder 3 months ago

"CTE" refers to Postgres' "common table expression" syntax, the "WITH" keyword. See <a href="https://www.postgresql.org/docs/current/queries-with.html" rel="nofollow">https://www.postgresql.org/docs/current/queries-with.html</a>

voodooEntity 3 months ago

I've been a big fan of graph databases for about 10 years. I even wrote my own "in memory graph storage" in golang for a specific use case that none of the big graph databases could cover at the time.

That said - I WISH people would embrace the existing graph databases more and make the hosters support them as standard, rather than abusing existing relational databases for graph purposes.

And to make it clear, I'm not talking about my own experimental one, I mean stuff like Neo4j, OrientDB etc.

gjtorikian 3 months ago

It’s a neat trick! A simpler choice would be to use Postgres’ own ltree data type: https://www.postgresql.org/docs/15/ltree.html

I wrote about how we use it here: https://www.yetto.app/blog/post/how-labels-work/

robertclaus 3 months ago

I've found the main advantage to this over a more specialized graph database is integration and maintenance alongside the rest of your application. Real world data is rarely just (or even mostly) graph data.

wslh 3 months ago

If you are interested in the subject, also take a look at NetworkDisk[1], which enables users/devs of NetworkX[2] to work with graphs via SQLite.

[1] https://networkdisk.inria.fr/

[2] https://networkx.org/

ForHackernews 3 months ago

Surprised to see this article with no mention of LTree: https://www.postgresql.org/docs/current/ltree.html

First-party data structure for tree (directed, acyclic graph) data

dangoodmanUT 3 months ago

This is where you start to see major limitations of OLTP, and need to look at specialized graph DBs (or indexes really) that take advantage of index-free adjacency

hknlof1 3 months ago

At DuckCon #6 there was a talk given about implementing SQL/PGQ as an extension to DuckDB

https://m.youtube.com/watch?v=QDdTbhSR2Vo

https://duckdb.org/community_extensions/extensions/duckpgq.html

jakozaur 3 months ago

There was a very good discussion about PostgreSQL as a graph database on Hacker News about a year ago: https://news.ycombinator.com/item?id=35386948

zamalek 3 months ago

This doesn't seem to cover infinite recursion, which is one of the ugliest parts of doing this type of query with CTEs. You effectively need to concatenate all of the primary keys you've seen into a string, and check it each recursive iteration.

The approach I came up with was to instead use MSSQL hierarchy IDs (there's a paper describing them, so they can be ported to any database). This specifically optimizes for ancestor-of/descendant-of queries. The gist of it is:

1. Find all cycles/strongly connected components in the graph. Store those in a separate table, identifying each cycle with an ID. Remove all the nodes in the cycle, and insert a single node with their cycle ID instead (turning the graph into a DAG). The reason we do this is because we will always visit every node in a cycle when doing an ancestor-of/descendant-of query.

              +---+
      +-------> B +------+
    +-+-+     +-^-+    +-v-+
    | A |       |      | C |
    +---+       |      +-+-+
              +-+-+      |
              | D <------+
              +---+

Becomes:

    +---+  +---+
    | A +--> 1 |
    +---+  +---+

    +---+---+
    | 1 | B |
    | 1 | C |
    | 1 | D |
    +---+---+

2. You effectively want to decompose the DAG into a forest of trees. Establish a worklist and place all the nodes into it. The order in this worklist may be something that can affect performance (you want the deepest/largest tree first). Grab nodes out of this worklist and perform a depth-first search, removing nodes from the worklist as you traverse them. These nodes can then be inserted into the hierarchy ID table. You should insert all child nodes of each node, even if it has already been inserted before, but only continue traversing downwards if the node hasn't yet been inserted.

           +---+
      +----> B +----+
    +-+-+  +---+  +-v-+  +---+
    | A |         | D +--> E |
    +-+-+         +-^-+  +---+
      |    +---+    |
      +----> C +----+
           +---+

Becomes:

          +---+  +---+  +---+
      +---> B +--> D +--> E |
    +-+-+ +---+  +---+  +---+
    | A |
    +-+-+ +---+  +---+
      +---> C +--> D |
          +---+  +---+

Or, as hierarchy IDs:

    A /1
    B /1/1
    D /1/1/1
    E /1/1/1/1
    C /1/2
    D /1/2/1

Now, you do still need a recursive CTE - but it's guaranteed to terminate. The "interior" of the CTE would be a range operation over the hierarchy IDs. Basically other.hierarchy >= origin.hierarchy && other.hierarchy < next-sibling(origin.hierarchy) (for descendant-of queries). You need to recurse to handle situations involving nodes like D in the example above (if we started at C, we'd get D in the first iteration, but would need to recurse to the first D node in order to find E).

This was two orders of magnitude faster than a recursive CTE using strings to prevent infinite recursion.

The major disadvantage of this representation is that you can't modify it. It has to be calculated from scratch each time a node in the graph is changed. The dataset I was dealing with had 100,000s of nodes for each customer, and a rebuild took under a second, so that wasn't a problem. You could probably also identify changes that don't require a full rebuild (probably any change that doesn't involve a back edge or cross edge), but I never had to bother so didn't solve it.
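
The two steps and the descendant-of query described above can be sketched in Python on in-memory dicts. This is one reading of the comment under stated assumptions, not the commenter's actual implementation: all function names are hypothetical, Kosaraju's algorithm is used for step 1 because the comment does not name one, a collapsed cycle is labelled by its smallest member instead of a numeric cycle ID, and tuples such as `(1, 2, 1)` stand in for hierarchy IDs such as /1/2/1.

```python
# Sketch only: plain dicts instead of tables, recursion instead of SQL.

def strong_components(graph):
    """Map every node to a representative of its strongly connected component."""
    reverse = {node: [] for node in graph}
    for node, children in graph.items():
        for child in children:
            reverse[child].append(node)

    def dfs(node, adjacency, seen, out):
        seen.add(node)
        for nxt in adjacency[node]:
            if nxt not in seen:
                dfs(nxt, adjacency, seen, out)
        out.append(node)  # post-order

    order, seen = [], set()
    for node in graph:
        if node not in seen:
            dfs(node, graph, seen, order)

    component, seen = {}, set()
    for node in reversed(order):  # Kosaraju's second pass, on reversed edges
        if node not in seen:
            members = []
            dfs(node, reverse, seen, members)
            for member in members:
                component[member] = min(members)
    return component


def condense(graph):
    """Step 1: collapse each component into one node, leaving a DAG."""
    component = strong_components(graph)
    dag = {rep: set() for rep in component.values()}
    for node, children in graph.items():
        for child in children:
            if component[node] != component[child]:
                dag[component[node]].add(component[child])
    return component, {node: sorted(children) for node, children in dag.items()}


def assign_hierarchy_ids(dag, worklist):
    """Step 2: unroll the DAG into trees; returns (hierarchy_id, node) rows."""
    rows, inserted = [], set()

    def place(node, hid):
        first_copy = node not in inserted
        inserted.add(node)
        rows.append((hid, node))  # every child gets a row, even a repeated one
        if first_copy:  # but only the first copy is expanded further
            for i, child in enumerate(dag[node], start=1):
                place(child, hid + (i,))

    roots = 0
    for node in worklist:
        if node not in inserted:
            roots += 1
            place(node, (roots,))
    return rows


def descendants(rows, origin):
    """Range scan per copy of a node, repeated for nodes reached via a copy."""
    found, frontier = set(), [origin]
    while frontier:
        node = frontier.pop()
        for start in [hid for hid, name in rows if name == node]:
            stop = start[:-1] + (start[-1] + 1,)  # next sibling of `start`
            for hid, name in rows:
                if start < hid < stop and name not in found:
                    found.add(name)
                    frontier.append(name)
    return found


# First diagram: the cycle B -> C -> D -> B collapses into a single node.
cyclic = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B"]}
component, collapsed = condense(cyclic)

# Second diagram: the DAG is unrolled into a tree with D appearing twice.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
rows = assign_hierarchy_ids(dag, worklist=["A", "B", "C", "D", "E"])
for hid, name in rows:
    print(name, "/" + "/".join(map(str, hid)))
```

The loop at the end prints the same six rows as the hierarchy ID listing in the comment. Calling `descendants(rows, "C")` finds D through the range scan under C's ID, then finds E only after revisiting D's first copy, which is the extra round of recursion the comment describes. The outer loop still terminates because each node is added to `found` at most once.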