Ask HN: Why are relational DBs the standard instead of graph-based DBs?

233 points by kirillrogovoy over 3 years ago

Hi,

I've recently been exposed to some information systems where all the domain data was modeled as one graph instead of a set of inter-related tables.

I worked with RDBs (primarily Postgres) for 5+ years and I cannot say that it ever felt wrong, but the more I think about modeling data as graphs, the more confused I am about why it's not the default way.

Graphs seemed to be:
(1) Easier to scale (both storage-wise and complexity-wise).
(2) Closer to how we model the world in our minds, hence easier to reason about.
(3) Easier to query (felt more like GraphQL than SQL if it makes any sense).

The way I see it, there are two major ways to connect singular entities in a data model:
1. Lists (aka tables) that allow you to sort, filter, and aggregate within a set of entities of the same kind.
2. Relations (aka graph edges or foreign keys) to connect singular entities of different kinds.

... And I can imagine relational DBs being List-first Relation-second, and graph DBs being the opposite. But maybe that's too much of a simplification.

Anyway, looking back at different domains I worked with, it felt like I had spent much more time working with relations than with lists.

Another signal: I have an intern developer, and it took him 1 minute to understand the basics of how graphs work, but then I spent two hours explaining why we needed extra tables for many-to-many relations and how they worked.

Any thoughts? What am I missing? Are RDBs the default way mostly due to historical reasons?

Discussion on this topic that I could find: https://news.ycombinator.com/item?id=27541453

36 comments

geophile over 3 years ago

This exact debate took place in the early 70s. There were three major database models: relational, network, and hierarchical. Network and hierarchical had quite a bit of success, technically and as businesses. IMS was (is?) an IBM product based on the hierarchical model.

Network databases, which seem quite similar to graph databases, were standardized (https://en.wikipedia.org/wiki/CODASYL).

Both the hierarchical and network models had low-level query languages, in which you were navigating through the hierarchical or network structures.

Then the relational model was proposed in 1970, in Codd's famous paper. The genius of it was in proposing a mathematical model that was conceptually simple, conceivably practical, and it supported a high-level querying approach. (Actually two of them, relational algebra and relational calculus.) He left the little matter of implementation as an exercise to the reader, and so began many years of research into data structures and algorithms, query processing, query optimization, and transaction processing, to make the whole thing practical. And when these systems started showing practical promise (early 80s?), the network model withered away quickly.

Ignoring the fact that relational databases and SQL are permanently entrenched, an alternative database technology cannot succeed unless it also supports a high-level query language. The advantages of such a language are just overwhelming.

But another factor is that all of the hard database research and implementation problems have been solved in the context of relational database systems. You want to spring your new database technology on the world, because of its unique secret sauce? It isn't going anywhere until it has a high-level query language (including SQL support), query optimization, internationalization, ACID transactions, blob types, backup and recovery, replication, integration with all the major programming languages, scales with memory and CPUs, ...

(Source: cofounder of two startups creating databases with secret sauces.)

twoodfin over 3 years ago

The databases that dominated the industrial scene prior to the emergence of relational designs (IBM’s IMS, GE’s IDS, the CODASYL systems) were described as following a “network” model, which didn’t have much conceptual distance from what today we’d call a “graph” model.

They were largely replaced with relational systems for more or less exactly the reason Codd laid out in his classic paper [1]: If efficient processing of logical operations depends on a predetermined physical structure of data, the range of practical applications for your database is severely constrained.

That suggests that the proper niche for graph databases contains those applications with a predefined set of highly recursive operations, which is more or less where we find them today.

[1] http://db.dobo.sk/wp-content/uploads/2015/11/Codd_1970_A_relational_model.pdf

kleinsch over 3 years ago

As others have said, graph databases push foreign keys, joins, referential integrity, and migrations to the application layer. Right now the combinations of databases and software just can’t handle that with nearly the sophistication of Postgres + an open source ORM.

Facebook famously uses a graph database called TAO [1] with an ORM called Ent sitting on top of it. Everything lives on that, internal and external applications alike. Every dev at FB writing backend code uses it. Can’t talk too much about how these work since they’re internal and proprietary, but it looks like we open sourced a Go ORM that’s Ent-inspired and shows the basics [2]. It’s definitely a weird mental model to get used to, but the combination of deep business logic in schema definitions, complex joins, and insane query performance makes it super powerful. We’re definitely one of the bigger non-SQL shops, with thousands of developers using this every day as their primary data store.

[1] https://engineering.fb.com/2013/06/25/core-data/tao-the-power-of-the-graph/
[2] https://entgo.io/docs/getting-started/

matt_s over 3 years ago

I'd contest your points 2 and 3 from a business application perspective.

Relational DBs resemble ledgers from a business perspective. Most business apps and nearly all financial apps think in terms of ledgers of transactions. Go one to two degrees of separation from a financial transaction in any web app these days and the users of that system will want to line up other activity/transactions in the app with the financial ledger. Then they want to ask questions and have reports on things like: how many users for that customer? how many purchases this month for that customer? what did they purchase? etc.

I think it's historical, and historical also implies the vast majority of systems aren't likely to change even if there is some-other-tech that may be easier to work with. It's a safer decision to pick well-known tech because you'll know the trade-offs, support issues, hiring parameters, etc. from the history of all that came before you. To reverse that point: picking some esoteric language/db solution increases the risk of a product/project failing, not because of the technology but because of all the other factors that surround that technology.

lysecret over 3 years ago

1. "Easier to scale": Scaling is such a weird topic. It's like that joke about teenage sex: everybody always talks about it and is concerned about it, but how many people really have to scale? And generally if you have to scale you have money, and money can make any system work.

2. "Closer to how we model the world in our minds, hence easier to reason about": I don't think I agree. I would argue most people's mental model of data is strongly inspired by tables. Thus, the big success of Excel.

3. "Easier to query": OK, maybe I am just too used to SQL databases, but I would very strongly disagree there. With SQL or any ORM it's super easy to query. Of course some joins might get complicated. But if you always have to run a lot of complicated joins you can probably work on the data model, or use caching.

jandrewrogers over 3 years ago

It is complicated but there are good reasons for it.

The historical advantage relational databases have over graph databases is that the former forces a more restrictive representational structure onto the data model, which doesn't sound like an advantage at first. Restricting the ways in which a data model can be organized and traversed makes it straightforward to implement highly effective performance optimizations inside the database engine. Optimization is a tradeoff, by making one type of operation on a data model faster you often make another type of operation slower -- there is no free lunch. A database where all possible relationships in the data must be optimized is a database where little optimization is possible. The restrictions tacitly placed on relational database representations allow SELECT operations to be heavily optimized in a way that isn't possible if you want graph-like data model traversals to be fast and efficient.

From the perspective of database internals, relational databases are optimized around SELECT performance and graph databases are optimized around JOIN performance. The former is intrinsically much easier to optimize. It turns out that shoe-horning data models into a relational database almost always has qualitatively better performance than using a more flexible graph database, and performance traditionally matters a great deal in databases.

At a technical level, scaling ad hoc join operations -- the core operation of a graph database -- is famously extremely difficult. Ironically, most graph databases use data structures and algorithms that are tacitly optimized for the assumptions of a relational database implementation, that aren't trying to be good at graph-like things. You generally don't see graph databases that were designed from first principles to be a graph database; their internals are typically that of a relational database that supports join recursion.

We figured out very early on how to optimize the hell out of relational databases. To this day we have been unable to build graph databases that are similarly optimized, partly because the computer science is much more difficult.

chadcmulligan over 3 years ago

Graph DBs have the relationships and the data in one lump. With table-based databases you can have relationships between tables stored in tables and obtain your graph structure that way. You can also change your graph structure by changing the data in the relationships table. This was a big problem in ISAM databases: if your schema changed then you had to spend a lot of time rebuilding the whole graph (as I was told anyway; I came in at the beginning of SQL databases). I imagine this is a problem with graph databases too?

The advantage of relational was also that you could store reference data in one table, not all through the graph, which was another problem with ISAM. Also, often when reporting you don't really know the relationships you want to report on. If this is in a hard-wired graph then extracting that data and putting it in the form you want is hard. This is the power of joins: joins allow you to make the relationship you want at query time.

Graphs assume you know the relationships and that they're fixed in time. This is seldom the case when building a relational DB, and that's why joins come in handy: you can restructure your tables without deleting the data, or without much deleting. You can add views too, to present the data in the old way, and so on.

There's lots more, but those are the big-ticket items.

angelzen over 3 years ago

'Nodes' are represented by the ID column [1] (aka single-column PK) in the respective table. 'Hyperedges' are tables that tie together 2 (or more) IDs, for example the classic ParentChild table. Table attributes can be seen as mini ID-Attr 'edges'. Adding a FK column to a 'node' table is a shortcut for relations that are binary and 1:N. In practice, the relational model is encoding graphs quite directly, perhaps with a funny terminology.

At the query language level, a graph query language has to somehow represent the notion of a list + sort/filter/aggregate too. Haven't seen convincing improvements over the relational model, and arguably there are none: the concept of list + sort/filter/aggregate is fundamental.

The use case where graph DBs might have an edge is querying recursive relations, but recursive query constructs are conceivable in a table-first approach as well. Possibly also creating / querying relations (tables) on the fly without the 'CREATE TABLE ... / JOIN ON ...' ceremony is handy.

I wouldn't look at SQL in particular for a modern instantiation of a relational algebra query language. Modern relational algebra libraries like dplyr in the R ecosystem are better.

I have not yet found uses for graph DBs. But I am curious to see why people find graph DBs "easier", preferably with concrete examples.

[1] Arguably 'primary key' terminology is confusing, as it may denote either an ID column (node) or a multi-dimensional FK tuple (hyperedge), so I prefer to avoid using it.
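
A minimal sketch of the encoding described above, using Python's standard `sqlite3` module. The `person` and `parent_child` tables and the sample rows are hypothetical, chosen only to illustrate "nodes as ID columns, edges as tables that tie IDs together"; they are not taken from the comment.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'Node' table: each row is a node, identified by its ID column.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- 'Edge' table: each row ties two node IDs together.
    CREATE TABLE parent_child (
        parent_id INTEGER NOT NULL REFERENCES person(id),
        child_id  INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (parent_id, child_id)
    );

    INSERT INTO person VALUES (1, 'John'), (2, 'Frank'), (3, 'Alice');
    INSERT INTO parent_child VALUES (1, 2), (1, 3);
""")

# Following an edge is a join from node to edge table and back to node.
children_of_john = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM person p
    JOIN parent_child pc ON pc.parent_id = p.id
    JOIN person c ON c.id = pc.child_id
    WHERE p.name = 'John'
    ORDER BY c.name
""")]
```

A relation among three or more entities would, under the same assumptions, simply be an edge table with three or more ID columns.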

account-5 over 3 years ago

This might be my limited knowledge, but I've always thought of relational systems as a kind of graph. The foreign keys denote edges connecting the tables/nodes. At least that's how I think of them; this might be complete nonsense though.

endymi0n over 3 years ago

If you want to hear my take on it after intensively working with dozens of different databases in real-world projects over the past ten years:

It is because the additional flexibility of graph databases creates the responsibility to manage the explosion of edge cases, while constraints add safety, documentation of intent and reliability.

And while special functions for graph traversal can seem elegant and nifty, I have not come across more than maybe a handful of cases ever that could not have been solved by a recursive CTE or an equivalent, slightly more complex query in relational databases.
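
For readers who have not used one, here is a rough sketch of a recursive CTE doing a graph traversal, run through Python's standard `sqlite3` module. The `edge` table and its rows are made up for illustration; the comment above does not specify any schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical directed edges: a -> b -> c -> d, plus an unrelated x -> y.
    CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO edge VALUES ('a', 'b'), ('b', 'c'), ('c', 'd'), ('x', 'y');
""")

# Collect every node reachable from 'a'. UNION (rather than UNION ALL)
# discards rows already seen, so the recursion also terminates on cycles.
reachable_from_a = [row[0] for row in conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT dst FROM edge WHERE src = 'a'
        UNION
        SELECT e.dst FROM edge e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable ORDER BY node
""")]
```

The same `WITH RECURSIVE` form is accepted by PostgreSQL; whether it performs acceptably on a large graph is a separate question this sketch does not address.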

fipar over 3 years ago

I think they're the default because the relational model is either the most, or perhaps even the only logical data model with a solid theoretical foundation that makes it suitable to solve a wide range of database problems.

While there are some genuine problems that require extending or adjusting the model (e.g., temporal databases), all the problems people normally complain about relational databases are implementation, not model problems.

A common misunderstanding I find is that people think the relational model does not scale, but that makes no sense. The relational model is a logical model meant to, among other things, provide independence of the logical representation of data from its physical storage. A loose but hopefully useful analogy would be that the relational model is like arithmetic, while any given database product is like a calculator. At some point, you'll hit the limits of a calculator and will get an error as a result to an operation. That won't cause you to say "arithmetic does not scale!". Instead, you'll probably try a different calculator, or perhaps even end up implementing a new piece of software to let you handle that particular computation.

It is similar with the relational model. You may hit the limits of a specific implementation (can't scale beyond X cores, can't transparently partition data across multiple nodes, etc.), but that's an implementation limitation. The relational model has nothing to say about hardware, so those are not limitations in the model.

I think it's good pragmatic engineering to use the best tool for the job, and that includes using non-relational databases (nitpick: I don't think any of the mainstream databases considered relational are a good representation of the model. SQL, in particular, is very bad at the job. But we're probably stuck with calling them "relational databases") when they're more suitable for the task at hand. However, the important consideration is that the relational model itself can be used as a foundation to represent whatever data you need to represent, including graphs. It's just that there may be no suitable implementation for your needs right now.

One last thing, when you say Relations are like graph edges or foreign keys, I wonder if perhaps you're misunderstanding the relational model? The 'relational' in it is not about linking from one entity to another. It's about a relation from a set of attributes and types, to a set of values for those. In SQL-parlance: you can have a relational database with a single table, no need for foreign keys (even to itself) for it to be relational.

culebron21 over 3 years ago

Reason #1: Relational DBs got a simple query language, which boosted their popularity in the 1980s. MS Access made it a funny game, and you could get it going in under an hour.

I had experience with Wikidata, and it was way harder. The logic of the language is impossible to figure out by reading the examples. (My request was simple, like "get region X, find all nested entities and their population".) I'm not familiar with other query languages for graph DBs, but if this is state of the art, I'd avoid it.

Reason #2: Graphs are not easier to reason about. When you have all properties as optional, and every connection (edge) gets a set of properties too, things become much harder.

"John is Frank's father" in that database would be a very complex structure: entity A, which is John, with properties such as "belonging to a class of" humanity (which is another entity), human own first name, last name, and finally a "child" property, linking to Frank, who's got the same set of attributes.

That's hard to put in your head. With a relational DB, for this example you have to imagine just a 4*2 table (3 if we count the header).

I think these 2 factors make development with graph DBs much harder, and developers resort to them only when absolutely necessary, keeping only a single type of relation in the graph (like a road graph, or a parent-child relationship).

3dfan over 3 years ago

I think the language the developer writes their queries in is most important. It is very easy and logical to express what you want in SQL. Say we want to show a list of cities with more than a million people. Easy:

    SELECT name FROM cities WHERE population > 1000000

GraphQL queries on the other hand always look like gibberish to me.

Hexayurt over 3 years ago

Tape. SQL databases emerged when data was stored on tape.

    join table1, table2 where table1.id = table2.customer_id

type operations would have a tape for table1 in one drive, and a tape for table2 in the other drive. Things like fixed length records emerged to make it possible to fast forward the tape a specific number of inches to the point where the next record would begin, facilitating non-linear access.

Once that model was completely baked into the tooling, it didn't go away when the data moved to HDs then SSDs. The paradigms have outlived the hardware.

It's a bit like the save icon still being a floppy disk.

cratermoon over 3 years ago

There is huge value in graph theory that RDBs are almost completely incapable of putting to work. I think it's partly a historical accident that RDBs arrived when they did, and I don't think graph theory was well enough understood by the computer scientists of the day. Relational theory and SQL were a better fit for the languages, operating systems, and big money problems of the day, which were mostly about business operations – accounting, inventory management, manufacturing – and to a lesser degree science and mathematics.

Today we have very different languages (while COBOL and FORTRAN still exist, they haven't influenced modern languages much), operating systems built on different ideas, and, perhaps most importantly, networks.

I've been waiting most of my career for something to take over from RDBs and SQL, something that supports the ideas of graph theory as well as strong typing, composability, and so forth. And no, GraphQL is most definitely not it.

tlarkworthy over 3 years ago

Think about the practical value of queries. Queries are the main point of DBs. Now think about most applications. Are graph queries useful for CRUD? .... no. So that is why.

Relational is good for lookups and adjacency queries, and that's the main query semantic in CRUD.

brudgers over 3 years ago

The relational model includes the relational algebra.

The relational algebra means that the order of operations does not change the results.

That's a huge advantage for query optimization. Relational database systems maintain metadata that allows pruning poor orders of operations.

The other advantage of the algebra's lack of implicit ordering is that arbitrary orderings can be added on top of it, e.g. multi-version concurrency control, without changing the underlying algebraic logic.
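
One way to read the point about ordering, sketched with Python's standard `sqlite3` module: a filter applied after a join and the same filter applied before it are algebraically equivalent, so an optimizer is free to pick whichever is cheaper. The `customer` and `purchase` tables and their rows are invented for this illustration, and the sketch only checks that the results agree; it does not show which plan SQLite actually chooses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                           customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'NL'), (2, 'US');
    INSERT INTO purchase VALUES (10, 1, 5), (11, 2, 7), (12, 1, 9);
""")

# Formulation 1: join everything, then keep only the NL customers.
join_then_filter = conn.execute("""
    SELECT p.id, p.amount
    FROM purchase p
    JOIN customer c ON c.id = p.customer_id
    WHERE c.country = 'NL'
    ORDER BY p.id
""").fetchall()

# Formulation 2: restrict customers first, then join the smaller set.
filter_then_join = conn.execute("""
    SELECT p.id, p.amount
    FROM (SELECT id FROM customer WHERE country = 'NL') c
    JOIN purchase p ON p.customer_id = c.id
    ORDER BY p.id
""").fetchall()
```

Both queries return the same rows; in most engines `EXPLAIN` (or `EXPLAIN QUERY PLAN` in SQLite) can be used to see which order was actually executed.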

aptxkid over 3 years ago

This is a great topic. I have been working on TAO for many years. I am not very familiar with other graph databases; I assume fundamentally they are more or less the same. Here are some differences between an RDB and a graph DB, IMHO:
1. Sharding and transaction boundary
2. Secondary index support
3. How “join” works (e.g. give me a list of my friends who follow Justin Bieber)

You’re right that a graph DB is very easy to use a lot of the time.
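
For comparison, this is roughly what the "friends who follow Justin Bieber" question looks like on the relational side, as a sketch with Python's standard `sqlite3` module. The `friendship` and `follows` tables, their columns, and the sample users are all hypothetical; nothing here reflects how TAO stores or queries this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical edge tables, one per relationship type.
    CREATE TABLE friendship (user_id TEXT NOT NULL, friend_id TEXT NOT NULL);
    CREATE TABLE follows    (user_id TEXT NOT NULL, target    TEXT NOT NULL);

    INSERT INTO friendship VALUES ('me', 'ann'), ('me', 'bob'), ('me', 'cat');
    INSERT INTO follows VALUES
        ('ann', 'Justin Bieber'),
        ('bob', 'Someone Else'),
        ('cat', 'Justin Bieber'),
        ('zed', 'Justin Bieber');  -- follows him, but is not my friend
""")

# Intersect "my friends" with "followers of the target" via a join.
friends_following = [row[0] for row in conn.execute("""
    SELECT f.friend_id
    FROM friendship f
    JOIN follows fo ON fo.user_id = f.friend_id
    WHERE f.user_id = ? AND fo.target = ?
    ORDER BY f.friend_id
""", ("me", "Justin Bieber"))]
```

With a popular target, the `follows` side of this join is very large while the `friendship` side is small, which hints at why join strategy is listed above as a real point of difference between systems.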

creshal over 3 years ago

> Are RDBs the default way mostly due to historical reasons?

It sure helps, because you got literally generations of developers trained on how to best utilise them.

Additionally, you can use ORMs, GraphQL adaptors or similar abstractions to build graphs on top of relational storage, and keep your existing infrastructure, which makes hybrid setups a lot more attractive than graph-first ones for non-startup environments.

Glyptodon over 3 years ago

Relational databases are very significantly graph-like in structure. After all, they're filled with foreign keys when decomposed, and foreign keys build graphs, which is why ER diagrams aren't tables. The main issue is that the language used isn't intended for selecting pieces of your data layout in their original structure; it's intended to use that original structure to return table-like projection results that can be expressed through math, a la abstracted matrix operations essentially (relational calculus, etc.).

The reason "graphs" seem easier is that people tend to ignore the data models for intuiting relationships, like "An Arm has a Hand," which SQL can handle pretty well, but can't return in a nested way. In some respects, this suggests SQL is missing a new syntax layer more than anything. That said, SQL, the back-end model, does fall short when modeling non-directed graphs IMO, though I'm not sure it's easy to express non-directed graph queries in GraphQL either. (And I don't have the experience with graph DBs to know too much about what they offer.)

That said, I don't think it should take two hours to explain many-to-many tables. In graph terms, they're an expression of the requirement that you need a record to indicate every pairwise combination (edge) that exists. So to me that implies the mental version of "Graph" M2M was probably incomplete, as M2M is a very graph-centric concept.

oneplane over 3 years ago

Generally: because complexity never goes away, it just finds a different spot to be solved in. The RDBMS is mostly 'just there' from the perspective of the average programmer, and solving a problem related to databases is more of an ORM library issue than a programmer's issue. Let's not forget that most software programming tends to be various CRUD incarnations and UI fluff, not complex problem solving and scientific engineering. Due to the massive scale and volume of this type of work, the available databases and mindshare are just not there for most other types to succeed.

A difference can be found in mostly-frontend data storage and retrieval applications, where storing information is actually abstracted into a SaaS RESTful API or GraphQL API and you never get to talk to the underlying database. Another one is where a library or framework requires something like ElasticSearch or MongoDB to work. This isn't because they are inherently 'better', but because that's just what the README.md in the repo happened to say when a developer came along to make use of the framework or library that fulfilled some generic functionality.

If you think about it, most semi-complex implementation details in general software you encounter turn out to be low-quality re-implementations of state machines, graphs, hashmaps and printf. A lot of the world runs on this type of stuff and makes a lot of money from it. (That doesn't mean it's high-quality software or that sound engineering choices were made; sometimes it's just standard components, availability of people and technology, and a time-money tradeoff.)

flyingsilverfin over 3 years ago

I think that you're correct in your assessment of relational vs graph-like structures: graphs are closer to the data domains we model and think of, more flexible, etc. We may be seeing something similar in the ML world, where things are moving from tabular-dominant data to being able to process graphs more natively. A table is just a very structured graph after all!

SQL is the standard because, as others have pointed out, it's so entrenched and also builds upon a solid theoretical foundation. And given its dominance, it has been optimised and performed extremely well until recently, where data complexity is catching up again.

Recent NoSQL databases won't take over from SQL because of the lack of schema/typing. They do scale nicely, but aren't as constrainable as SQL, and constraints are a feature (compare building large software in Python vs Rust or Java) that enforces safety and good abstractions. There are some newer DBs which are combining strict schemas with NoSQL, which is promising!

Disclaimer: I work on TypeDB (vaticle.com/typedb), which is a native ERA (entity-relation-attribute) model with strict typing via the schema.

chiefalchemist over 3 years ago

I don't mean to get off topic, but I'm surprised the intern, presuming they're a CS major or similar, didn't already know both. These tools are simply a reflection of data structures. Are such things no longer part of 100 or 200 level CS?

joshsyn over 3 years ago

I really believe this has to do with reliability and performance. Intuitively speaking, I'd agree with you that graphs are higher-level concepts.

skohan over 3 years ago

Side question: is there a good rule of thumb for when an application can start benefitting from using a database? I'm working on a project whose internal data structures are starting to resemble a database, and I've been considering moving from just using language-native data structures to an in-memory sqlite instance, but I'm not sure what the tradeoffs are.
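
To make the option being weighed concrete, here is a small sketch of moving a list of records into an in-memory SQLite database with Python's standard `sqlite3` module. The `tasks` records, the `task` table, and the index name are invented for illustration; the question does not say which language or data the project uses.

```python
import sqlite3

# Hypothetical application state held in language-native structures.
tasks = [
    {"id": 1, "owner": "ann", "done": False},
    {"id": 2, "owner": "bob", "done": True},
    {"id": 3, "owner": "ann", "done": True},
]

# ":memory:" keeps the whole database in RAM; it disappears on close.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, owner TEXT, done INTEGER)"
)
# Named placeholders are filled from each dict; booleans are stored as 0/1.
conn.executemany("INSERT INTO task VALUES (:id, :owner, :done)", tasks)
conn.execute("CREATE INDEX task_owner ON task(owner)")

# A grouped aggregate that would otherwise be a hand-written loop.
done_per_owner = dict(conn.execute(
    "SELECT owner, SUM(done) FROM task GROUP BY owner ORDER BY owner"
))
```

The usual tradeoff, stated loosely: queries, indexes and constraints come for free, at the cost of converting between rows and native objects at every boundary.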

rep_movsd over 3 years ago

Also performance.

You may be looking at highly scaled distributed databases with thousands of concurrent users.

What about the base case of one large dataset being scanned linearly or by complex joins? In its basic form an RDBMS table is like an array of structs; other types of DBs are like lists.

You won't see much performance difference in normal applications, but when there is lots of work to be done an RDBMS will shine.

habibur over 3 years ago

Also note that hierarchical databases saw a revival in the 90s as the web was taking over and web data was mostly hierarchical. A few groups started work on building hierarchical database systems.

That didn't go anywhere. Reviews a few years later showed that all of those were buggy, crashed frequently and, worst of all, were slower than just storing the hierarchical data in an RDBMS.

miga over 3 years ago

Relational databases offered a unique combination of an easily optimizable abstract model and a memory-efficient implementation, and that won the day.

Since the optimizable abstraction becomes more and more popular, it will dominate until we find another model that enjoys both high-level abstraction and ease of optimization.

salawat over 3 years ago

RDB's and the theory behind them were simply elucidated earlier in Codd's paper laying out the fundamentals of relational algebra. I still haven't found "the Ur-Paper" for graph based DBMS's.

zihotki over 3 years ago

They are good enough for general purpose databases. That's very valuable when you're in the early stages of development and haven't yet consolidated your business logic. Good enough is good enough.

crabmusket over 3 years ago

Tangential question: does anyone have experience using a triplestore database in production? Or is using RDF in any way not related to marking up web pages for SEO?

lstroud over 3 years ago

Because until relatively recently, storage was the most expensive architectural component and relational databases optimized for storage.

gjvc over 3 years ago

How do NoSQL / graph databases support foreign keys or joins? Do they have the concept of referential integrity?

The point of a DBMS is to do this for you, via the SQL table declarations. AIUI, NoSQL / graph databases cannot do this at the system level, so it needs to be done by an application-level framework.

mbravenboerover 3 years ago

(disclaimer: I am VP of Engineering at RelationalAI where we are building a graph database that uses the relational model)Thanks, this is a great question with many technical, social, and commercial aspects to it.TLDR: the relational model has a super power for data management systems: it decouples the logical from the physical representation and will eventually always win. There are technical reasons why it was hard until recently to build a graph database based on the relational model.Database were not always relational: In the 1960s databases actually had a navigational paradigm and used a hierarchical or network data model (not unlike some current graph databases).The 1970s saw the rise of relational database management systems with early proofpoints of Ingres and System R. The important improvement here was that the physical organization of data is separated from its logical organization in relations. This is the super power of the relational model. This innovation led to an explosion of commercial activity with Oracle, DB2, Sybase (licensed to become Microsoft SQL Server) and some more. Many of these are now still industry giants.The 90s was a big hype of objected-oriented programming and some got the idea that database management systems should be following the object-oriented model as well. This was mostly a catastrophic failure and instead systems based on the relational model kept improving and won in the end.In the 2000s there was a large emphasis on scalability to large numbers of users and data, and the development of NoSQL systems started. Most of these did not follow the relational model. These are the key value stores, document databases etc. Key value stores addressed the problem of poor scalability but compromised on the data model and transactions. Document databases have better locality of data and made schema changes easier. 
They all had something in common: compromise on the relational model to gain an advantage over existing relational systems. However, systems based on the relational model kept improving the meantime and have gradually started to gain market (or mind) share again (eg Aurora, Snowflake, Spanner, CockroachDB).In my opinion, graph databases are next. Graph databases identified a weakness, which in this case is modelling and the inability of current relational systems to handle graph structured data well. Graph data involves many joins and often recursive computations, which current commercial SQL relational systems do not do well. However, the new graph databases are not superior universally. For example, take TPC-H (OLAP - analytical) or TPC-C (OLTP - transactional), put that on a graph database and you'll typically see pretty terrible performance even though the data can easily be modeled as a graph. Several popular graph database systems do not even scale well beyond a single node.I think you are absolutely right that graph data models are easier to work with. Starting from an ER diagram (or similar) it's not straightforward to go to tables. Assuming that your ER model is a good model, you're grouping stuff into tables based on functional dependencies. Tables here are a collection of relations, eg for an _order_, the SQL table might include relations to the customer, order date, etc. It does not include relations to products included in the order, because these have a different primary key. This is difficult for users to understand.Predictably, the relational model is catching up though with graph database systems. A major research innovation in databases from recent years are better join algorithms, specifically for joins involving many relations, self-joins and skewed data. 
They're called worst-case optimal join (WCOJ) algorithms, and several early prototype systems have shown promising results with them.

Based on these ideas, RelationalAI (https://docs.relational.ai/, https://twitter.com/RelationalAI) is building a graph database management system based on the relational model. Presumably, this is the relational model using its superpower again, demonstrating that relational models will always win and innovate to overcome the legitimate limitations of previous systems.
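
To make the "graph data involves many joins and often recursive computations" point concrete, here is a minimal sketch of graph-shaped data stored as a plain relation and queried with a recursive join. It is only an illustration using SQLite through Python's standard `sqlite3` module; it says nothing about how RelationalAI or any WCOJ-based system works internally. The `edge` table, the `reachable` helper, and the sample data are hypothetical names made up for this example.

```python
import sqlite3


def reachable(edges, start):
    """Return the set of nodes reachable from `start` (including `start`).

    `edges` is an iterable of (src, dst) pairs. The graph is stored as one
    two-column relation, and reachability is a recursive self-join on it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one row per directed edge.
        conn.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
        conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", edges)
        rows = conn.execute(
            """
            WITH RECURSIVE reach(node) AS (
                SELECT ?                      -- base case: the start node
                UNION                         -- UNION deduplicates, so cycles terminate
                SELECT edge.dst
                FROM edge JOIN reach ON edge.src = reach.node
            )
            SELECT node FROM reach
            """,
            (start,),
        ).fetchall()
    finally:
        conn.close()
    return {row[0] for row in rows}


# Hypothetical sample data: a small "follows" graph with a cycle.
follows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "alice"),
]

if __name__ == "__main__":
    print(sorted(reachable(follows, "alice")))
```

The query works, but each recursion step is a join against the whole `edge` relation, which is the kind of workload the comment says conventional SQL engines have not been tuned for.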

lmeyerov over 3 years ago

Graphistry gets used for all sorts of graph DB and non-graph DB data projects, which has led me to think of it sociotechnically on 2 dimensions:

- Approachable semantics. Graph-y queries like untyped entity 360 views are way easier to write in graph langs. This matters for programmer adoption. But tabular systems can be simpler to implement for doing great on all sorts of other things, so other tricky tasks on SQL have gotten industry weight over decades, while the market rewarded graph query langs more for doing great on graph tasks... and just passable on non-graph.

- Diverging performance for sweet spots. Graph DBs get optimized for graph queries that SQL DBs struggle at, which matters for big enterprise contracts => most revenue. There are competing graph workloads to optimize for (handling heavy concurrent small read/write queries, big whole-graph compute, fancy iterated symbolic reasoning, horizontal scaling, GPU acceleration), and while in theory you can combine most, the reality is that even with VC money someone like TigerGraph gets great scaling but not, say, the GPU speed/efficiency of cuGraph, and Neo4j wins at usability. SQL engines will have better specializations for HA, GIS, time series, etc. In theory those can be special indexes in graph DBs (I think Neo4j uses Lucene for text indexes?), and vice versa within SQL. But practically, it takes a lot of time and $$$, so compounding engineering years fueled by enterprise $ is playing out.

So in both cases, we see SQL taking compounding benefits for general-purpose use, while the niche graph space goes deeper in its sweet spot. Neo4j and maybe TigerGraph might be 5-10 years away from being great general databases. With the rise of open source for specialized index types and ecosystem adapters, this is interesting to consider!

So does SQL eat graph before graph eats SQL, or more timely, will the graph DB niche grow or shrink? 
IMO it is quite an interesting time in this space.

In particular, the increase in data scales, the rise of NLP knowledge graphs, and the need to fuel AI/automation systems with things like 360-views of entities have grown the modern need for graph DBs. Still a small % of general DB use cases, but great wrt current enterprise/tech DB market size (1-10B?) and probably 2-3X that in 5 years. Separately, the new ubiquity of large & high-fidelity operational data (clicks, logs, transactions, sensor reads, ...) has led to a corresponding ubiquity of behavior and relationship questions. Graph intelligence includes working directly from log and SQL DB systems, which are 100X-1000X bigger than the graph DB market. That has been fueling our corner of the graph intelligence (viz, BI+automation, AI) space.

Anyways, this adds up to Graphistry and basically all of our graph partners hiring (we do intl remote!), and I encourage folks looking for their next thing to check all of us out :)

FinanceAnon over 3 years ago

Could it be similar to imperative vs functional languages? The imperative style is closer to the hardware instructions and had the first-mover advantage in gaining popularity, while the functional style is more abstract but trails behind imperative languages in adoption.