Ask HN: SQL or NoSQL?

16 points by barnabees over 4 years ago

Just came across this tweet criticizing the Parler social network for using a relational database: https://twitter.com/sarahmei/status/1348477224467394560

My understanding was always that for relational data (e.g., social networks) you should use a relational database. Is the person in this tweet correct? If so, what is a better option?

14 comments

emgo over 4 years ago

Don't worry too much about this tweet. My guess is that the author wanted to express a strong opinion to provoke a reaction from a certain audience.

Once you reach a large scale, relational databases start being a problem for availability and replication of data across different availability zones. Operations become complicated (you have replication chains, master/slave setups, etc.).

If your data is relatively simple and doesn't require a lot of relations and foreign keys, then something like Cassandra can save a lot of headaches.

Btw, a common trick to make a relational database perform at scale by limiting joins is to "flatten data", i.e. replicate data across different tables to avoid joining them.

Finally, don't let yourself be fooled by anyone who claims they know "the better option." There is no better option. There is only a better option for the particular use case you're looking at, given the specific constraints at hand. That's what engineering is about, including software engineering.

If you want to learn more about designing storage systems by constraints, I recommend that you read the 2007 Dynamo paper from Amazon, and in particular section 2.3 "Design Considerations". Below is a link; you can easily find a PDF online if you need one.

https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
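
To illustrate the "flatten data" trick described in this comment, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, posts, posts_flat) are made up for the example and do not come from the comment or from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    -- Flattened copy: the author's name is duplicated into every post row.
    CREATE TABLE posts_flat (id INTEGER PRIMARY KEY, user_id INTEGER,
                             user_name TEXT, body TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 1, 'hello');
    INSERT INTO posts_flat VALUES (10, 1, 'alice', 'hello');
""")

# Normalized read: the author's name has to be joined in.
joined = conn.execute(
    "SELECT u.name, p.body FROM posts p JOIN users u ON u.id = p.user_id"
).fetchall()

# Flattened read: one table, no join.
flat = conn.execute("SELECT user_name, body FROM posts_flat").fetchall()
```

Both queries return the same rows. The cost of the flattened layout is on the write side: renaming a user now means updating every copy of the name.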

bluefirebrand over 4 years ago

I no longer buy the "use relational databases for relational data, use NOSQL for non-relational data" mentality.

Basically all meaningful data in an application context has relationships. There is no real such thing as "non-relational data".

Instead, the question really is whether you want a planned, enforced schema or an unplanned, freeform one.

Use a SQL database for the former.

Take a long look in a mirror and question the decisions that made you the way you are if the latter.

speedgoose over 4 years ago

There is some truth in this tweet, but it doesn't mean you should use a NoSQL document database.

Storing more context in a document obviously helps because you don't have to fetch the data many times; it's actually also done in relational databases whenever needed. But you can't store a lot in one document; that neither scales nor works.

For example, if someone changes their avatar or wants to delete their account, do you want to parse all your social network documents to update an avatar or remove the comments on a tiny subset of them? If a post is popular, are you going to update its document thousands of times per second?

In practice you will most likely find a mix of everything: relational databases, in-memory data stores, cache layers, perhaps a few NoSQL document databases, some big data stuff, and probably some Excel sheets.
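
The avatar problem above can be sketched with a toy in-memory "document store" in Python. The document shape and the update_avatar helper are hypothetical, invented only to show why embedded copies force a scan over every document.

```python
# Each post document embeds a copy of its author's avatar (hypothetical shape).
posts = [
    {"id": 1, "author": {"id": 7, "avatar": "old.png"}, "body": "first"},
    {"id": 2, "author": {"id": 8, "avatar": "cat.png"}, "body": "second"},
    {"id": 3, "author": {"id": 7, "avatar": "old.png"}, "body": "third"},
]

def update_avatar(documents, user_id, new_avatar):
    """Rewrite every document embedding this user; return (scanned, changed)."""
    scanned = changed = 0
    for doc in documents:
        scanned += 1
        if doc["author"]["id"] == user_id:
            doc["author"]["avatar"] = new_avatar
            changed += 1
    return scanned, changed

scanned, changed = update_avatar(posts, 7, "new.png")
```

Every document is scanned even though only a subset changes. With a separate users table or collection, the same change would be a single-row update.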

madhadron over 4 years ago

It's not about NoSQL vs SQL. Facebook's TAO is still backed by MySQL, so it's not like there's some intrinsic limitation. The issues are the number of records examined to return a result, lock contention, sharding, and replication/consistency. NoSQL databases generally trade some of the conveniences of relational to be able to provide stronger properties in these aspects.

The limitation that Sarah Mei identifies as clownpants is using a 32-bit primary key as an identifier for an ephemeral thing. That again has nothing to do with SQL vs NoSQL. It would affect both of them the same way.
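
For a sense of scale on the 32-bit key issue, here is a small back-of-the-envelope sketch in Python. It assumes a signed 32-bit sequential key, and the write rate of 1000 rows per second is a made-up figure, not something known about Parler.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value: 2,147,483,647

def next_id(current_id, max_id=INT32_MAX):
    """Return the next sequential ID, or raise once the key space is exhausted."""
    if current_id >= max_id:
        raise OverflowError("primary key space exhausted")
    return current_id + 1

# Hypothetical write rate, purely for illustration.
rows_per_second = 1000
days_until_exhausted = INT32_MAX / rows_per_second / 86_400
```

Under these assumptions the key space runs out in just under 25 days, whatever kind of database holds the counter.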

codingdave over 4 years ago

Unless we've seen their code and their data structures, we don't know the impact of their technical choices on performance. I would say that there is no black-and-white answer to what type of product needs what type of database; it all depends on how you design the solution. I'd also venture to say that with so many databases now supporting JSON as a native data type, you can blend relational and non-relational data as needed within a relational DB.
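
As a rough sketch of blending the two styles, the example below stores a JSON blob next to ordinary columns using Python's sqlite3 module. It assumes the bundled SQLite was built with its JSON functions (json_extract), which is common but not guaranteed. The profiles table and its fields are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed columns for the structured part, one TEXT column holding free-form JSON.
conn.execute(
    "CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT NOT NULL, extra TEXT)"
)
conn.execute(
    "INSERT INTO profiles VALUES (?, ?, ?)",
    (1, "alice", json.dumps({"theme": "dark", "langs": ["en", "fr"]})),
)

# Query a relational column and a JSON field in the same statement.
row = conn.execute(
    "SELECT name, json_extract(extra, '$.theme') FROM profiles "
    "WHERE json_extract(extra, '$.langs[1]') = 'fr'"
).fetchone()
```

Other relational databases expose similar features under different names and operators, so the exact syntax here is specific to SQLite.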

edhelas over 4 years ago

I use NoSQL only if I feel that I can't do it properly in a SQL database.

So far I have never used NoSQL.

aaccount over 4 years ago

Always use SQL.

NoSQL is for incompetent people who can't figure out how to convert a JSON request to a table structure. They just put the entire JSON as it is in a DB and call it NOSQL.

Anyone using NoSQL for anything is either lying or clueless.

roperzh over 4 years ago

Martin Kleppmann's "Designing Data-Intensive Applications" discusses this: https://dataintensive.net/

Besides being a good read overall, the book discusses topics like this one in detail and with a healthy attitude (people tend to have strong opinions on this).

johnisgood over 4 years ago

> My understanding was always that for relational data (e.g., social networks) you should use a relational database.

I thought you were supposed to use a graph database for that, like dgraph. Do I remember incorrectly?

> Dgraph is a horizontally scalable and distributed GraphQL database with a graph backend.

---

Edit: found the source... According to https://www.infoworld.com/article/3251829/why-you-should-use-a-graph-database.html:

"However, as with any popular technology, there can be a tendency to apply graph databases to every problem. It’s important to make sure that you have a use case that is a good fit. For example, graphs are often applied to problem domains like:

- Social networks
- Recommendation and personalization
- Customer 360, including entity resolution (correlating user data from multiple sources)
- Fraud detection
- Asset management"

openlowcode over 4 years ago

I think the choice boils down to a few questions:

- Do you need relational data, or something simpler, or something more flexible?
- Do you need transaction integrity? Transaction integrity is a nice feature, but you can also design all your code so that if something blows up "in the middle", it is somehow repaired automatically by a later event.

Maybe a third point: most of our relational/transactional database technology is quite old. Could we do something better than the SQL query language, common database types, and the actual database code that was heavily optimized for spinning magnetic disks but maybe is not optimized for SSDs? Maybe we would need something like SQLV2.

And my god, how much hype bullshit is inserted in those technical discussions.
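
To make "transaction integrity" concrete, here is a minimal sketch with Python's sqlite3 module of something blowing up in the middle of a two-step update. The accounts table, the CHECK constraint, and the transfer helper are hypothetical and exist only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint when src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit to "b" has already been applied when the debit fails, and the rollback undoes it. Without transactions, the application would need its own repair step for that half-finished state.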

markus_zhang over 4 years ago

I think it always originates from business analysis requirements. Do you have some analysis that could be difficult to perform if using X? If so, then maybe switch to Y, or find a balance, or even build duplicates.

simplerman over 4 years ago

Nine joins is not a big deal; you simply filter data before joining. And that is only if the tables are fully normalized, which no one does. For example, the current avatar and user info may be in the same table. Post and permission will likely be in the same table.

Of course, you will use materialized views for even better performance.

psmithsfhn over 4 years ago

whenever someone makes blanket technical statements in this crazy boasting fashion, i think of Yeats:

> ...the worst are full of passionate intensity.

that said, it's difficult to feel sympathy for people supporting a platform that encourages terrorism, murder, etc.

databrecht over 4 years ago

Social media are typically quite heavy on tree traversals. That kind of pattern is very similar to trying to resolve a deep ORM query or a deep GraphQL query, which also doesn't map very well onto 'traditional' relational databases: https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch

I believe this 'issue' depends on:

A) the type of join
B) whether your relational database flattens between consecutive joins
C) whether there is easy/efficient pagination on multiple levels

The type of join shouldn't be a problem; SQL engines should in most cases be able to determine the best join. In the cases where they can't, you can start tweaking (it's tricky to get right, especially if your data evolves, but it's possible; you probably want to fix your query plan).

B, however, is tricky and a performance loss, since it's really a bit silly that data is flattened into a set each time, only to be (probably) put into a nested (object-oriented or JSON) format to provide the data to the client.

This is closely related to C: in a social graph you might have nodes (popular people or tweets) that have a much higher number of links than others. That means that if you do a regular join on tweets and comments and sort it on the tweet, you might not get beyond the first person. Instead, you probably only want the first x comments. That query might result in a number of nested groups. So it might look more like the following SQL (written from memory, probably not correct):

  SELECT tweet.*, jsonb_agg(to_jsonb(comment)) ->> 0 AS comments
  FROM tweet
  JOIN comment ON tweet.id = comment.tweet_id
  GROUP BY tweet.id
  HAVING COUNT(comment.tweet_id) < 64
  LIMIT 64

That obviously becomes increasingly complex if you want a feed with comments, likes, retweets, people, etc., all in one.

There are reasons why two engineers who helped to scale Twitter created a new database (https://fauna.com/), where I work. Although relational, the relations are done very differently. Instead of flattening sets, you would essentially walk the tree and join on each level. I made an attempt to explain that here for the GraphQL case: https://www.infoworld.com/article/3575530/understanding-graphql-engine-implementations.html?page=2

TLDR: in my opinion you can definitely use a traditional relational database. But it might not be the most efficient choice due to the impedance mismatch. Relational applies to more than traditional SQL databases, though; a graph database or something like Fauna is also relational and would be a better match (Fauna is similar in the sense that its joins are very similar to how a graph database does them). Obviously I'm biased, though, since I work for Fauna.
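
Point B above, where a join flattens data that the client then has to nest again, can be sketched in plain Python. The row layout and the nest helper are hypothetical. This is not how Fauna or any particular database does it, only an illustration of the re-nesting step and of keeping the first x comments per tweet.

```python
from itertools import groupby
from operator import itemgetter

# Flat rows as a tweet/comment JOIN would return them, sorted by tweet id:
# (tweet_id, tweet_text, comment_text)
rows = [
    (1, "popular tweet", "c1"),
    (1, "popular tweet", "c2"),
    (1, "popular tweet", "c3"),
    (2, "quiet tweet", "c4"),
]

def nest(rows, max_comments):
    """Rebuild the tweet -> comments tree, keeping only the first few comments."""
    feed = []
    # groupby relies on the rows already being sorted by tweet id.
    for tweet_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        feed.append({
            "id": tweet_id,
            "text": group[0][1],
            "comments": [r[2] for r in group[:max_comments]],
        })
    return feed

feed = nest(rows, max_comments=2)
```

Note that the tweet text is repeated on every flat row and that all comments still cross the wire before being trimmed, which is the inefficiency the comment describes.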