Things to know about databases

730 points by grech, almost 3 years ago

23 comments

This article is informative. I have found that databases in general tend to be less sexy than the front-end apps... especially with the recent cohort of devs. As an old bastard, I would pass on one thing: realize that any reasonably used database will likely outlast the applications leveraging it. This is especially true the bigger it gets, and the longer it stays in production. That said, if you are influencing the design of a database, imagine what someone looking at it years later might want to know if they had to rip all the data out into some other store. Having migrated many legacy systems, I tend to sleep better when I know the data is well-structured and easy to normalize. In those cases, I really don't care so much about the apps. If I can sort out (haha) the data, I worry less about the new apps I need to design. I have been known to bury documentation in for-purpose tables... that way I know that info won't be lost. Export the schema regularly, version it, check it in somewhere. And, if you can, please, limit the use of anything that can hold a NULL. Not every RDBMS handles NULL the same way. Big old databases live a looooong time.
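
A minimal sketch of why NULL deserves the caution above, using Python's built-in `sqlite3` module. The `users` table and its columns are made up for illustration, and the results describe SQLite only; other engines can differ (for example, SQL Server permits only a single NULL under a unique constraint, while SQLite accepts several).

```python
import sqlite3


def null_behaviour():
    """Return a few NULL-related results as observed in SQLite."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only for this illustration.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    # SQLite treats NULLs as distinct, so a UNIQUE column accepts several of them.
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), (None,), (None,)],
    )
    results = {
        # Comparing NULL with NULL yields NULL (None in Python), not true.
        "null_equals_null": conn.execute("SELECT NULL = NULL").fetchone()[0],
        # "= NULL" never matches a row, because the comparison is never true.
        "rows_eq_null": conn.execute(
            "SELECT COUNT(*) FROM users WHERE email = NULL"
        ).fetchone()[0],
        # "IS NULL" is the test that actually finds the missing values.
        "rows_is_null": conn.execute(
            "SELECT COUNT(*) FROM users WHERE email IS NULL"
        ).fetchone()[0],
    }
    conn.close()
    return results


print(null_behaviour())
```

A query written with `= NULL` silently returns nothing, which is the kind of surprise that tends to surface years later during a migration.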

yla92, almost 3 years ago

Great post. Also highly recommend Designing Data-Intensive Applications by Martin Kleppmann (<a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321" rel="nofollow">https://www.amazon.com/Designing-Data-Intensive-Applications...</a>). The sections on "Storage and Retrieval", "Replication", "Partitioning" and "Transactions" really opened up my eyes!

tiffanyh, almost 3 years ago

#1 thing you should know: an RDBMS can solve pretty much every data storage/retrieval problem you have.

If you're choosing something other than an RDBMS, you should rethink why.

Because unless you're at massive scale (which still doesn't justify it), choosing something else is rarely the right decision.

Merad, almost 3 years ago

> a dirty read occurs when you perform a read, and another transaction updates the same row but doesn't commit the work, you perform another read, and you can access the uncommitted (dirty) value

It's even worse than this with MS SQL Server. When using the READ UNCOMMITTED isolation level it's actually possible to read corrupted data, e.g. you might read a string while it's being updated, so the result row you get contains a mix of the old value and new value of the column. SQL Server essentially does the "we got a badass over here" Neil deGrasse Tyson meme and throws data at you as fast as it can. Unfortunately I've worked on several projects where someone apparently thought that READ UNCOMMITTED was a magic "go fast" button for SQL and used it all throughout the app.
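
The dirty read described in the quote can be sketched with a toy in-memory model. This is not how SQL Server or any real engine is implemented; `ToyStore` and its methods are hypothetical names invented for this illustration, and the model covers only the basic anomaly, not the torn-value corruption mentioned above.

```python
class ToyStore:
    """Toy model of committed data plus per-transaction uncommitted writes."""

    def __init__(self):
        self.committed = {}  # key -> last committed value
        self.pending = {}    # transaction id -> {key: uncommitted value}

    def write(self, txid, key, value):
        # Writes stay private to the transaction until commit.
        self.pending.setdefault(txid, {})[key] = value

    def commit(self, txid):
        self.committed.update(self.pending.pop(txid, {}))

    def rollback(self, txid):
        self.pending.pop(txid, None)

    def read(self, key, isolation="READ COMMITTED"):
        if isolation == "READ UNCOMMITTED":
            # A dirty reader also sees other transactions' uncommitted writes.
            for writes in self.pending.values():
                if key in writes:
                    return writes[key]
        return self.committed.get(key)


store = ToyStore()
store.write("setup", "balance", 100)
store.commit("setup")

store.write("tx1", "balance", 0)  # tx1 updates the row but has not committed
dirty = store.read("balance", isolation="READ UNCOMMITTED")
clean = store.read("balance", isolation="READ COMMITTED")
store.rollback("tx1")             # tx1 aborts, so the value 0 never existed
after_rollback = store.read("balance", isolation="READ UNCOMMITTED")
print(dirty, clean, after_rollback)
```

The dirty reader acted on a balance of 0 that was never committed, which is why READ UNCOMMITTED is a poor general-purpose "go fast" switch.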

AtNightWeCode, almost 3 years ago

Not sure how to use these recommendations in practice though even if the info is somewhat correct. SQL is a beast of tech and it is used because of battle history and since there is simply no other viable tech replacing it when it comes to transactions and aggregated queries.

Indexes are a nightmare to get right. Often performance optimizations of SQL databases include removing indexes as much as adding indexes.

donatj, almost 3 years ago

I still think about my first job out of college. Shopping cart application, we would add indexes exclusively when there was a problem rather than proactively based on expected usage patterns. It's genuinely a testament to MySQL that we got as far as we did without knowing anything about what we were doing.

One of my most popular StackOverflow questions to this day is about how to handle one million rows in a single MySQL table (shudder).

The product I work on now collects more rows than that a day in a number of tables.

mjb, almost 3 years ago

Introductory material is always welcome, but I suspect this isn't going to hit the target for most people. For example:

> Therefore, if the price isn’t an issue, SSDs are a better option, especially since modern SSDs are just about as reliable as HDDs

This needs a tiny extra bit of detail: if you're buying random IO (IOPS) or throughput (MB/s), SSDs are significantly (orders of magnitude!) cheaper than HDDs. HDDs are only cheaper on space, and only if your need for throughput or IO doesn't cause you to "strand" space.

> Consistency can be understood after a successful write, update, or delete of a row. Any read request immediately receives the latest value of the row.

This isn't the ACID definition of C, and is closer to the distributed systems (CAP) one. I can't fault the article for getting this wrong, though - it's super confusing!
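
For contrast, the ACID sense of consistency is about a transaction never leaving the database in a state that violates its declared invariants. A minimal sketch with Python's built-in `sqlite3` module; the `accounts` table, the non-negative balance rule, and the `transfer` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical invariant: a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:  # commit the starting state so later rollbacks cannot undo it
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
    )


def transfer(conn, src, dst, amount):
    """Move money between accounts; return False if an invariant would break."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30)         # valid: both updates are committed
rejected = transfer(conn, 1, 2, 500)  # would overdraw account 1
print(ok, rejected, balances(conn))
```

The rejected transfer leaves both balances exactly as they were, which is the invariant-preserving guarantee that the C in ACID refers to.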

thedougd, almost 3 years ago

I have to plug the "Designing Data-Intensive Applications" book. It dives deep into the inner workings of various database architectures.

<a href="https://dataintensive.net/" rel="nofollow">https://dataintensive.net/</a>

wrs, almost 3 years ago

From the SERIALIZABLE explanation: “The database runs the queries one by one … It is essential to have some retry mechanism since queries can fail.”

I know they’re trying to simplify, but this is confusing. If the first part is true, the second part can’t be. In reality the database does execute the queries concurrently, but will try to make it seem like they were done one by one. If it can’t manage that, a query will fail and have to be retried by the application.
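
The application-side retry mentioned above might look roughly like this. It is a sketch under stated assumptions: `SerializationFailure` is a hypothetical stand-in for whatever error a real driver raises (PostgreSQL, for instance, reports serialization failures with SQLSTATE 40001), and `run_with_retry` and `make_flaky` are names invented for this example.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization-failure error."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01):
    """Run txn(), retrying with jittered exponential backoff on conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            # txn must redo the whole transaction from the start each time.
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            # Back off a little so competing transactions are less likely to collide again.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


def make_flaky(failures):
    """Build a fake transaction that conflicts a fixed number of times."""
    state = {"calls": 0}

    def txn():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise SerializationFailure("could not serialize access")
        return state["calls"]

    return txn


# The fake transaction conflicts twice and succeeds on the third attempt.
print(run_with_retry(make_flaky(2), base_delay=0))
```

The important design point is that the whole transaction is re-run, including its reads, since the data it saw on the failed attempt may no longer be valid.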

bironran, almost 3 years ago

Nice post, though for the indexing "introduction-deep-dive" I would still recommend that newbies look at <a href="https://use-the-index-luke.com/" rel="nofollow">https://use-the-index-luke.com/</a>.

jwr, almost 3 years ago

Some of the explanations are questionable: I think they were overly simplified, and while I applaud the goal, some things just aren't that simple.

I highly recommend reading <a href="https://jepsen.io/consistency" rel="nofollow">https://jepsen.io/consistency</a> and clicking on each model on the map. This is the best resource I found so far for understanding databases, especially distributed ones.

galaxyLogic, almost 3 years ago

<a href="https://github.com/prql/prql" rel="nofollow">https://github.com/prql/prql</a>: "Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL."

jandrewrogers, almost 3 years ago

> "Scale of data often works against you, and balanced trees are the first tool in your arsenal against it."

An ironic caveat to this is that balanced trees don't scale well, only offering good performance across a relatively narrow range of data size. This is a side-effect of being "balanced", which necessarily limits both compactness and concurrency.

That said, concurrent B+trees are an absolute classic and provide important historical context for the tradeoffs inherent in indexing. Modern hardware has evolved to the point where B+trees will often offer disappointing results, so their use in indexing has dwindled with time.

jrm4, almost 3 years ago

To go big picture: I'm kind of glad databases are largely like cars in this respect, in ways that other software tooling isn't.

Which is to say they're frequently good enough such that the human working with them on whatever level can safely not know a lot of these details and get a LOT done. Kudos to whoever deserves them here.

googletron, almost 3 years ago

This is a quick rundown of database indexes and transactions. Excited to continue sharing these notes with the community!

trhoad, almost 3 years ago

An interesting subject! The article could do with an edit, however. There are lots of grammatical errors.

molly0, almost 3 years ago

Has anyone read this PDF/book <a href="https://sql-performance-explained.com" rel="nofollow">https://sql-performance-explained.com</a> and would you recommend it?

r0b05, almost 3 years ago

Nicely written and informative!

manish_gill, almost 3 years ago

What tool was used to create the visuals?

sonofacorner, almost 3 years ago

This is great. Thanks for sharing!

dennalp, almost 3 years ago

Really nice guide.

otherflavors, almost 3 years ago

why is this tagged "MySQL" but not also "SQL"

throwaway787544, almost 3 years ago

Can anyone give me a brief understanding of stored procedures and when I should use them?