TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

A humble guide to database schema design

414 点作者 helloiloveyou大约 5 年前

29 条评论

earthboundkid大约 5 年前
Most of this advice is stuff they tell you in school that doesn’t matter at all in reality.<p>Real advice:<p>- Normalize until it hurts, denormalize until it works. (I.e. prefer normal forms but denormal will have better performance.)<p>- You need nullable dates and nullable numbers. You never need a nullable string. If you must distinguish blank string from unset, add a column called “unset”. Using null for this will just result in bad data entry.<p>- Names should be stored as display_name, first_name, and last_name. A display name is not the same as just first plus last. They have different purposes, so don’t shove them together.<p>- When in doubt, just have a column full of JSON and figure it out later.<p>- Offset pagination is bad for performance. Avoid it at all costs unless you want some spider indexing page 10,432 of your archives to eat all your CPU.
评论 #22811453 未加载
评论 #22810039 未加载
评论 #22810848 未加载
评论 #22810129 未加载
评论 #22810751 未加载
评论 #22810449 未加载
评论 #22812277 未加载
评论 #22810948 未加载
tasuki大约 5 年前
&gt; Never store firstname and lastname in the same field<p>I disagree. Why would you want to force the &quot;firstname lastname&quot; pattern on people? What if someone has three names? Or only one? Just use a &quot;name&quot; field!
评论 #22810081 未加载
评论 #22809026 未加载
评论 #22807836 未加载
评论 #22809104 未加载
评论 #22809078 未加载
评论 #22807747 未加载
评论 #22807651 未加载
评论 #22820678 未加载
评论 #22811382 未加载
评论 #22810778 未加载
ebg13大约 5 年前
Design guidance aside, this is not a good article because it doesn&#x27;t show how to fix the presented problem scenarios.<p>&quot;This is not first normal form&quot; - OK, how would you fix it?<p>&quot;This is not second normal form&quot; - OK, how would you fix it?<p>&quot;This is neither second nor third normal form&quot; - OK, how would you fix it?<p>If you&#x27;re going to show problems, show solutions as well.<p>The rest is a big red flag list of falsehoods that programmers believe about names and addresses.
评论 #22808005 未加载
评论 #22807867 未加载
评论 #22808381 未加载
tomlagier大约 5 年前
If you&#x27;re dealing with international addresses, point 3 becomes very challenging. The &#x27;tokens&#x27; of an address are called different things everywhere, take different forms, and sometimes don&#x27;t make much sense to compare. Figuring out a good balance of usability and generality can be really tricky.<p>Here&#x27;s nearly 100 pages on the subject (from the perspective of addressing mail for USPS): <a href="http:&#x2F;&#x2F;www.columbia.edu&#x2F;~fdc&#x2F;postal&#x2F;" rel="nofollow">http:&#x2F;&#x2F;www.columbia.edu&#x2F;~fdc&#x2F;postal&#x2F;</a>
评论 #22807422 未加载
评论 #22807248 未加载
评论 #22812883 未加载
评论 #22809065 未加载
评论 #22807680 未加载
评论 #22811943 未加载
me_bx大约 5 年前
Off-topic comment &#x2F; feedback about the blog post:<p>The author self-describes himself as &quot;_Trusted_ Consultant&quot; in the blog&#x27;s subtitle.<p>The comic strips illustrating the article do not mention about the permission from identified license provider(s), and are cropped, does not help building trust - at least in my case.<p>Are they used with authorization from their authors?
heisenzombie大约 5 年前
This person would disagree about point 4: <a href="https:&#x2F;&#x2F;www.kalzumeus.com&#x2F;2010&#x2F;06&#x2F;17&#x2F;falsehoods-programmers-believe-about-names&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.kalzumeus.com&#x2F;2010&#x2F;06&#x2F;17&#x2F;falsehoods-programmers-...</a>
评论 #22806793 未加载
boffinism大约 5 年前
In what way is this guide &#x27;humble&#x27;? Seems fairly self-confidently opinionated to me.
评论 #22806636 未加载
评论 #22807004 未加载
LoSboccacc大约 5 年前
&gt; Never store firstname and lastname in the same field<p>funny, where would you store the middle name then? and the second middle name? married name in cultures where the spouse changes or double it? of course in normal form you need a name table, with an order column, and so on and so long.<p>anyway, this article goes straight to the notions everyone can fetch on any book and skip the interesting and diffiylcult part about database design, which is understanding your domain so that no future expansion is prevented by a wrong cardinality on a relation that was simplified from reality.<p>because if it&#x27;s a user database you might get away with just the name, so you can address people during support calls. if it&#x27;s an accounting database, you might want to know someone name at the time of invoicing, having multiple records with start and end validity times, and if it&#x27;s a FBI database for protected identities you might have to have multiple name aliases referring to a single person.
评论 #22808078 未加载
atwebb大约 5 年前
Since everyone has to have an opinion and provide it, here&#x27;s mine, maybe a retitle?<p>&gt;A humble guide to transactional database schema design<p>There are a number of things that don&#x27;t apply to analytical database design and I&#x27;m seeing the two confused fairly often, then again, this comes entirely from my point of view so take it for what it&#x27;s worth.
davedx大约 5 年前
I work on an e-commerce platform that&#x27;s live in about 20 countries. One of the hardest records to normalize correctly is the address, because not only do users have very different rules depending on their country (e.g. not all addresses in Turkey have post codes, GB has some very weird building names&#x2F;numbers...), we also connect with vendor API&#x27;s that have their <i>own</i> ideas of how to model an address.<p>My personal opinion on this is something like address_line_1, address_line_2, city, region, post_code, with each field being varchar(max). Anyone else have an opinion on how to do this?
评论 #22811287 未加载
评论 #22811436 未加载
winrid大约 5 年前
Recently I ran into a project where everything is stored in around 3k unique fields, with many levels of nesting, in a JSONB column with NO schema or documentation. Generated JSDOC from the data and I&#x27;m hoping to create some kind of schema.<p>You can use JS on the server. Just don&#x27;t do it this way ...
ghufran_syed大约 5 年前
“Beginning database design” By Clare Churcher is a really accessible book on these issues, very practical (starts off with simple spreadsheet tables!) and takes you through step by step to understanding the problems with and the need for the different levels of normalization.
gigatexal大约 5 年前
I’m just here with popcorn and for the comments.<p>Lucky for me SQL just clicked and when I wrote my first schema third normal form is how it came out more or less so I knew I was on to something.<p>I would say a study of how the internals of your databases of choice works would help in schema design.
评论 #22812129 未加载
DeathArrow大约 5 年前
That&#x27;s one issue I see with NoSql, data is not normalized and consistency goes out of the window if database is big enough and the volume of data is big enough. That might not matter if you work on a search engine, social network, music suggestion app, but for most business apps and for most websites, having inconsistency is really bad.<p>On the other hand, having some data denormalized can improve performance as you can avoid expensive joins. It&#x27;s for designer to see if they need to trade consistency for performance and in what measure. I&#x27;d avoid it, I&#x27;d prefer to add more servers, use load balancing, use caches.
jonathanstrange大约 5 年前
In my programming life I&#x27;ve barely used SQL databases so far, and there is something I&#x27;ve never understood about these recommendations. Maybe someone here can explain that to me.<p>I thought databases are designed to organize the data optimally internally and execute all operations on them as efficiently as possible. Why do you then need to organize databases so stringently? Doesn&#x27;t that mean that there is something wrong with how the db works?<p>To me the advice always seemed similar to bit twiddling instead of relying on the compiler to optimize. Shouldn&#x27;t the database lay out the data optimally internally?
评论 #22812563 未加载
评论 #22811439 未加载
评论 #22811423 未加载
collyw大约 5 年前
My humble opinion is that poor choices in database design are the source of a great deal of technical debt.<p>I used to avoid making schema changes as they were &quot;scary&quot; (until I got a decent understanding of Django Migrations). Just a quick hack to make up for it in the application layer was far easier. Then another, and another until you have real mess on your hands.<p>And avoid nulls when you can. Unfortunately most of the time I have had a new database design there is existing data in excel sheets that is messy and you can&#x27;t avoid it.
评论 #22812348 未加载
FpUser大约 5 年前
I&#x27;ve got 2 things to say:<p>1) There is no &quot;good design&quot;. It is always a compromise of trying to satisfy various practical constraints. OLAP vs OLTP probably rings a bell here. And this Last &#x2F; First name thing is laughable. The complexity of handling all possible human names can be insane. Depending on the goal the proper design can be anything starting from single field and up to some relational structure.<p>2) The article itself looks like let&#x27;s just write something up to score some points.
yellowapple大约 5 年前
&gt; 4) Never store firstname and lastname in the same field<p>I have the opposite opinion: unless you have a specific reason why you need to know one&#x27;s &quot;first name&quot; and &quot;last name&quot; (e.g. to interface with some other system that requires names to be split up like that), just use a single field and stop caring about whether or not the name conforms to your preconceived notions of format and structure.
emmelaich大约 5 年前
(fwiw, I like the article, and I find it suitably humble)<p>Just want to hang a thought here ... depending on the precise definition of First Normal Form, it is either impossible to avoid[0] or impossible to enforce[1] without further constraints.<p>[0] if definition is &quot;must be rectangular&quot;, i.e. all columns have the same number of rows and all rows have the same number of columns.<p>[1] if the definition includes &quot;do not allow duplicate rows&quot;
reader_1000大约 5 年前
The Data Model Resource Book [1] is a good resource how to design database. It is a kind of reference book<p>[1] <a href="https:&#x2F;&#x2F;www.wiley.com&#x2F;en-us&#x2F;The+Data+Model+Resource+Book%2C+Volume+1%3A+A+Library+of+Universal+Data+Models+for+All+Enterprises%2C+Revised+Edition-p-9780471380238" rel="nofollow">https:&#x2F;&#x2F;www.wiley.com&#x2F;en-us&#x2F;The+Data+Model+Resource+Book%2C+...</a>
rukuu001大约 5 年前
This article is perfect if you&#x27;re just getting started.<p>There&#x27;s lore around design tricks to get the most out of a database, but it&#x27;s often implementation or domain specific.<p>You&#x27;ll learn what rules to break, and why, as you solve real life production problems.<p>FWIW - you can go a long, long, long way before de-normalising is actually necessary.
DeathArrow大约 5 年前
I wish there will be a DBMS which satisfy all aspects of CAP theorem and also provide high performance.<p>Until then, we are stuck with RDBMS and NoSql as lesser evils based on the use cases.
评论 #22810579 未加载
paulmendoza大约 5 年前
Make the primary key the thing you query by. Treat a sql database as similar to a nosql database (no joins) and your performance will be amazing.
SPBS大约 5 年前
RE: not storing address in a single field, splitting an address into semantic fields feels like a premature optimisation. You’re forced to anticipate all the fields that you may want to aggregate by, before you may actually even use them.<p>Instead, you could just create a custom function that takes in a string and tries to identify the city or street number or whatever element you want extract from the address e.g.<p><pre><code> select * from users where extract_city(address) = &#x27;london&#x27;;</code></pre>
paulmendoza大约 5 年前
Use the array data type for simple collections if using Postgres because it saves a join.
DeathArrow大约 5 年前
An advice I think is worth adding to the list is: use indexing.
banq大约 5 年前
This a DDD aggregates design guide, not database schema
darth_avocado大约 5 年前
Use third normal form. - Till your system gets 10 year old and linking your user&#x27;s first name to their address requires joins on 10 tables having millions of rows.
awaythrower大约 5 年前
Firstname lastname is too fragile and oversimplified. Suffix, prefix and a variable number of middle names&#x2F;initials can also be necessary. Occasionally, legal name vs. nickname is also needed. But if there&#x27;s no pressing need for grouping or sorting by surname, why piecemeal any of it out?