TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

On Primary Keys

19 pointsby dstroyallmodelsabout 11 years ago

15 comments

outworlderabout 11 years ago
&quot;In fact, even in RDBMS’s developers should be using a natural key as their primary key instead of an auto-incremented ID. &quot;<p>No, they shouldn&#x27;t. The fact that your dataset may contain a piece of information that is usually unique its completely unrelated to the need of RDMSs to have identifiers to implement table relationships and to identify records in a table.<p>Not to mention that it can bring lots of headaches down the line. The classic example is using social security numbers (or another country&#x27;s equivalent, such as CPF in Brazil) to identify individuals. Down the line you get a requirement to support international customers. That is now a major undertaking.<p>You are also offloading the responsibility of maintaining the uniqueness property to a third party, in effect trusting their database.
评论 #7785011 未加载
评论 #7784919 未加载
tbrownawabout 11 years ago
DO NOT USE USER-SUPPLIED DATA AS KEYS<p>Database keys are more-or-less equivalent to pointers. In the case of natural keys, they&#x27;re pointers that (a) do not have ASLR, and (b) are exposed to the user.<p>As the article mostly ignores, you <i>will</i> guess wrong about what can be reasonably considered permanent or unique. Email addresses change (and aren&#x27;t always unique), names change (and aren&#x27;t unique), etc. This is much harder to fix if you used that guess as the basis for your entire data model.<p>The problem with proper keys was given as &quot;auto_increment doesn&#x27;t work on distributed systems&quot;. This is rather silly, as there are plenty of alternatives (GUID, UUID, the mentioned snowflake keys).<p>It&#x27;s also interesting that proper keys are a &quot;crutch&quot; for designing data models and are best replaced with cleverness, and apparently are only useful to dictate record order (I know that Oracle at least doesn&#x27;t work that way). This brings to mind &quot;debugging code is twice as hard as writing code; if you write code as cleverly as possible, you are by definition not clever enough to debug it&quot;. DON&#x27;T BE (OVERLY) CLEVER. Do the least-fragile thing that doesn&#x27;t add complexity (that is, doesn&#x27;t add complexity <i>visible to you</i>.). Only do clever things that add fragility if you have an actual measured reason to do so.
buckbovaabout 11 years ago
&quot;In fact, even in RDBMS’s developers should be using a natural key as their primary key instead of an auto-incremented ID. This will lead to better performance when the natural key is the most commonly used identifier, but is often not considered out of a habit.&quot;<p>As a database architect I always use auto-incrementing as the primary key. This ID is actually a surrogate key for the real primary composite key, which I assign to a unique key.<p>So, with the unique key in place, the performance will be nearly as good as you&#x27;d expect for direct looks by key attributes. However, the performance gain using the surrogate ID within join tables and as foreign keys is much greater.<p>&quot;It’s also worth noting that over their years of use in SQL databases they’ve become somewhat of a crutch when designing the data model for a system.&quot;<p>They are not a crutch. They are a wonderful feature. One could model with GUIDs if necessary or something similar for your surrogate key but nothing beats the simplicity of an auto incrementing ID.<p>Even a NoSQL database like MongoDB creates a unique _id for you for each document.
Jeddabout 11 years ago
An article promising to be quite technical demonstrates early on the author&#x27;s inability to get their head around the spelling of the possessive form of &quot;its&quot;. (Perhaps it&#x27;s just me that finds this an unsettling juxtaposition. The About page wasn&#x27;t useful in identifying native writer status.)<p>Then we dive into some mistruths about emails being unique per user (I know plenty of people who <i>still</i> share email addresses, both personal and also in shared commercial accounts). I worked at a place a few years ago that had designed their entire accounts system around the idea of email addresses being immutable. They&#x27;d subsequently had to introduce a secondary column for <i>actual</i> email addresses -- customers would login with their historical email address as their ID. The idea that a uid is a &#x27;natural key&#x27; just seems perverse (or at least demonstrates a profound misunderstanding of what a natural key is).<p>A sideways reference - without actually using the phrase - to CAP theorem. Composite keys are suggested as a solution to a vaguely phrased problem without acknowledging the performance issues, and subsequent design constraints, they introduce. Ultimately there seems to be a misunderstanding about the differences between a (primary) key and an index.
al2o3crabout 11 years ago
The biggest problem with &quot;natural&quot; keys, as referenced in the article: they have a nasty habit of <i>changing</i>. In the article&#x27;s example, making &quot;username&quot; or &quot;email&quot; the PK is going to be messy if you have foreign keys referencing the value when it updates.
评论 #7784048 未加载
评论 #7784042 未加载
评论 #7784344 未加载
mobiuscogabout 11 years ago
In a recent large project, I moved to using GUIDs. Following advice from this article: <a href="http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database" rel="nofollow">http:&#x2F;&#x2F;www.codeproject.com&#x2F;Articles&#x2F;388157&#x2F;GUIDs-as-fast-pri...</a> it was fairly straightforward and performant.<p>Obviously YMMV (as may your definition of &#x27;large&#x27; and &#x27;performant&#x27;).
评论 #7785041 未加载
bcoughlanabout 11 years ago
&quot;In fact, even in RDBMS’s developers should be using a natural key as their primary key instead of an auto-incremented ID. This will lead to better performance when the natural key is the most commonly used identifier, but is often not considered out of a habit.&quot;<p>Primary&#x2F;unique keys are indexed. Creating an index of strings instead of integers is surely going to result in a huge index and slow performance?
评论 #7784268 未加载
cratermoonabout 11 years ago
We&#x27;ve been having this argument about primary keys for three decades at least. If the natural key vs. auto-generated ID theories can&#x27;t be reduced to practice, is there any home for software development as a profession?
aaronemabout 11 years ago
&gt; An email address is a piece of information that is guaranteed to be unique to each user because everybody has their own email address.<p>I once let myself in for a lot of headaches by designing a system around this unwarranted assumption.
bcbrownabout 11 years ago
It looks like the purpose of the article is to explain why it&#x27;s not a big deal that the company&#x27;s database solution doesn&#x27;t support auto-incrementing primary keys.
dpeckabout 11 years ago
Instagram had a nice post on how they did their PK generation&#x2F;sharding a couple years back. Still a good read, <a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram" rel="nofollow">http:&#x2F;&#x2F;instagram-engineering.tumblr.com&#x2F;post&#x2F;10853187575&#x2F;sha...</a>.<p>Previously discuss here, <a href="http://news.ycombinator.com/item?id=3058327" rel="nofollow">http:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=3058327</a>
dbrowerabout 11 years ago
There&#x27;s nothing that requires most auto id keys to be monotonically sequential, which is the bottleneck when distributed. If uniqueness is all that&#x27;s required, blocks of id&#x27;s can be handed to different hosts and replenished on demand.<p>Gap-free, monotonically increasing id&#x27;s should only be used for things that have regulatory compliance issues. For example, there shall be no gaps in invoice id&#x27;s, and voided ones shall be marked.
brandtgabout 11 years ago
Small project that may be of interest to those involved in this discussion: <a href="https://github.com/brandtg/alicia" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;brandtg&#x2F;alicia</a><p>Kind of &quot;eventually monotonic&quot; distributed auto-increment keys. Your mileage may vary.
campbellsoupabout 11 years ago
Could someone point out to me how filtering and selecting records over some parts of the primary key (ex {sensorID} over {deviceID}_{sensorID}_{timestamp}) is better than just having regular indexed fields (ex {sensorID}) containing those pieces of information?
评论 #7784181 未加载
TheLoneWolflingabout 11 years ago
Slightly off-topic:<p>With JS disabled, the page header is white text on a transparent background.<p>With JS enabled, I see no way to remove the header. I have little enough vertical screen real estate, adding an extra chunk of stuff on the top doesn&#x27;t help. I ended up removing it entirely.