On Primary Keys

19 pointsby dstroyallmodelsabout 11 years ago

15 comments

outworlderabout 11 years ago

"In fact, even in RDBMS’s developers should be using a natural key as their primary key instead of an auto-incremented ID. "No, they shouldn't. The fact that your dataset may contain a piece of information that is usually unique its completely unrelated to the need of RDMSs to have identifiers to implement table relationships and to identify records in a table.Not to mention that it can bring lots of headaches down the line. The classic example is using social security numbers (or another country's equivalent, such as CPF in Brazil) to identify individuals. Down the line you get a requirement to support international customers. That is now a major undertaking.You are also offloading the responsibility of maintaining the uniqueness property to a third party, in effect trusting their database.

评论 #7785011 未加载

评论 #7784919 未加载

tbrownawabout 11 years ago

DO NOT USE USER-SUPPLIED DATA AS KEYSDatabase keys are more-or-less equivalent to pointers. In the case of natural keys, they're pointers that (a) do not have ASLR, and (b) are exposed to the user.As the article mostly ignores, you will guess wrong about what can be reasonably considered permanent or unique. Email addresses change (and aren't always unique), names change (and aren't unique), etc. This is much harder to fix if you used that guess as the basis for your entire data model.The problem with proper keys was given as "auto_increment doesn't work on distributed systems". This is rather silly, as there are plenty of alternatives (GUID, UUID, the mentioned snowflake keys).It's also interesting that proper keys are a "crutch" for designing data models and are best replaced with cleverness, and apparently are only useful to dictate record order (I know that Oracle at least doesn't work that way). This brings to mind "debugging code is twice as hard as writing code; if you write code as cleverly as possible, you are by definition not clever enough to debug it". DON'T BE (OVERLY) CLEVER. Do the least-fragile thing that doesn't add complexity (that is, doesn't add complexity visible to you.). Only do clever things that add fragility if you have an actual measured reason to do so.

buckbovaabout 11 years ago

"In fact, even in RDBMS’s developers should be using a natural key as their primary key instead of an auto-incremented ID. This will lead to better performance when the natural key is the most commonly used identifier, but is often not considered out of a habit."As a database architect I always use auto-incrementing as the primary key. This ID is actually a surrogate key for the real primary composite key, which I assign to a unique key.So, with the unique key in place, the performance will be nearly as good as you'd expect for direct looks by key attributes. However, the performance gain using the surrogate ID within join tables and as foreign keys is much greater."It’s also worth noting that over their years of use in SQL databases they’ve become somewhat of a crutch when designing the data model for a system."They are not a crutch. They are a wonderful feature. One could model with GUIDs if necessary or something similar for your surrogate key but nothing beats the simplicity of an auto incrementing ID.Even a NoSQL database like MongoDB creates a unique _id for you for each document.

Jeddabout 11 years ago

An article promising to be quite technical demonstrates early on the author's inability to get their head around the spelling of the possessive form of "its". (Perhaps it's just me that finds this an unsettling juxtaposition. The About page wasn't useful in identifying native writer status.)Then we dive into some mistruths about emails being unique per user (I know plenty of people who still share email addresses, both personal and also in shared commercial accounts). I worked at a place a few years ago that had designed their entire accounts system around the idea of email addresses being immutable. They'd subsequently had to introduce a secondary column for actual email addresses -- customers would login with their historical email address as their ID. The idea that a uid is a 'natural key' just seems perverse (or at least demonstrates a profound misunderstanding of what a natural key is).A sideways reference - without actually using the phrase - to CAP theorem. Composite keys are suggested as a solution to a vaguely phrased problem without acknowledging the performance issues, and subsequent design constraints, they introduce. Ultimately there seems to be a misunderstanding about the differences between a (primary) key and an index.

al2o3crabout 11 years ago

The biggest problem with "natural" keys, as referenced in the article: they have a nasty habit of changing. In the article's example, making "username" or "email" the PK is going to be messy if you have foreign keys referencing the value when it updates.

评论 #7784048 未加载

评论 #7784042 未加载

评论 #7784344 未加载

mobiuscogabout 11 years ago

In a recent large project, I moved to using GUIDs. Following advice from this article: <a href="http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database" rel="nofollow">http://www.codeproject.com/Articles/388157/GUIDs-as-fast-pri...</a> it was fairly straightforward and performant.Obviously YMMV (as may your definition of 'large' and 'performant').

评论 #7785041 未加载

bcoughlanabout 11 years ago

评论 #7784268 未加载

cratermoonabout 11 years ago

We've been having this argument about primary keys for three decades at least. If the natural key vs. auto-generated ID theories can't be reduced to practice, is there any home for software development as a profession?

aaronemabout 11 years ago

> An email address is a piece of information that is guaranteed to be unique to each user because everybody has their own email address.I once let myself in for a lot of headaches by designing a system around this unwarranted assumption.

bcbrownabout 11 years ago

It looks like the purpose of the article is to explain why it's not a big deal that the company's database solution doesn't support auto-incrementing primary keys.

dpeckabout 11 years ago

Instagram had a nice post on how they did their PK generation/sharding a couple years back. Still a good read, <a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram" rel="nofollow">http://instagram-engineering.tumblr.com/post/10853187575/sha...</a>.Previously discuss here, <a href="http://news.ycombinator.com/item?id=3058327" rel="nofollow">http://news.ycombinator.com/item?id=3058327</a>

dbrowerabout 11 years ago

There's nothing that requires most auto id keys to be monotonically sequential, which is the bottleneck when distributed. If uniqueness is all that's required, blocks of id's can be handed to different hosts and replenished on demand.Gap-free, monotonically increasing id's should only be used for things that have regulatory compliance issues. For example, there shall be no gaps in invoice id's, and voided ones shall be marked.

brandtgabout 11 years ago

Small project that may be of interest to those involved in this discussion: <a href="https://github.com/brandtg/alicia" rel="nofollow">https://github.com/brandtg/alicia</a>Kind of "eventually monotonic" distributed auto-increment keys. Your mileage may vary.

campbellsoupabout 11 years ago

Could someone point out to me how filtering and selecting records over some parts of the primary key (ex {sensorID} over {deviceID}_{sensorID}_{timestamp}) is better than just having regular indexed fields (ex {sensorID}) containing those pieces of information?

评论 #7784181 未加载

TheLoneWolflingabout 11 years ago

Slightly off-topic:With JS disabled, the page header is white text on a transparent background.With JS enabled, I see no way to remove the header. I have little enough vertical screen real estate, adding an extra chunk of stuff on the top doesn't help. I ended up removing it entirely.