> Suppose that you assigned everyone a 19-digit number. What is the probability that two human beings would have the same number?<p>If <i>I</i> assigned the numbers, I would carefully print 8 billion or so slips with different numbers and place them in an urn. For each person, I'd draw from the urn, and then the probability of <i>me</i> assigning the same number to two humans is related to IT failures during generation. How many dollar bills share a serial number unintentionally?
> If you want the probability to be effectively zero, you should use 30 digits or so.<p>You really don't need it to be effectively zero. Just significantly less than other sources of error. For 8 billion people, a 3% chance of there being a single error somewhere on the whole planet is pretty good on the scale of issues that the QA team needs to deal with. Especially when the fix is: roll a second random number for that single unlucky person on Earth. That's pretty easily auto-detected and auto-fixed at generation time.<p>I worked on a big game that used 32-bit hashes of asset names even though we expected to get on the order of 10 collisions across the dataset. The solution was to detect collisions and tell artists to tweak their file names. It happened about 10 times over the course of many years and hundreds of thousands of assets.
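A minimal sketch of the detect-and-re-roll fix described above, assuming a single generator that can keep every issued ID in memory. The function name `assign_unique_ids` and the in-memory set are my own illustration, not something from the article or the game in question.

```python
import secrets


def assign_unique_ids(count, digits=19):
    """Hypothetical helper: hand out `count` distinct random IDs of `digits` decimal digits."""
    limit = 10 ** digits
    if count > limit:
        raise ValueError("more IDs requested than the digit space can hold")
    issued = set()   # every ID handed out so far (assumes it fits in memory)
    rerolls = 0      # how many times a collision was detected and fixed
    ids = []
    for _ in range(count):
        candidate = secrets.randbelow(limit)
        while candidate in issued:       # collision detected at generation time
            rerolls += 1
            candidate = secrets.randbelow(limit)
        issued.add(candidate)
        ids.append(candidate)
    return ids, rerolls


# Deliberately tiny space (4 digits) so that re-rolls are likely to show up.
ids, rerolls = assign_unique_ids(1000, digits=4)
print(len(ids), "IDs issued,", rerolls, "re-rolls")
```

The point is only that a collision costs one extra draw when it is caught at issue time. A real system would need a shared or persistent store instead of a Python set.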
UUID to the rescue. <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier" rel="nofollow">https://en.wikipedia.org/wiki/Universally_unique_identifier</a><p>It:<p>- is a standard<p>- is widely adopted across computing platforms<p>- addresses the birthday problem (in v4, the chance of a collision in a set of 103 trillion UUIDs is 1 in a billion)
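For anyone who wants to check the 103 trillion figure, here is a rough sketch using the usual birthday approximation. It assumes 122 random bits in a version 4 UUID (128 bits minus 4 version bits and 2 variant bits); `collision_probability` is my own helper name, not part of the `uuid` module.

```python
import math
import uuid


def collision_probability(n, bits):
    """Birthday approximation: P(at least one collision) ~= 1 - exp(-n(n-1) / (2N))."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))


RANDOM_BITS_V4 = 122  # assumption: 128 bits minus 4 version bits and 2 variant bits

u = uuid.uuid4()
print(u, "version", u.version)

p_wiki = collision_probability(103 * 10 ** 12, RANDOM_BITS_V4)
p_humans = collision_probability(8 * 10 ** 9, RANDOM_BITS_V4)
print(f"103 trillion UUIDs: {p_wiki:.2e}")
print(f"8 billion UUIDs:    {p_humans:.2e}")
```

The first number should come out at roughly one in a billion, which matches the Wikipedia claim. The second is the same formula applied to one UUID per living person.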
Why 19 is important: log10(2^63) ~= 19, which means a random 64-bit integer is not long enough to uniquely identify all human beings. A 128-bit integer or UUID is.
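The arithmetic behind that is quick to check. This is a small sketch and the variable names are mine.

```python
import math

# Decimal digits needed to write the largest value of each integer width.
digits_63 = math.log10(2 ** 63)    # about 18.96
digits_64 = math.log10(2 ** 64)    # about 19.27
digits_128 = math.log10(2 ** 128)  # about 38.53

print(f"2^63  ~ 10^{digits_63:.2f}")
print(f"2^64  ~ 10^{digits_64:.2f}")
print(f"2^128 ~ 10^{digits_128:.2f}")

# The largest signed 64-bit integer is written with exactly 19 decimal digits.
print(len(str(2 ** 63 - 1)))
```

So 19 digits is about what a signed 64-bit integer can hold, and a 128-bit value has room for roughly 38 digits.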
The Social Security Administration in the US solved this long ago. Reserve a few bits for “which source generated this integer”, then go sequential, or random w/o replacement, within each shard.<p>(Note that SSNs were not meant to be unique when this scheme was invented. They were designed to be reused periodically. Name, DOB and SSN should be unique though.)
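A rough sketch of the "reserve a few bits for the source, then go sequential" idea. This is not how the SSA actually lays out its numbers. The 10/54 bit split and the `ShardIssuer` class are hypothetical choices made for illustration.

```python
from itertools import count

SHARD_BITS = 10      # hypothetical: up to 1024 independent issuing sources
SEQUENCE_BITS = 54   # hypothetical: remainder of a 64-bit ID


class ShardIssuer:
    """Issues IDs of the form [shard_id | sequence] with no cross-shard coordination."""

    def __init__(self, shard_id):
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._counter = count()  # sequential within this shard

    def next_id(self):
        seq = next(self._counter)
        if seq >= (1 << SEQUENCE_BITS):
            raise OverflowError("shard has exhausted its sequence space")
        # The shard prefix makes IDs from different shards disjoint by construction.
        return (self.shard_id << SEQUENCE_BITS) | seq


office_a = ShardIssuer(1)
office_b = ShardIssuer(2)
issued = [office_a.next_id() for _ in range(3)] + [office_b.next_id() for _ in range(3)]
print(issued)
```

Because each shard owns its prefix, two offices can never issue the same ID, and no randomness or collision check is needed.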
The Birthday Paradox and Microsoft GUIDs or How I use Mathematica to Reassure Myself to Go Back to Sleep at Nights: <a href="https://www.atriumtech.com/pongskorn/birthdayparadox/birthdayparadox.htm" rel="nofollow">https://www.atriumtech.com/pongskorn/birthdayparadox/birthda...</a>
Given the requirement to detect collisions and a process to reissue an ID for various legal reasons, I don't see a problem. In fact, don't use too many digits. If you use too many digits, collisions will be so rare that when they occur nobody will know how to handle them. Better to have a process that gets exercised a few times a year at minimum.
The actual title is “19 random digits is not enough to uniquely identify all human beings”.<p>The tl;dr is that you need at least 30 digits so it’s safe to assign a random number to a person with a close to zero probability of it already being assigned.<p>I’m not really sure why the author is talking about 19 digits. Must be a reference to something I guess but I really don’t know what.<p>Also this doesn’t mention how long this is valid for. New people keep getting born, so at some point 30 digits won’t be enough unless we don’t care about reuse of the ones where people might be long gone.
there are great solutions like <a href="https://en.m.wikipedia.org/wiki/Snowflake_ID" rel="nofollow">https://en.m.wikipedia.org/wiki/Snowflake_ID</a> if you give up randomness
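For the curious, a single-threaded sketch of a Snowflake-style generator, assuming the layout described on that Wikipedia page (41-bit millisecond timestamp, 10-bit machine ID, 12-bit per-machine sequence). The epoch constant and the class name are hypothetical, and a production version would need locking and a policy for clock rollback.

```python
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in ms since the Unix epoch
MACHINE_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Time-ordered 64-bit IDs: [timestamp | machine_id | sequence]. Not thread-safe."""

    def __init__(self, machine_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock()
        if now < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._sequence = (self._sequence + 1) & SEQUENCE_MASK
            if self._sequence == 0:
                # Sequence exhausted for this millisecond: spin until the next one.
                while now <= self._last_ms:
                    now = self._clock()
        else:
            self._sequence = 0
        self._last_ms = now
        return (
            ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
            | (self.machine_id << SEQUENCE_BITS)
            | self._sequence
        )


gen = SnowflakeGenerator(machine_id=7)
snowflakes = [gen.next_id() for _ in range(10_000)]
print(snowflakes[0], snowflakes[-1])
```

Uniqueness here comes from the machine ID plus the clock, not from randomness, which is the trade-off the comment is pointing at.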
Another advantage of greatly overengineered random string lengths is that they make brute-force namespace search technically infeasible. This is an increasing problem with the PSTN (public switched telephone network), where there are simply too few digits in a phone number to prevent comprehensive dialing attacks. (Number reuse would be another principal problem.)<p>Long ago I'd read a Douglas Hofstadter essay (probably from his <i>Scientific American</i> puzzles column and compiled in <i>Metamagical Themas</i>) where he'd commented on the apparent idiocy of having very long account numbers which were clearly far larger than the possible object space.<p>That critique fails to consider any number of points, including the challenges of non-coordinated UUID assignments (as Lemire writes), the practice of coding other semantic information into parts of a larger string (e.g., branch or office identifiers, years, or other redundant information), systems which have grown out of mergers of multiple independent systems (where accounts from systems A and B might have collided, so A and B now require distinguishing, as well as C, D, E, ...), and as I've noted, the ease of searching the namespace for valid / assigned values.
This is the most dystopian thing I've read in a while. Prison Planet Earth, where everyone is uniquely identified by a number, perhaps surgically implanted on an RFID tag? How much compute resource would be needed to create an AI minder for each and every human on top of that, constantly updating their social credit score on a daily basis?