Probability of GUID collisions with different versions

93 points by yanowitz over 9 years ago

6 comments

TazeTSchnitzel over 9 years ago
UUIDs and GUIDs are far too complicated, personally I don't like using them. There are multiple "versions" (really, generation algorithms) of UUID and GUID, each with their own problems:

* Some types of UUID uniquely identify the machine they were generated on (one version contains the MAC address + current time, another contains the POSIX UID/GID + domain name) - this got Microsoft into hot water in the 1990s when Word added GUIDs to documents, which meant you could trace documents Stasi-style

* Some types of UUID are based on insecure hashing algorithms (MD5 and SHA1)

* Some types of UUID are namespaced, because everything needs namespaces, obviously

* There's a specially reserved type for Microsoft to use for special COM objects

There's only one mode you should actually use, which is the random bits.

UUIDs and GUIDs also have a weird spacing of dashes. You'd expect three dashes, splitting it into a sequence of 4-byte chunks, but no: it's split into 4-2-2-2-6, for some reason. And which chunk it is matters, because different chunks have different endianness. Some of them are considered numbers, some of them bytes, even though they all look the same. Some of them have special significance (it contains two different version numbers!), with no especially obvious rhyme or reason to their placement. Oh, and the endianness is implementation-defined: GUIDs are partly "native" endian (usually little-endian, then), partly big-endian, whereas UUIDs are typically big-endian. How do you tell them apart? Well, GUIDs are usually written in capitals, and UUIDs are usually written in lowercase.

I just use 16 random bytes encoded in hexadecimal, separated by three dashes at 4-byte increments. No hashing algorithms, versions, endianness issues, namespacing, severe privacy problems, just random bytes. It's not only simpler, it has more bits of entropy, and is easier to generate.
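Two points from the comment above can be illustrated with a minimal Python sketch. It relies only on the standard-library `uuid` and `secrets` modules. The helper name `random_id` and the sample UUID value are made up for the example, and `bytes_le` is used as a stand-in for the GUID-style layout the comment describes, not as a statement about any particular Windows API.

```python
import secrets
import uuid


def random_id() -> str:
    """Hypothetical helper: 16 random bytes as hex, with a dash every 4 bytes."""
    raw = secrets.token_bytes(16).hex()  # 32 hex digits, 128 random bits
    return "-".join(raw[i:i + 8] for i in range(0, 32, 8))


# Same 128-bit value, two byte layouts.
sample = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# RFC 4122 order: every field big-endian, so the bytes match the text form.
big_endian = sample.bytes.hex()

# Layout with the first three fields (4, 2 and 2 bytes) little-endian,
# which is what Python exposes as `bytes_le`.
mixed_endian = sample.bytes_le.hex()

print(random_id())
print(big_endian)    # 00112233445566778899aabbccddeeff
print(mixed_endian)  # 33221100554477668899aabbccddeeff
```

Only the first three chunks are byte-swapped between the two layouts; the trailing 2-byte and 6-byte chunks are identical, which is the "partly native, partly big-endian" behaviour mentioned above.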
gus_massa over 9 years ago
> GUID generation algorithm 4 fills the GUID with 122 random bits. The odds of two GUIDs colliding are therefore one in 2^122, which is a phenomenally small number.

Another thing to consider is that, due to the birthday paradox, once you have generated 2.7 * 10^18 GUIDs, the probability that you have at least one collision is bigger than 50%. And 2.7 * 10^18 is only 2^61.2.
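The figures above can be checked with a short Python sketch. It uses the standard birthday-problem approximation P ≈ 1 - exp(-n(n-1)/(2N)) with N = 2^122, which is an assumption of this example rather than something stated in the comment. The function names are hypothetical.

```python
import math


def collision_probability(n: float, bits: int = 122) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    space = 2.0 ** bits
    # 1 - exp(-x), computed with expm1 so tiny probabilities keep precision.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def ids_for_probability(p: float, bits: int = 122) -> float:
    """Approximate number of random IDs needed to reach collision chance p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


n_half = ids_for_probability(0.5)
print(f"IDs for a 50% collision chance: {n_half:.2e}")
print(f"as a power of two: 2^{math.log2(n_half):.1f}")
print(f"check: P(collision) = {collision_probability(n_half):.3f}")
```

Under that approximation the threshold comes out at roughly 2.7 * 10^18, or about 2^61.2, in line with the numbers quoted above.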
dspillett over 9 years ago
> If you use the European scale.

Be careful there. Parts of Europe including here (the UK) officially use short scale like the US.

When looking at historical figures it is important to be extra careful as scale use has flipped back and forth over time in places. "Historical" doesn't go as far back as you think either: short scale becoming the standard number naming convention in the UK happened in 1974 so there are still people alive who use long scale and remember it being the most common form.

To really confuse things some places use a half-way house of "short scale with milliard"...

It is safer to stick with "scientific" prefixes (kilo, mega, giga, tera, ...) as they are consistently interpreted the same way except where someone is being deliberately difficult (for "deliberately difficult" read "just plain wrong"). It sometimes sounds odd referring to things like "giga pounds" instead of "billions of pounds" (or "thousand millions of pounds" or "milliards of pounds") but it reduces the risk of misinterpretation and anyone who doesn't understand probably wouldn't truly understand any of the above terms without explanation.

For more see https://en.wikipedia.org/wiki/Long_and_short_scales
dblohm7 over 9 years ago
I'm amazed that in all of these discussions nobody ever references the RFC, so here you go: http://www.ietf.org/rfc/rfc4122.txt
bhouston over 9 years ago
I can deal with a machine crashing; I cannot deal with an invalid database caused by GUIDs intersecting. Thus I actually want to avoid GUID intersections more than I care about a single machine's valid RAM.
cek over 9 years ago
GUID == UUID. Annoys me to this day that MS uses the term GUID. Windows includes API functions using both names (e.g. CoCreateGuid and UuidCreate).