UUIDs are obsolete in the age of Docker

20 points by lshevtsov about 2 years ago

17 comments

remram about 2 years ago

> this is only correct about UUID version 1. However, it is what most applications use.

This is a bold claim and doesn't match my experience at all. UUIDv4 is all I see, everywhere, every day.

That's also a big enough caveat to put in the title: if you have a beef with UUIDv1, say UUIDv1 is obsolete.

ekimekim about 2 years ago

As the article points out, this is only an issue with UUIDv1. They claim "However, it is what most applications use.", but I have no idea how true this is. I was under the impression that the vast majority of UUID generators were v4 by default. For example:

Postgres only offers random UUID generation (https://www.postgresql.org/docs/15/functions-uuid.html).

The `uuidgen` CLI tool, at least in modern versions (I have not checked historically), says (from https://man7.org/linux/man-pages/man1/uuidgen.1.html): "By default uuidgen will generate a random-based UUID if a high-quality random number generator is present." (Later it lists /dev/random as such a generator, which is present on almost all systems.)

What's an example of a system that generates v1 UUIDs by default?
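
For what it's worth, Python's standard-library `uuid` module sidesteps the question of a default: you call `uuid1()` (timestamp plus node, usually a MAC address) or `uuid4()` (random) explicitly. A minimal sketch that checks which version each call produces:

```python
import uuid

# uuid4(): random; 122 of its 128 bits come from os.urandom().
random_id = uuid.uuid4()

# uuid1(): 60-bit timestamp + 14-bit clock sequence + 48-bit node ID,
# where the node is normally a MAC address (or a random stand-in).
time_based_id = uuid.uuid1()

# The version is the first hex digit of the third group: xxxxxxxx-xxxx-Vxxx-...
print(random_id, random_id.version)          # version 4
print(time_based_id, time_based_id.version)  # version 1
```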

SPBS about 2 years ago

1. Nobody uses UUIDv1. Why use UUIDv1 as a straw man argument?
2. UUID strings are awful for storage -- don't use them. Yes, there are databases that support UUIDs natively; why is whether or not a UUID fits into a machine word relevant? You use UUIDs for their other properties that 64-bit integers cannot offer. KSUIDs are touted as fixing all the aforementioned issues, but they're even bigger than UUIDs.
3. Both KSUIDs and UUIDs are hard for humans to read compared to 64-bit integers.
4. You don't have to encode UUIDs as hexadecimal digits plus dashes. You can choose any binary-to-text encoding you want; I am partial to Crockford Base32 because of how general-purpose it is (no vulgarities, case-insensitive so it works on Windows filesystems).
5. I still consider time-sortable UUID alternatives (like ULID) to be UUIDs. This article should have explicitly mentioned UUIDv1 and UUIDv4 in the title and it wouldn't have been so flamebait.
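
Point 4 is easy to act on, because a UUID is just a 128-bit number. A minimal sketch of Crockford Base32 applied to a UUID, assuming the standard Crockford alphabet (digits plus letters, without I, L, O and U); the function names are illustrative, not from any particular library:

```python
import uuid

# Crockford's alphabet: 32 symbols, no I, L, O or U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode_crockford(u: uuid.UUID) -> str:
    """Encode the UUID's 128 bits as 26 symbols of 5 bits each (26 * 5 = 130)."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[n & 0x1F])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))

def decode_crockford(s: str) -> uuid.UUID:
    """Decode case-insensitively, ignoring hyphens; I/L read as 1 and O as 0."""
    s = s.upper().replace("-", "")
    s = s.replace("I", "1").replace("L", "1").replace("O", "0")
    n = 0
    for ch in s:
        n = (n << 5) | CROCKFORD.index(ch)
    return uuid.UUID(int=n)

u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
encoded = encode_crockford(u)
print(encoded, len(encoded))  # 26 characters instead of 36
assert decode_crockford(encoded.lower()) == u
```

Because the alphabet is in ascending ASCII order and the output has a fixed length, the encoded strings also sort in the same order as the underlying 128-bit values.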

yamtaddle about 2 years ago

> If you require a globally unique string ID, consider URIs

Is my knee-jerk judgement that this advice borders on nonsense unwarranted?

majewsky about 2 years ago

Is anyone even still using non-random UUIDs? Every application I've ever seen use them is using v4.

happytoexplain about 2 years ago

Similar to other comments, I've only encountered v4 in my career. Is there a large domain where v1 is the norm that dominates the statistic, and most people happen to not work in that domain? If the author knows, I wish they'd say.

gnu8 about 2 years ago

> They are awful as keys – being strings, comparisons are dramatically slower than with integers. And even if your database has a UUID type, it’s still worse because the identifier doesn’t fit into a machine word.

I’m just a bit confused: a UUID is made up of hexadecimal digits, so why would it be stored as a string? It’s also 128 bits long, so it should fit into two words, excluding whatever overhead the DBMS puts on the data type, which is really their problem to worry about.
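
The "two words" point checks out: the canonical text form is 36 characters, but the value itself is 16 bytes (PostgreSQL's native `uuid` type, for example, stores exactly that). A small sketch splitting a UUID into two unsigned 64-bit halves, which compare in the same order as the full value; `to_words` is an illustrative helper, not a standard API:

```python
import uuid

MASK64 = (1 << 64) - 1

def to_words(u: uuid.UUID) -> tuple[int, int]:
    """Split the 128-bit value into (high, low) unsigned 64-bit words."""
    return u.int >> 64, u.int & MASK64

u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
hi, lo = to_words(u)

print(len(str(u)))       # 36 characters as text
print(len(u.bytes))      # 16 bytes in binary form
print(hex(hi), hex(lo))  # 0x6ba7b8109dad11d1 0x80b400c04fd430c8

# Comparing (hi, lo) lexicographically gives the same order as comparing
# the full 128-bit integers: at most two word comparisons.
a, b = uuid.uuid4(), uuid.uuid4()
assert (to_words(a) < to_words(b)) == (a < b)
```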

starfox64_ about 2 years ago

I've had a similar issue with MongoDB's ObjectIDs. They are generated using a combination of process ID, UNIX timestamp and a counter that is randomly initialized during process creation. The issue when Docker comes into the mix is that the root process ID of every container is 1, so a decent chunk of entropy is removed from the ObjectID. Add to that the fact that the timestamp doesn't have millisecond resolution, and the only thing saving you is praying that the counters of your processes never overlap during the same second.

It's unlikely to happen, but still possible, and it has brought down some of our parallel worker pools, because once you have a collision, you are bound to keep generating the same ID sequence until you restart your whole process to randomize the counter again.
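
A hypothetical, deliberately simplified sketch of the failure mode described above, with IDs built only from a seconds timestamp, a process ID and a counter. This is not MongoDB's exact ObjectId layout (which has changed across versions); `ObjectIdLike` and the byte widths are assumptions for illustration:

```python
import time
from typing import Optional

class ObjectIdLike:
    """Hypothetical, simplified ObjectId-style generator (not MongoDB's real
    layout): 4-byte Unix seconds + 2-byte process ID + 3-byte counter."""

    def __init__(self, pid: int, counter_start: int):
        self.pid = pid & 0xFFFF
        # In the scheme described above, this starts at a random value per process.
        self.counter = counter_start & 0xFFFFFF

    def next_id(self, now: Optional[int] = None) -> bytes:
        seconds = int(time.time()) if now is None else now
        self.counter = (self.counter + 1) & 0xFFFFFF  # wraps at 2**24
        return (seconds.to_bytes(4, "big")
                + self.pid.to_bytes(2, "big")
                + self.counter.to_bytes(3, "big"))

# Two containers: both generators run as PID 1, and by bad luck their
# random counters start at the same value.
SAME_SECOND = 1_700_000_000
a = ObjectIdLike(pid=1, counter_start=0x00ABCD)
b = ObjectIdLike(pid=1, counter_start=0x00ABCD)
ids_a = [a.next_id(SAME_SECOND) for _ in range(3)]
ids_b = [b.next_id(SAME_SECOND) for _ in range(3)]
print(ids_a == ids_b)  # True: the sequences collide in lockstep

# With distinct PIDs (the usual case outside containers) the IDs stay apart.
c = ObjectIdLike(pid=4242, counter_start=0x00ABCD)
print(c.next_id(SAME_SECOND) in ids_a)  # False
```

Once two such generators are in lockstep, every subsequent ID collides until one process restarts and re-randomizes its counter, which matches the behavior described in the comment.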

Demiurge about 2 years ago

I've never thought UUIDv1 was useful in any virtualized context. I'd hope that's obvious, but maybe it's worth stating in the UUID generation docs; the Python docs already explain reasonably well what the versions are.

However, with all the things already supporting UUIDs, I also don't see any reason to switch from UUIDv4 to anything else. I don't see how UUIDs in general are obsolete, given the support they have from different libraries and databases.

woile about 2 years ago

What about ulid as an alternative?
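
For reference, a ULID packs a 48-bit millisecond timestamp and 80 random bits into 128 bits and writes them as 26 Crockford Base32 characters, so IDs sort by creation time. A minimal sketch under that description; it skips the spec's monotonic mode for IDs generated within the same millisecond, and `new_ulid` is an illustrative name:

```python
import os
import time
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def new_ulid(timestamp_ms: Optional[int] = None,
             randomness: Optional[bytes] = None) -> str:
    """48-bit ms timestamp (high bits) + 80 random bits -> 26-char string."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 1 << 48:
        raise ValueError("timestamp does not fit in 48 bits")
    if randomness is None:
        randomness = os.urandom(10)  # 10 bytes = 80 bits
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes")
    n = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits >= 128
        chars.append(CROCKFORD[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))

first = new_ulid(1_000, b"\xff" * 10)  # largest randomness at t = 1000 ms
second = new_ulid(1_001, bytes(10))    # smallest randomness one ms later
print(first < second)  # True: the timestamp prefix dominates the sort order
```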

moltar about 2 years ago

One great benefit of UUIDs I have found is that you can't accidentally join the wrong row.

If you use incremental numbers, every table has 1, 2, 3, so a join on the wrong column can still silently match rows.

arcticfox about 2 years ago

I was confused by this title because I only use UUID v4... The author covers that in the article, but I'm surprised that so many people use UUID v1. I thought v4 was the most popular, but that's probably just because I mostly work with my own code.

fabian2k about 2 years ago

Is there any reason to use anything except completely random UUIDs? I vaguely remember reading about problems with MAC-based UUIDs decades ago, my impression was that they have been discouraged for a long time already.

halosghost about 2 years ago

> Note: this is only correct about UUID version 1. However, it is what most applications use.

Okay, so, not all UUIDs, just v1. And, for some anecdata, I've actually only interacted with UUID v4 in my entire career; I don't know what the actual norm is, but I'm surprised to hear that it might still be v1.

> The only other practical option is version 4 – the random UUID – but random is intuitively worse, right? Read on to find out.

Oh… how is it worse?

> * They are awful as keys – being strings, comparisons are dramatically slower than with integers. And even if your database has a UUID type, it’s still worse because the identifier doesn’t fit into a machine word.
> * They are excessively long – each character of a UUID only encodes 3.5 bits of information if you count the dashes. That’s twice as less compared to 6 bits of Base64.

Sorry, UUIDs are not strings, they're 128-bit integers. They have a standardized string representation, but if you're storing a UUID as a string, you're either being required to because your language/db/tools/etc. don't support UUIDs correctly, or you're doing it wrong.

> * They are not time-ordered – despite containing a timestamp, its bits are mixed up within the UUID: the top bytes of the UUID contain the bottom bytes of the timestamp. Databases do not like an unordered primary key – it means that freshly inserted rows can go anywhere in the index. And you can’t use UUIDs for ad-hoc time sorting by time, either.

This is definitely a drawback when using a UUID as a primary key, and there are alternatives for this specific use case. However, I think the best solution I've seen to this is to use a typical 64-bit integer for the primary key, but a UUID for a user-visible ID (so that you don't leak information about the primary keys to users); this makes joins and indexes fast, but avoids the leak to the end user.

> * They are bad for human comprehension – UUIDs tend to look alike, and it’s hard to visually seek and compare them. This comes from experience.

This is exactly why they shouldn't be used as an ID anywhere that a human needs to interact with one. In the solution I mentioned above, the most common ID for which you'd want to use a UUID is the user's ID. The user specifically has no reason to ever refer to their own or anyone else's ID; they'll use the human-readable username/handle equivalent instead. And developers don't need to care about UUIDs ever, because inside the db you'd have the integer primary key that you use for joins. This seems to solve all the problems?

> I kindly suggest that UUIDs are never the right answer.

Honestly, I think you've only convinced me that UUID v1 is never the right answer… and I think that's mostly been true since v4 came about.

All the best,
-HG
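
The "top bytes contain the bottom bytes of the timestamp" complaint follows from the RFC 4122 field order for version 1, where `time_low` is written first. A small sketch using Python's `uuid.UUID(fields=...)` constructor; `make_v1`, the timestamps and the node value are made up for illustration:

```python
import uuid

def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Pack version-1 fields in RFC 4122 order: the LOW 32 timestamp bits go first."""
    time_low = timestamp & 0xFFFF_FFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

NODE = 0x0242AC110002  # hypothetical 48-bit node (MAC) value

# Two timestamps (60-bit counts of 100 ns intervals) exactly one tick apart.
earlier = make_v1((5 << 32) | 0xFFFF_FFFF, clock_seq=0x1234, node=NODE)
later = make_v1(6 << 32, clock_seq=0x1234, node=NODE)

print(earlier)                    # ffffffff-0005-1000-9234-0242ac110002
print(later)                      # 00000000-0006-1000-9234-0242ac110002
print(earlier.time < later.time)  # True: the embedded timestamps are in order
print(earlier < later)            # False: the 128-bit values are not
```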

WirelessGigabit about 2 years ago

Obligatory read about UUIDs derived from MAC addresses: https://devblogs.microsoft.com/oldnewthing/20040211-00/?p=40663

TLDR on the article: don't use UUIDv1.

Lastly, even with the best and most randomized generation, it still doesn't protect you from copy-pasting: https://news.ycombinator.com/item?id=22354449
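
Whatever the specifics of the linked post, the basic mechanism behind "don't use UUIDv1" is visible in Python's `uuid` module: a version-1 UUID carries its 48-bit node field (normally the MAC address) in the clear. A sketch that passes an explicit, made-up node and clock sequence so the result is reproducible; `node_as_mac` is an illustrative helper:

```python
import uuid

def node_as_mac(u: uuid.UUID) -> str:
    """Format a UUID's 48-bit node field as a colon-separated MAC address."""
    return ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

# Hypothetical node value; two containers reporting the same node would differ
# only in the 60-bit timestamp and the 14-bit clock sequence.
u = uuid.uuid1(node=0x0242AC110002, clock_seq=0x1234)
print(u.version)         # 1
print(node_as_mac(u))    # 02:42:ac:11:00:02
print(hex(u.clock_seq))  # 0x1234
```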

coolgoose about 2 years ago

Sometimes I am amazed by what gets on the front page of ycombinator.

TLDR: Don't use UUID v1 if your cloud provider is generating the same MAC address for all your containers, since its entropy is based on the MAC address.

Saying not to use UUIDs at all makes no sense. Use UUIDv7, use them in Postgres (https://github.com/fboulnois/pg_uuidv7), have fun :)
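
For comparison, UUIDv7 (RFC 9562) puts a 48-bit Unix timestamp in milliseconds in the most significant bits, followed by the version, 12 random bits, the variant and 62 more random bits, so values sort by creation time. A hand-rolled sketch of that layout that does not assume any particular library support; `make_uuid7` is an illustrative name, and it omits the RFC's optional sub-millisecond monotonicity methods:

```python
import os
import time
import uuid
from typing import Optional

def make_uuid7(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: unix_ts_ms(48) | ver(4) | rand_a(12) | var(2) | rand_b(62)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits
    value = ((timestamp_ms & ((1 << 48) - 1)) << 80
             | 0x7 << 76               # version 7
             | rand_a << 64
             | 0b10 << 62              # RFC variant bits "10"
             | rand_b)
    return uuid.UUID(int=value)

earlier, later = make_uuid7(1_000), make_uuid7(1_001)
print(earlier.version, earlier.variant == uuid.RFC_4122)  # 7 True
print(earlier < later)  # True: the timestamp prefix decides the order
```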

jupp0r about 2 years ago

In practice, I generate UUIDs entirely using entropy from /dev/random. The probability of a collision is really low for most use cases (although not if you are Google and need something unique across all database rows in your company or something similar).
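
The "really low" probability can be put in numbers with the usual birthday-bound approximation, assuming all 122 random bits of a UUIDv4 come from a uniform, high-quality source; `collision_probability` is an illustrative helper:

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday approximation for n IDs drawn uniformly from 2**random_bits values:
    P(at least one collision) ~= 1 - exp(-n * (n - 1) / (2 * 2**random_bits))."""
    exponent = n * (n - 1) / (2 * 2 ** random_bits)
    return -math.expm1(-exponent)  # equals 1 - exp(-x), but accurate for tiny x

print(f"{collision_probability(10**9):.1e}")  # 9.4e-20 for a billion UUIDv4s
print(f"{collision_probability(2**61):.2f}")  # 0.39 for about 2.3e18 UUIDv4s
```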