Goodbye integers, hello UUIDv7

726 points by juanfatas over 1 year ago

44 comments

jonhohle over 1 year ago

This is great for internal distributed systems where having ordered keys is useful. However, it should probably be noted that these probably shouldn't be used as public identifiers (even though this will probably be the de facto standard and used publicly without thought).

Having any information, specifically time information, leak from your systems may or may not have unanticipated security or business implications (e.g., knowing when session tokens or accounts were created).

rockwotj over 1 year ago

I find it interesting that the article claims random IDs are bad for performance, because they're actually better for distributed storage systems: you don't hotspot on a single node. For example, see: https://stackoverflow.com/a/53901549 and https://medium.com/google-cloud/cloud-spanner-choosing-the-right-primary-keys-cd2a47c7b52d
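
To make the hotspot argument concrete, here is a small simulation. It is a sketch and does not model how any particular database assigns ranges. A hypothetical cluster splits the 64-bit key space into four equal ranges, one per node, the way range-partitioned stores divide tables by key. A burst of time-prefixed keys lands entirely in one range, while random keys spread across all four.

```python
import bisect
import random
from collections import Counter

# Hypothetical 4-node cluster: the 64-bit key space is cut into equal ranges,
# and each node owns one range (range partitioning).
NUM_NODES = 4
SPLIT_POINTS = [i * (2**64 // NUM_NODES) for i in range(1, NUM_NODES)]

def node_for_key(key: int) -> int:
    """Index of the node whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)

def writes_per_node(keys) -> Counter:
    return Counter(node_for_key(k) for k in keys)

rng = random.Random(42)
now_ms = 1_696_000_000_000  # an arbitrary "current" Unix time in ms

# Time-prefixed keys: millisecond timestamp in the high bits, 16 random low bits.
time_ordered = [((now_ms + i) << 16) | rng.getrandbits(16) for i in range(1000)]
# Fully random 64-bit keys.
fully_random = [rng.getrandbits(64) for _ in range(1000)]

print("time-ordered:", dict(writes_per_node(time_ordered)))
print("random:      ", dict(writes_per_node(fully_random)))
```

In a real range-partitioned store, the split points follow the data rather than being fixed. The same pattern tends to hold anyway: whichever range covers "now" absorbs every new insert.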

phkahler over 1 year ago

Is there some reason new versions of UUID keep appearing? It seems like the desired properties are never quite achieved so new ones appear later. Is there a table with UUID version across the top and characteristics down the side, so I can see the differences and pick one that fits my needs? That might also help to explain why there are so many variants.

andersa over 1 year ago

> We use sequential primary keys for efficient indexing, and UUID secondary keys for external use. The upcoming UUIDv7 standard offers the best of both worlds

Unless you consider users being able to extract the generation time from the ID to be an issue, of course.

pyrolistical over 1 year ago

And you can use it today with the Postgres uuid type. Postgres doesn't care what you store in it as long as it has the correct length, so you can generate a UUIDv7 and store it natively.
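
Here is a minimal sketch of generating a UUIDv7 in application code. It assumes the bit layout later standardized in RFC 9562: a 48-bit Unix millisecond timestamp, the version and variant bits, and 74 random bits. The function name `uuid7` is our own. The result is an ordinary `uuid.UUID`, so its string form can be passed to a Postgres `uuid` column like any other UUID.

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Build a UUIDv7 from the current time plus 74 random bits."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used
    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: Unix ms timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= ((rand >> 62) & 0xFFF) << 64          # bits 75..64: 12 random bits
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand & ((1 << 62) - 1)                # bits 61..0: 62 random bits
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs created later compare greater, at millisecond granularity. Within the same millisecond, ordering is random.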

dajonker over 1 year ago

Similar to the old situation in the article, we use sequential 64-bit primary keys, but we use an additional random 64-bit key for external usage (instead of 128 bits).

The external key is base64-encoded for use in URLs, which results in an 11-character string.

This hides any information about the size of the data and the creation date of customer accounts (which would be sort of visible with UUIDv7), and prevents anyone from attempting to enumerate data by changing the integer in URLs.

I thought about using UUIDs as external keys, but the only compelling use case seems to be the ability to generate keys from many decoupled sources that have to be merged later.

64 bits should be enough for most things: https://youtu.be/gocwRvLhDf8?si=QBheJCG21bAAV0Z7
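
This is a sketch of the external-key scheme described above, using Python's `secrets` module as the randomness source. The function names are hypothetical. Eight random bytes encode to 12 base64 characters, the last of which is a `=` pad, so dropping the padding gives the 11-character string.

```python
import base64
import secrets

def encode_key(n: int) -> str:
    """Encode a 64-bit unsigned integer as 11 URL-safe base64 characters."""
    raw = n.to_bytes(8, "big")
    # 8 bytes -> 12 base64 characters, the last one '=' padding; strip it.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_key(key: str) -> int:
    padded = key + "=" * (-len(key) % 4)  # restore the padding before decoding
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

def new_external_key() -> str:
    """A fresh random 64-bit external key (stored next to the integer PK)."""
    return encode_key(secrets.randbits(64))
```

The URL-safe alphabet uses `-` and `_` instead of `+` and `/`, so the key needs no percent-encoding in a path or query string.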

jimmySixDOF over 1 year ago

Discussion here a couple of months ago: Analyzing New Unique Identifier Formats (UUIDv6, UUIDv7, and UUIDv8) (2022) https://news.ycombinator.com/item?id=36438367

jiggawatts over 1 year ago

It seems insane to me to “validate” GUIDs/UUIDs.

Half the point of these things is that they’re treated as opaque identifiers.

LAC-Tech over 1 year ago

Why use UUIDv7 over ULIDs?

As Lazare points out in this thread, they're basically the same thing, except that with ULIDs you get back the 6 extra bits of randomness that UUIDs have to use for metadata.

dkubb over 1 year ago

If it helps anyone, I open-sourced the UUIDv7 PostgreSQL function that I wrote at work: https://github.com/Betterment/postgresql-uuid-generate-v7

We've seen some amazing benefits, especially in the speed of batch inserts.

jakewins over 1 year ago

A useful/horrifying pattern on this topic: you can use UUIDv1 as a prefixed ID, giving you a way to generate tagged IDs in a system that uses UUIDs.

You set the node field to a broadcast MAC address and use that as a namespace/prefix. This inches close to the boundary of the RFC, but is arguably compliant.

As an example, you may generate demo or “canary” data items that are UUIDv1s with a well-known node field. That then lets you do distributed “isDemoData()” checks by just looking at the UUID.
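
Here is a sketch of the tagged-UUIDv1 trick using Python's `uuid.uuid1(node=...)`. `DEMO_NODE` and both helper names are hypothetical. The chosen value sets the multicast bit (the low bit of the first octet). RFC 4122 asks implementations to set that bit for node IDs that are not real hardware addresses, so the tag should not collide with a network card's address.

```python
import uuid

# Hypothetical well-known node value for demo/canary data. The first octet is
# 0x01, i.e. the multicast bit is set, so it is not a real (unicast) NIC address.
DEMO_NODE = 0x01_00_00_00_DE_40

def new_demo_id() -> uuid.UUID:
    """A time-based UUID whose node field carries the demo tag."""
    return uuid.uuid1(node=DEMO_NODE)

def is_demo_data(u: uuid.UUID) -> bool:
    """Distributed check: no lookup needed, just inspect the UUID itself."""
    return u.version == 1 and u.node == DEMO_NODE
```

Any service that receives the ID can run the same check without a shared database or registry, which is the point of the pattern.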

amanzi over 1 year ago

Can you take the first portion of the UUIDv7 string, and decode it to figure out the exact date and time that record was created? I'm wondering if there might be security/privacy concerns in some situations if the UUID codes are visible in your app?
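
The first 48 bits of a UUIDv7 are the Unix time in milliseconds, so anyone who sees the ID can recover its creation time. Here is a minimal sketch; the function name is ours. It is checked against the example UUIDv7 from the RFC 9562 test vectors.

```python
import uuid
from datetime import datetime, timezone

def uuid7_timestamp(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a UUIDv7 (millisecond precision)."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = u.int >> 80  # the top 48 of the 128 bits
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)

created = uuid7_timestamp(uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
print(created)  # 2022-02-22 19:22:22+00:00
```

No key or secret is involved. The first 12 hex digits of the string form are the timestamp in plain hexadecimal.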

Lazare over 1 year ago

UUIDv7 is a nice idea, and should probably be what people use by default instead of UUIDv4 for internal-facing uses.

For the curious:

* UUIDv4s are 128 bits long, 122 bits of which are random, with 6 bits used for the version and variant. They are traditionally displayed as 32 hex characters with 4 dashes, so 36 characters, and are compatible with anything that expects a UUID.

* UUIDv7s are 128 bits long: 48 bits encode a Unix timestamp with millisecond precision, 6 bits are for the version and variant, and 74 bits are random. You're expected to display them the same as other UUIDs, and they should be compatible with basically anything that expects a UUID. (It would be a very odd system that parses a UUID and throws an error because it doesn't recognise v7, but I guess it could happen, in theory?)

* ULIDs (https://github.com/ulid/spec) are 128 bits long: 48 bits encode a Unix timestamp with millisecond precision, and 80 bits are random. You're expected to display them in Crockford's base32, so 26 alphanumeric characters. They're compatible with almost everything that expects a UUID (since they're the right length). The spec has some dumb quirks if followed literally, but thankfully they mostly don't hurt things.

* KSUIDs (https://github.com/segmentio/ksuid) are 160 bits long: 32 bits encode a timestamp with second precision and a custom epoch of May 13th, 2014, and 128 bits are random. You're expected to display them in base62, so 27 alphanumeric characters. Since they're a different length, they're not compatible with UUIDs.

I quite like KSUIDs; I think base62 is a smart choice. The timestamp portion is a trickier question. KSUIDs use 32 bits which, with second precision (more than good enough), means they won't overflow for well over a century. UUIDv7s use 48 bits, so even with millisecond precision (not needed) they won't overflow for nearly 9,000 years.

We can argue whether 100 years is future-proof enough (I'd argue it is), but 9,000 years is just silly. Nobody will ever generate a compliant UUIDv7 in which any of the first several bits isn't 0. The only downside to KSUIDs is that the length isn't UUID-compatible (and, arguably, that they don't devote 6 bits to a compliant UUID version).

It still feels like there's room for improvement, but for now I think I'd always pick UUIDv7 over UUIDv4 unless there's a very specific reason not to. That would mostly be a concern over potentially leaking the time the UUID was generated. Although if you weren't worrying about leaking an integer sequence ID, you likely won't care here either.
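
ULID's 26 characters follow from how Crockford's base32 works. It packs 5 bits per character, and 26 × 5 = 130 bits covers the 128 bits with two to spare. Here is a sketch of the encoding applied to a UUID's integer form; the helper name is ours.

```python
import uuid

# Crockford's base32 alphabet: digits plus letters, minus I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def to_crockford32(value: int, length: int = 26) -> str:
    """Encode an integer as `length` base32 characters, 5 bits each."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))

example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
encoded = to_crockford32(example.int)
# The first 10 characters (50 bits) cover the 48-bit millisecond timestamp.
timestamp_part = encoded[:10]
```

Because the two spare bits sit at the top, the largest possible value starts with `7`. The timestamp also occupies the high bits, so the encoded strings sort in the same order as the underlying integers.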

erik_seaberg over 1 year ago

> first component (prefix) of the identifier is a sortable timestamp

> values generated are practically sequential

These statements aren’t strict enough to be relied on. Maybe you have engineered the hell out of your distributed clock scheme, and your IDs actually are completely monotonic, which is great. But you probably haven’t done that, which means conflicts will surely happen and you must handle them gracefully.
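
Within a single process you can at least make IDs strictly increasing. The sketch below uses a hypothetical class name. When the clock stalls or steps backwards, it keeps the last timestamp and uses the 12-bit `rand_a` field as a counter, which is one of the approaches RFC 9562 §6.2 describes. It does nothing about ordering across machines, which is the harder problem raised here.

```python
import os
import threading
import time
import uuid

class MonotonicUUID7:
    """Per-process UUIDv7 generator whose output never goes backwards."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def new(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                # New millisecond: random counter start, top bit clear for headroom.
                self._last_ms = now_ms
                self._counter = int.from_bytes(os.urandom(2), "big") & 0x7FF
            else:
                # Same millisecond, or the clock went backwards: keep the
                # last timestamp and bump the counter instead.
                self._counter += 1
                if self._counter > 0xFFF:  # counter exhausted: borrow the next ms
                    self._last_ms += 1
                    self._counter = 0
            value = (self._last_ms & ((1 << 48) - 1)) << 80  # timestamp
            value |= 0x7 << 76                               # version 7
            value |= self._counter << 64                     # 12-bit counter (rand_a)
            value |= 0b10 << 62                              # RFC variant
            value |= int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            return uuid.UUID(int=value)
```

Each call yields a larger ID than the previous one from the same generator. Two generators on different hosts can still interleave or disagree, so conflicts across nodes still need handling.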

wvh over 1 year ago

A few years back, I wrote some code that generates a sortable 128-bit UUID-like identifier: a milliseconds-since-epoch timestamp, followed by a node number and a random byte tail. It has been working fine in PostgreSQL, using its built-in UUID type. I suppose downstream systems have been using the string representation, though. The main reason for going with such an identifier was being able to generate them from different, non-centralised places. A nice side effect is that you can't accidentally get an erroneous ID that happens to work, the way you can with a sequential integer primary key.

For another project, I've also used sortable 64-bit snowflake-like identifiers. They have the added benefit of being able to use a 64-bit integer representation in code and database identifiers, even if you might want to represent them externally in base58 or a similar encoding.

The original UUID types aren't as useful as they once were, so it'd be worth writing a new RFC and extending those original types.
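
For the 64-bit variant, a snowflake-style generator is short. The layout below is borrowed from Twitter's Snowflake and is an assumption: a 41-bit millisecond timestamp, a 10-bit node ID, and a 12-bit sequence. The custom epoch and class name are also assumptions. When a node exhausts its 4,096 sequence numbers within one millisecond, this sketch moves on to the next millisecond instead of sleeping. The Base58 helper uses the common Bitcoin alphabet for the external form.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

class Snowflake:
    """64-bit IDs: 41-bit ms timestamp | 10-bit node ID | 12-bit sequence."""

    def __init__(self, node_id: int) -> None:
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = time.time_ns() // 1_000_000 - EPOCH_MS
            if now > self._last_ms:
                self._last_ms, self._seq = now, 0
            else:
                # Same ms (or clock went back): keep the timestamp, bump the sequence.
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:  # 4096 IDs already used: borrow the next ms
                    self._last_ms += 1
            return (self._last_ms << 22) | (self.node_id << 12) | self._seq

def to_base58(n: int) -> str:
    """External representation, e.g. for URLs."""
    out = ""
    while True:
        n, r = divmod(n, 58)
        out = BASE58[r] + out
        if n == 0:
            return out
```

The IDs are plain integers that fit a signed 64-bit column. They sort by time first and node second, and each node needs a unique `node_id`.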

toni88x over 1 year ago

IMO the benefit of UUIDs over integers is that they can be generated client-side without clashing. But you cannot trust timestamps generated by clients, and therefore you cannot trust the order. So what is the benefit over UUIDv4?

Traubenfuchs over 1 year ago

Am I the only one instinctively upset by the communication/bandwidth/storage overhead of the dashes, as well as the version and variant bits, of UUIDs?

It might be insignificant, but to me it makes UUIDs feel tainted, dirty. 11.1% of a UUID's string form is dashes, and 15.3% of it is wasted if you also count the version and variant bits.

Anecdote: I worked for a company that used numeric primary IDs internally and externally, and increased the primary key by TWO to THREE for each new customer. This made it appear to the outside world that we had two to three times the actual rate of customer growth.
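
The two percentages refer to the 36-character text form. Four of the 36 characters are dashes, and the 6 version and variant bits amount to another 1.5 hex characters, giving 5.5/36. The check below uses the RFC 9562 example UUIDv7 as an arbitrary value. It also shows how the same 16 bytes size up in other encodings.

```python
import base64
import uuid

u = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")  # any UUID works here

sizes = {
    "binary": len(u.bytes),                 # e.g. a native uuid column
    "canonical text": len(str(u)),          # 32 hex digits + 4 dashes
    "hex, no dashes": len(u.hex),
    "base64url, no padding": len(base64.urlsafe_b64encode(u.bytes).rstrip(b"=")),
}

text_len = len(str(u))
dash_share = 4 / text_len                  # the four dashes
overhead_share = (4 + 6 / 4) / text_len    # dashes + 6 version/variant bits (1.5 hex digits)
binary_share = 6 / 128                     # version/variant bits in the 16-byte form

for name, n in sizes.items():
    print(f"{name}: {n}")
print(f"{dash_share:.1%} dashes, {overhead_share:.1%} with version/variant, "
      f"{binary_share:.1%} of the binary form")
```

Stored in binary, as Postgres's `uuid` type does, the dashes cost nothing. The version and variant bits are then under 5% of the value.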

user3939382 over 1 year ago

It’s nice for front-end state. You POST the new entity, the front end provides the ID, and as long as you get a 200 you can update your state, or you can update optimistically and roll back on failure. You don’t need to wait for the API to figure out what your ID is.

tzahifadida over 1 year ago

Not specifically the topic, but I looked for a Go library and it's not that common: there is a library with fewer than 20 stars, which is too experimental for me. Also, I'm not sure the PostgreSQL extension is in the main distribution; I couldn't find it if it is. For example, GCP only supports this one, IIUC: https://www.postgresql.org/docs/current/uuid-ossp.html Java has something, but again it's not really clear how well tested it is. So using this is a bit iffy...

dgb23 over 1 year ago

As a beginner, I treated and understood (SQL) databases as something I have to use in order to store stuff.

Later I was excited about the power and expressiveness of SQL and its extensions. There is a ton of leverage, and you can make it so that interfacing with it directly becomes much more useful.

However, now I’m in a different phase. I see it as a durable data structure. I think in terms of “what does it provide to make the overall system better?”

The issues around indexing and UUIDs that are discussed in the article fit nicely into this line of thinking.

In web development, database access and performance often dominate and infect the whole system.

0pteron over 1 year ago

Isn't it the case that 128-bit primary keys take up 4 times as much memory as 32-bit integers when keeping the indices in RAM? I guess if you need the index to be clustered by time and also need a unique identifier in most queries, then UUIDv7 fits your use case. But I still think an integer primary key will fit most use cases and be more efficient.

rockwotj over 1 year ago

The first time I heard about ordered string IDs was Firebase’s push IDs. They had an interesting solution that also addresses time skew to get better ordering for drivers: https://firebase.blog/posts/2015/02/the-2120-ways-to-ensure-unique_68/

jug over 1 year ago

Haha, this is what we came up with for the home-brewed unique IDs in our GIS application decades ago, for the same reasons.

declan_roberts over 1 year ago

It’s 2023. Why aren’t we using more characters from the UTF-8 keyspace to make things like UUIDs use fewer characters?

wolverine876 over 1 year ago

> the random nature of standard non-time-ordered UUIDs (such as v4) can create database performance problems when used as primary keys. This problem is often referred to as poor database index locality.

Couldn't that be solved with incremented serial numbers, rather than leaking time data?

insanitybit over 1 year ago

> The nature of Buildkite's products mean recent data is accessed more frequently than old data. With non-sequential identifiers, the most recent data will be randomly dispersed within an index and lack clustering

I would assume that `serial` would solve this problem too.

mooreed over 1 year ago

Feels like a spiritual successor to the ksuid [1] lib, which I first heard of being used in conjunction with DynamoDB and which has very similar use cases.

[1]: https://github.com/segmentio/ksuid

miiiiiike over 1 year ago

This is neat. I've been using a custom snowflake cluster for years. Having this in the language/DB would be great for smaller projects.

For bigger/public projects I'd like to be able to add a sequence, node, and data center ID to the UUID too.

pknerd over 1 year ago

Speaking of RDBMSs, how good are UUIDs when doing joins and fetching a certain record?

gwbas1c over 1 year ago

Has anyone ever tried encrypting a database ID (i.e., a sequential int) and using that as a public key?

I.e., take a 32- or 64-bit int that's the primary key, encrypt it, and then use that as the public ID in a web application, URL, API, etc.
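
One way to do this without storing a second key is a keyed permutation of 64-bit integers. Every internal ID then maps to exactly one public ID and back. The sketch below builds one as a four-round Feistel network with HMAC-SHA256 as the round function. It is illustrative, not a reviewed cipher, and `SECRET_KEY` and the function names are placeholders.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical placeholder key
MASK32 = 0xFFFFFFFF

def _round(half: int, round_no: int) -> int:
    """Keyed round function: 32 bits in, 32 pseudo-random bits out."""
    msg = round_no.to_bytes(1, "big") + half.to_bytes(4, "big")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def encrypt_id(n: int, rounds: int = 4) -> int:
    """Map a 64-bit internal ID to a 64-bit public ID."""
    left, right = (n >> 32) & MASK32, n & MASK32
    for r in range(rounds):
        left, right = right, left ^ _round(right, r)
    return (left << 32) | right

def decrypt_id(n: int, rounds: int = 4) -> int:
    """Invert encrypt_id by running the rounds backwards."""
    left, right = (n >> 32) & MASK32, n & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ _round(left, r), left
    return (left << 32) | right
```

A Feistel network is invertible whatever the round function is, so decryption always recovers the original ID. Consecutive internal IDs map to unrelated-looking public IDs. Secrecy rests entirely on the key, so rotating it changes every public ID.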

tzahifadida over 1 year ago

To me it sounds like a corner case. Example:

a) UUIDv4, CreatedTime/UpdatedTime.
b) Bigint, CreatedTime/UpdatedTime.
c) UUIDv7 internal (which also includes time, badly), UUIDv4 external/whatever short ID.

How exactly does this help if you need external IDs (which you usually do today)? It doesn't even make it a short ID.

Even if there is a corner case, are we just saving a few bytes while adding more complication?

A clustered index is a myth in PostgreSQL; it's not practical, since you have to run a special command (CLUSTER) to reorder. So a regular index might suffer, but not really. Why? Because I am not ordering by the ID most of the time; I am ordering by "Created Date/Updated Date" or Name or whatever. Who cares about ordering IDs?

WAIT!!! But what about next tokens? OK, these are painful, but easily solved: the next token can be the pair (Created Date, ID), fetching rows where (Created Date, ID) > (last Created Date, last ID). Same result. Pagination stays the same, since it is sorted by Created Date.
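
The "next token" approach above is keyset pagination. The sketch below is self-contained and uses Python's built-in `sqlite3`; the table and column names are made up. SQLite 3.15+ and PostgreSQL both accept the row-value comparison. It shows why the cursor compares `(created_at, id)` as a pair: several rows share a `created_at`, and the `id` breaks the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, created_at TEXT NOT NULL)")
# 12 rows over three days; ids are deliberately not in created_at order.
rows = [(f"id-{(i * 7) % 12:02d}", f"2023-10-0{1 + i // 4}") for i in range(12)]
conn.executemany("INSERT INTO items (id, created_at) VALUES (?, ?)", rows)

def fetch_page(after=None, limit=5):
    """One page ordered by (created_at, id); `after` is the last row already seen."""
    if after is None:
        return conn.execute(
            "SELECT created_at, id FROM items ORDER BY created_at, id LIMIT ?",
            (limit,),
        ).fetchall()
    # Row-value comparison: strictly after (last created_at, last id).
    return conn.execute(
        "SELECT created_at, id FROM items WHERE (created_at, id) > (?, ?) "
        "ORDER BY created_at, id LIMIT ?",
        (*after, limit),
    ).fetchall()

def fetch_all_pages():
    pages, after = [], None
    while page := fetch_page(after):
        pages.append(page)
        after = page[-1]  # the "next token" is the last (created_at, id) pair
    return pages
```

Writing the condition as two independent filters, `created_at >= ? AND id > ?`, would skip rows with a later date but a smaller ID. The row-value form compares lexicographically, which is what the sort order needs. An index on `(created_at, id)` lets the database seek directly to the cursor.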

hknmtt over 1 year ago

I have been using ULID for years. Using digits now would feel very strange.

Pxtl over 1 year ago

Frustrating: I looked up MS/C#'s implementation, and the values don't get stored in a properly semi-sequential fashion in MS SQL Server, because MS stores UUIDs in an odd binary format.

eviks over 1 year ago

I've always wondered what the point of dash-separating UUIDs is if the separated parts are unreadable anyway, just like in this version. It just makes them harder to select as a single blob of text.

danwee over 1 year ago

So, how do you guys use UUIDs for real? I worked at a company that was using UUIDs in Mongo, and one of the most painful things was implementing API endpoints that filter resources. Imagine you have an endpoint in which you are filtering by resources A, B, C and D. Ideally you would end up with something like this:

  GET /filter?a_id=X&b_id=Y&c_id=Z&d_id=W

But in practice we were using POST and passing the IDs in the body payload. Why? Because my old team said "the UUIDs are long, so we may reach the maximum URL length if we pass them as parameters". I didn't like it, and I still don't like it at all.
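
For scale: the example URL with four canonical UUIDs comes to 175 characters. That is far below the roughly 2,000-character figure commonly cited as a conservative limit for browsers. The check below takes the parameter names from the example and uses random values.

```python
import uuid

# Build the filter URL from the comment with four random UUIDs as values.
ids = {name: uuid.uuid4() for name in ("a_id", "b_id", "c_id", "d_id")}
url = "/filter?" + "&".join(f"{k}={v}" for k, v in ids.items())

# Each canonical UUID is 36 characters, so the values add 144 characters.
print(len(url))
```

Even a few dozen UUIDs in one query string stay within typical server header limits, which are commonly several kilobytes.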

coolgoose over 1 year ago

I am confused about how this is new. UUIDv1 is time-based, you just need to be careful about entropy, and in MySQL 8 you have been able to use it as an ordered field for a longish time.
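
The catch with UUIDv1 is that its 60-bit timestamp is stored low-order field first, so the canonical bytes don't sort by time. MySQL 8.0's `UUID_TO_BIN(uuid, 1)` addresses this by swapping the time-low and time-high groups. The sketch below performs the same rearrangement in Python; the function name is ours.

```python
import uuid

def uuid1_to_ordered_bytes(u: uuid.UUID) -> bytes:
    """Rearrange a UUIDv1 so that byte order follows its timestamp.

    The v1 timestamp is split across time_low (bytes 0-3, lowest 32 bits),
    time_mid (bytes 4-5) and time_hi_version (bytes 6-7, highest 12 bits plus
    the version nibble). Putting time_hi first, then time_mid, then time_low
    gives bytes that sort chronologically.
    """
    if u.version != 1:
        raise ValueError("expected a UUIDv1")
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]

# Usage: order time-based UUIDs by the rearranged bytes rather than the
# canonical ones (for example, when storing them in a BINARY(16) column).
ids = [uuid.uuid1() for _ in range(3)]
ids_by_time = sorted(ids, key=uuid1_to_ordered_bytes)
```

With the swap applied, UUIDv1 gets the same index locality that UUIDv7 has by construction. The "new" part of UUIDv7 is mostly that the canonical form is already sortable and uses the familiar Unix epoch.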

zooFox over 1 year ago

One benefit of an epoch timestamp is that it's easily readable (or comparable, at the very least). I'm not sure I can read an epoch in hexadecimal format, though.

perfmode over 1 year ago

I chose ULIDs for a recent project. Hope it won’t bite me in the future.

xarope over 1 year ago

I am just about to wrap up some prototyping comparing snowflake, typeids, uuidv4 and ulid. Why did I not bump into uuidv7 earlier?!?

samatman over 1 year ago

Relying on timestamps to be sortable, when clock skew and NTP guarantee that they won't always be, strikes me as poor design.

If you need to sort by insert order, use an autoincrementing integer; if you need uniqueness, UUIDv4 is fine; if you need both, use both.

Use timestamps when you need to record the time. Just don't commit the sin of presuming that clock time will never run backwards. I assure you, it does.

jsf01 over 1 year ago

How long will it be before the “milliseconds since epoch” part of the uuid overflows or repeats?
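
Here is a back-of-the-envelope check. It assumes UUIDv7's 48-bit unsigned millisecond field (ULID uses the same width) and, for comparison, KSUID's 32-bit seconds field. The UUIDv7 field lasts roughly 8,900 years from 1970. KSUID's lasts about 136 years from its 2014 epoch.

```python
MS_PER_YEAR = 365.2425 * 24 * 60 * 60 * 1000  # mean Gregorian year in ms

# UUIDv7 / ULID: 48-bit unsigned count of milliseconds since 1970-01-01 UTC.
uuidv7_years = 2**48 / MS_PER_YEAR
# KSUID, for comparison: 32-bit count of seconds since its custom 2014 epoch.
ksuid_years = 2**32 / (MS_PER_YEAR / 1000)

print(f"UUIDv7 timestamp wraps after ~{uuidv7_years:,.0f} years "
      f"(around the year {1970 + uuidv7_years:,.0f})")
print(f"KSUID timestamp wraps after ~{ksuid_years:,.0f} years")
```

Until the field wraps, the timestamp never repeats. Uniqueness within the same millisecond comes from the random bits, not the clock.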

markcollin over 1 year ago

Interesting. I have been using UUIDv4 for a long time; I'll explore UUIDv7 further.

dataangel over 1 year ago

why bother with any version of the uuid standard? just generate a random 128-bit number and use it. that's all the newer ones are anyway

JCharante over 1 year ago

I wonder who this article is written for. Who would be reading about UUIDs but not know about cache hit rates?

> As a result, retrieving the most recent data from a large dataset will require traversing a large number of database index pages, leading to a poor cache hit ratio (how many requests a cache is able to fill successfully, compared to how many requests it receives).
