New UUID Formats – IETF Draft

525 points by anuragsoni almost 4 years ago

35 comments

pjscott almost 4 years ago

A somewhat oversimplified summary of the new UUID formats:

UUID6: a timestamp with a weird epoch and 100 ns precision like in UUID1, but in a big-endian order that sorts naturally by time, plus some random bits instead of a predictable MAC address.

UUID7: like UUID6, but uses normal Unix timestamps and allows more timestamp precision.

UUID8: like UUID7, but relaxes requirements on where the timestamp is coming from. Want to use a custom epoch or NTP timestamps or something? UUID8 allows it for the sake of flexibility and future-proofing, but the downside is that there's no standard way to parse the time from one of these -- the time source could be anything monotonic.
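
To make the UUID6 description concrete, here is a minimal Python sketch of the bit rearrangement. It assumes the layout described in the draft (a 60-bit Gregorian timestamp split 32/16/12 around the version nibble). The helper names `uuid6_from_parts` and `uuid6_from_uuid1` are made up for illustration, and filling the node field with 48 random bits is one possible choice, not something the summary above mandates.

```python
import secrets
import uuid


def uuid6_from_parts(timestamp_100ns: int, clock_seq: int, node: int) -> uuid.UUID:
    """Pack a 60-bit Gregorian timestamp big-endian, UUIDv6 style (hypothetical helper)."""
    time_high = (timestamp_100ns >> 28) & 0xFFFFFFFF  # most significant 32 bits
    time_mid = (timestamp_100ns >> 12) & 0xFFFF       # next 16 bits
    time_low = timestamp_100ns & 0x0FFF               # least significant 12 bits
    value = (time_high << 96) | (time_mid << 80) | (0x6 << 76) | (time_low << 64)
    # Variant bits 0b10, then the 14-bit clock sequence, then the 48-bit node.
    value |= (0b10 << 62) | ((clock_seq & 0x3FFF) << 48) | (node & 0xFFFFFFFFFFFF)
    return uuid.UUID(int=value)


def uuid6_from_uuid1(u: uuid.UUID) -> uuid.UUID:
    """Reorder a UUIDv1 timestamp and swap the MAC-derived node for random bits."""
    return uuid6_from_parts(u.time, u.clock_seq, secrets.randbits(48))
```

Calling `uuid6_from_uuid1(uuid.uuid1())` yields a value whose text form sorts by creation time, because the most significant timestamp bits now come first.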

LukeShu almost 4 years ago

As a quick reference for readers:

The UUID specs use the terms "variant" and "version" a little funny; "variant" is essentially the revision (so all modern UUIDs have variant=0b10 to specify RFC 4122 UUIDs), and "version" is a 4-bit number identifying the sub-type within that variant:

  UUIDv1  time-based
  UUIDv2  legacy DCE security thing, not widely used
  UUIDv3  name-based with md5 hashing
  UUIDv4  randomness-based
  UUIDv5  name-based with sha1 hashing

This draft registers a few new sub-types:

  UUIDv6  sortable time-based, Gregorian calendar
  UUIDv7  sortable time-based, Unix time
  UUIDv8  sortable time-based, custom time
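
The version and variant fields can be read straight out of the 128-bit value. The sketch below uses Python's standard `uuid` module; the helper name `version_and_variant` is hypothetical, and the bit positions are the RFC 4122 ones (version in bits 76-79, variant in the top bits of octet 8).

```python
import uuid


def version_and_variant(u: uuid.UUID) -> tuple[int, int]:
    """Return (version nibble, top two variant bits) read from the 128-bit value."""
    version = (u.int >> 76) & 0xF   # 4-bit sub-type
    variant = (u.int >> 62) & 0b11  # 0b10 marks an RFC 4122 UUID
    return version, variant


random_id = uuid.uuid4()
named_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(version_and_variant(random_id))  # (4, 2)
print(version_and_variant(named_id))   # (5, 2)
```

The standard library exposes the same information as `u.version` and `u.variant`; the manual shifts are only there to show where the bits live.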

nabla9 almost 4 years ago

The background section gives reasons derived from looking at existing implementations.

---

Due to the shortcomings of UUIDv1 and UUIDv4 details so far, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has lead to numerous implementations over the past 10+ years solving the same problem in slightly different ways.

- Timestamps MUST be k-sortable. That is, values within or close to the same timestamp are ordered properly by sorting algorithms.
- Timestamps SHOULD be big-endian with the most-significant bits of the time embedded as-is without reordering.
- Timestamps SHOULD utilize millisecond precision and Unix Epoch as timestamp source. Although, there is some variation to this among implementations depending on the application requirements.
- The ID format SHOULD be Lexicographically sortable while in the textual representation.
- IDs MUST ensure proper embedded sequencing to facilitate sorting when multiple UUIDs are created during a given timestamp.
- IDs MUST NOT require unique network identifiers as part of achieving uniqueness.
- Distributed nodes MUST be able to create collision resistant Unique IDs without a consulting a centralized resource.

rootusrootus almost 4 years ago

"UUIDv8 SHOULD only be utilized if an implementation cannot utilize UUIDv1, UUIDv6, or UUIDv8." I assume that is a typo.

jozvolskyef almost 4 years ago

What are the advantages of sortable UUIDs with embedded timestamps over random 128 bits and a created_at column?

bmn__ almost 4 years ago

UUID, serial or identity columns for PostgreSQL auto-generated primary keys? (2021-05-31) https://news.ycombinator.com/item?id=27345837

Sortable Collision-Free UUIDs (2021-05-03) https://news.ycombinator.com/item?id=27030088

GUIDs Are Not the Only Answer (2021-01-05) https://news.ycombinator.com/item?id=25650907

Understanding How UUIDs Are Generated (2020-09-30) https://news.ycombinator.com/item?id=24636204

tialaramex almost 4 years ago

After some staring, I can't see why (beyond the obvious, I understand HN rules) this is here.

This is clearly an individual draft. OK. And it has previously expired without action (last year) but a newer draft was submitted this year.

But it doesn't seem to have been adopted by any working group, and although the word "dispatch" appears in the title it was not discussed at IETF 111's DISPATCH or GENDISPATCH or SECDISPATCH last week. If this was in fact dispatched somewhere - perhaps before it expired last time - there's no indication where it went or what its status is now.

It is the nature of these things that if everybody chooses to do exactly this, even if it was only documented in a by-then expired draft, or on the back of an envelope, then that's how it is -- the IETF has no enforcement arm. However if you support this proposal, or even if you think it'd be a good idea with minor tweaks you should find out where (and if) it's being developed and get on board with that.

politician almost 4 years ago

I'm really excited to see k-sortable unique identifiers (flakes) be submitted as an IETF draft. This will help keep the UUID data type relevant.

However, I'd like to see the draft include a mention about the practice of embedding machine and data type identifiers into the format which helps in distributed applications.

ComputerGuru almost 4 years ago

See ULID for similar prior art. We generate ULIDs in code and then store them in uuid columns in Postgres to realize the compact (binary) size benefits.

https://github.com/ulid/spec
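
Storing a ULID in a uuid column works because both are 128-bit values; only the text encoding differs. A minimal Python sketch of the conversion follows, assuming the Crockford Base32 alphabet from the ULID spec. The function name `ulid_to_uuid` is hypothetical and is not part of any particular ULID library.

```python
import uuid

# Crockford Base32 alphabet (no I, L, O, U), as used by the ULID spec.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_uuid(ulid: str) -> uuid.UUID:
    """Decode a 26-character ULID string into the same 128 bits as a UUID."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    value = 0
    for ch in ulid.upper():
        value = (value << 5) | CROCKFORD.index(ch)  # ValueError on invalid characters
    if value >> 128:
        raise ValueError("ULID overflows 128 bits")
    return uuid.UUID(int=value)
```

Because the alphabet is in ascending ASCII order, sorting the ULID strings and sorting the resulting 16-byte values give the same order.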

cratermoon almost 4 years ago

I prefer ksuid: https://github.com/segmentio/ksuid

sudhirj almost 4 years ago

For those interested in time based UUIDs, I've written libraries in Ruby and Go to move quickly between them:

https://github.com/sudhirj/uulid.go
https://github.com/sudhirj/shortuuid.rb
https://github.com/sudhirj/shortuuid.go

fhrow4484 almost 4 years ago

> implementations MAY dedicate a portion of the node's most significant random bits to a pseudo-random machineID which helps identify UUIDs created by a given node. This works to add an extra layer of collision avoidance.

> This machine ID MUST be placed in the UUID proceeding [sic] the timestamp and sequence counter bits. This position is selected to ensure that the sorting by timestamp and clock sequence is still possible.

This guarantees uniqueness at a global level, as long as each machine doesn't run out of sequence counters within a given timestamp.

But why must that machine ID precede the timestamp & sequence counter? Why not have it after? (Or does "proceeding" have a meaning I'm not aware of? I read it as a typo for "preceding", but I'd assume it should be succeeding, especially given what the next sentence says.)

My intuition is range requests based on timestamp would work better if the machine ID is after, not before... If before, it seems it would violate the key requirement in the abstract of "sortable using the monotonic creation time".

(It's already violated since each machine in a distributed system doesn't have the same clock, so "creation time" is all relative. But for purposes of analytics, such as querying the "last 24h", having the timestamp be at the beginning seems preferable, since range queries can be done easily.)

Croftengea almost 4 years ago

To put it simply, the new standard aims to address UUID usage as primary keys in distributed systems.

But UUIDv7 is described as: Unix timestamp + fractions of a second + increasing sequence + random number. Now imagine two processes start simultaneously and start generating UUIDs right away. IMHO chances of the two processes generating exactly the same UUID sequence are pretty high unless the implementation is smart enough to feed something like the process ID as a seed to the random function.

riffic almost 4 years ago

Also see RFC 4122: https://datatracker.ietf.org/doc/html/rfc4122

RcouF1uZ4gsC almost 4 years ago

> 48-bit pseudo-random number used as a spatially unique identifier. Occupies bits 80 through 127 (octets 10-15)

Is 48 bits really enough to be a spatially unique identifier? Roughly 16 million entities would have a 50% chance of collision.

If you have a spatially unique identifier collision, it seems it might be possible for two independent entities to generate the same timestamp and counter codes, resulting in an overall UUID collision.
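
The 16 million figure is 2^24, the square-root rule of thumb for a 48-bit space. A slightly tighter birthday-bound approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), can be checked with a few lines of Python. This assumes the identifiers are uniformly random, and the function name is made up for this sketch.

```python
import math


def ids_for_collision_probability(bits: int, probability: float = 0.5) -> float:
    """Approximate how many uniformly random `bits`-bit values give the stated collision chance."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


print(f"{ids_for_collision_probability(48):,.0f}")
```

For 48 bits this lands a little under 20 million, the same order of magnitude as the estimate in the comment.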

gopalv almost 4 years ago

> The machineID MUST NOT be an IEEE 802 MAC address.

> MAC addresses pose inherent security risks and MUST not be used for node generation.

Interesting concern in the distributed generation pathway.

I've used MAC addresses in the past for absolutely unique identifiers, but this is calling that out as a security risk, because the time + ARP data might be known to predict a future UUID from a machine?

submeta almost 4 years ago

OT: Nicely written and formatted ASCII document. I wonder what tools they used to create the headers / page numbers / refs.

jhealy almost 4 years ago

The author seems to be developing the draft on GitHub, and there are a few edits since the v01 version linked here:

https://github.com/uuid6/uuid6-ietf-draft

skrebbel almost 4 years ago

Seems like a missed opportunity to sneak ULID right in (https://github.com/ulid/spec), given that it's already pretty widely used.

atonse almost 4 years ago

TLDR they present 3 new versions that each have their own trade-offs, but are also taking into account being able to use these as DB primary keys, which is really great.

Another thing I was quite curious about (looks like you can use them in existing DB columns etc):

> The UUID length of 16 octets (128 bits) remains unchanged. The textual representation of a UUID consisting of 36 hexadecimal and dash characters in the format 8-4-4-4-12 remains unchanged for human readability. In addition the position of both the Version and Variant bits remain unchanged in the layout.

darkhorse13 almost 4 years ago

A bit off-topic, but does anyone know how to build webpages that look like this? Like, is there any way to do it other than doing everything manually in a text editor?

thayne almost 4 years ago

For UUIDv7 it isn't very clear how the subsecond part is encoded. From the description of decoding it sounds like it would be multiplying the fraction of a second by 2^n, but that isn't very explicit. And if you want to avoid floating point arithmetic you'll need to include a 10^p in there, where p is how many digits of precision you have in base 10 (such as 3 for milliseconds).
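
One integer-only reading of that scheme, sketched in Python: scale the millisecond count by 2^n and divide by 10^p (here 1000). The 12-bit field width and both function names are assumptions for this sketch, not taken from the draft. Rounding down on encode and up on decode round-trips exactly as long as 2^n exceeds 10^p.

```python
FRACTION_BITS = 12  # assumed field width; 2**12 > 1000, so milliseconds survive a round trip


def encode_millis(ms: int, bits: int = FRACTION_BITS) -> int:
    """Scale a millisecond count (0-999) to a binary fraction of a second, rounding down."""
    if not 0 <= ms < 1000:
        raise ValueError("ms must be in 0..999")
    return (ms << bits) // 1000


def decode_millis(field: int, bits: int = FRACTION_BITS) -> int:
    """Invert encode_millis, rounding up to undo the truncation."""
    return (field * 1000 + (1 << bits) - 1) >> bits
```

For example, `encode_millis(500)` is 2048, exactly half of the 12-bit range, and `decode_millis(2048)` gives 500 back.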

infinityplus1 almost 4 years ago

Does anyone have any opinion on how Firebase push keys compare to UUIDs? Firebase push keys are said to be unique and are sortable. Here's a link to the push key generator: https://gist.github.com/mikelehen/3596a30bd69384624c11

Lazare almost 4 years ago

I like using standards, but the advantages of KSUIDs over existing UUIDs have led me to adopt them by default in new (and where possible, existing) projects.

I was sort of hoping this would yield something obviously superior to KSUIDs, since nominally this is an attempt to create an IETF standard to solve the same problem KSUIDs solve, but...

...I'm not seeing it. I just want enough random bits to ensure I don't need to worry about a collision, combined with enough of a timestamp that I can roughly k-sort them and get acceptable performance using them in a DB index. KSUIDs do this. These proposed formats...not so much.

Specifically:

1. All three of these standards devote significant numbers of bits to providing timestamps with very high precision or very distant epochs, or both. If you're generating a v6 UUID and your device's wall clock says it's currently the year 1582 (or really, any year prior to 2021), something has gone very wrong. As for accuracy, there's nothing wrong with it, but you could also shave off a few bits and spend that space on more randomness.

2. All three of these standards devote shockingly few bits to containing raw random bits. v6 has 48, and v7 and v8 don't require any (!) and max out at 62 bits. If you could trust that all devices generating v6 UUIDs had super accurate, super in-sync wall clocks, I suppose 48 bits of randomness and a 100 nanosecond timestamp might make sense. But in the real world, I think you're better off focusing on randomness, with only enough of a timestamp to make your database happy.

3. Both v6 and v7 devote significant attention to a clock mechanism. Clocks make a ton of sense in a system like Boundary's Flake (and Twitter's Snowflake), where you're embedding a node ID in the generated value, but these proposed formats don't do this. (Aside from an offhand comment that you could repurpose some of the random bits for this, but that this is "out of scope" for the spec.) Which is fine, but if your UUIDs aren't namespaced per-node, then your clock now needs to be synchronized globally, which is somewhere between "unfeasible" and "outright impossible". I appreciate the review they did of prior solutions, but you can't just grab arbitrary features from successful designs without considering the context that made the feature work in that design. As far as I'm concerned, if the format doesn't have a node ID, then all the bits devoted to clocks should be spent on more randomness.

I assume/hope that these UUIDs are useful for some applications, but I'm really struggling to see it. They seem good at things that aren't going to benefit users and bad at the things they're trying to solve, particularly at scale, although there's no real point for hobbyists either.

bob1029 almost 4 years ago

v7 sounds compelling for keying entities in our product. We use v4 right now.

What I am trying to determine is if 62 bits of entropy combined with a timestamp gives me better or worse collision resistance than 122 bits in a purely random format. Most of our keys only live within the scope of a single computer, but ~5% of them might be generated/shared between other systems.

Being able to order our keys by time would be really nice, and I like that they would compress/index better.

Maybe I could do a hybrid between v4 and v7 keys depending on if the type would be shared with external parties. There are many types that only ever scope to a single box.

I probably couldn't play with the idea of adding a "machine id" because the space of all possible machines is difficult to anticipate/control right now.

I think I just talked myself into sticking with v4.

OliverJones almost 4 years ago

If the epoch in this proposal were changed to something closer to the present day (2000-01-01 or the UNIX epoch) the format could easily recover some bits from the time fields to put in the PRNG-generated "node" fields. I wonder why they chose the Gregorian epoch?
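
For a sense of scale, the gap between the Gregorian epoch (1582-10-15) and the Unix epoch can be computed with Python's `datetime`. The constant and function names below are illustrative only; the 100 ns tick size is the UUIDv1 resolution mentioned elsewhere in the thread.

```python
from datetime import datetime

GREGORIAN_EPOCH = datetime(1582, 10, 15)
UNIX_EPOCH = datetime(1970, 1, 1)

# Number of 100 ns ticks between the two epochs.
EPOCH_OFFSET_TICKS = (UNIX_EPOCH - GREGORIAN_EPOCH).days * 86_400 * 10_000_000


def gregorian_ticks_to_unix_seconds(ticks: int) -> float:
    """Convert a UUIDv1/v6-style timestamp to seconds since the Unix epoch."""
    return (ticks - EPOCH_OFFSET_TICKS) / 10_000_000


print(hex(EPOCH_OFFSET_TICKS), EPOCH_OFFSET_TICKS.bit_length())
```

The offset alone needs 57 of the 60 timestamp bits, which gives a rough idea of how much room the pre-1970 range takes up.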

jjice almost 4 years ago

I'm a bit confused as to what the use case for UUIDs is compared to incrementing integer IDs. I assume there are some big upsides (aside from the primary key issue, which seems to have quite a few solutions at this point), but I'm just not aware of them.

mrgleeco almost 4 years ago

OT but related: a reverse-sorted UUID format seems especially useful for IoT and event data where typically we want to read newest to oldest lexicographically (e.g. row scans starting at now). Is there such a standard or OSS that does this?
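
One way to get newest-first ordering is to store the complement of the timestamp in the leading bits. The Python sketch below only illustrates that idea: it does not follow any standard, the 48-bit millisecond width and the function name are assumptions, and the result carries no UUID version or variant bits.

```python
import secrets

TIMESTAMP_BITS = 48  # assumed width of a millisecond timestamp field
MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1


def reverse_sortable_id(unix_ms: int) -> str:
    """Build a 32-hex-digit ID whose lexicographic order is newest-first (illustrative only)."""
    if not 0 <= unix_ms <= MAX_TIMESTAMP:
        raise ValueError("timestamp does not fit the assumed field width")
    inverted = MAX_TIMESTAMP - unix_ms  # newer time -> smaller prefix
    suffix = secrets.randbits(80)       # random tail to separate IDs created in the same ms
    return f"{inverted:012x}{suffix:020x}"
```

With this layout, a scan from the start of the keyspace returns the most recent rows first.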

no_wizard almost 4 years ago

I haven’t seen any uuid format in any project I’ve worked on get explicitly created except uuid V4.

What’s the actual advantage of the other uuids? I know what they are on paper but has anyone used other ones in practice?

gfody almost 4 years ago

if you were about to use a uuid as a primary key in a database, wouldn't it always be better to instead use a composite key with explicit columns for sequence, timestamp, node id, etc.? if you really need to accept client side generated values and there's no opportunity to issue them a unique node id beforehand, then explicitly taking their mac address and region seems better than stealthily relying on those things being embedded in a uuid - also aren't mac addresses considered PII?

kokizzu3 almost 4 years ago

meanwhile.. i did make a shorter one :3 https://github.com/kokizzu/lexid

Waterluvian almost 4 years ago

I find it peculiar that the Introduction section on these drafts is always just boilerplate.

surfingdino almost 4 years ago

Interesting. I'd be curious how it affects sharding.

geostyx almost 4 years ago

This looks awesome. Can't wait till I can add this to uuid.rocks!

pyuser583 almost 4 years ago

Ok so we’re going back to timestamps.

The wheel turns.