GUIDs Are Not the Only Answer

86 points by ublaze over 4 years ago

16 comments

I would strongly say strings are actually the _only_ type you should use for IDs. They prevent the vast majority of buggy client behaviour and give you good flexibility to change how you do things over time.

---

My company ended up with a simple KSUID implementation of our own - <a href="https://www.cuvva.com/product-updates/showing-off-our-fancy-new-k-sortable-ids" rel="nofollow">https://www.cuvva.com/product-updates/showing-off-our-fancy-...</a> (having originally used UUIDs and Mongo ObjectIDs).

For us, a big part of it was usability with cursor selection etc - in addition to it being immediately obvious what the ID was for.

Once we finally had that rolled out everywhere, we ended up collecting every other ID we'd ever used and mapping it to its KSUID resource equivalent, so now all our IDs work standalone without type/context info, even across environments (and thankfully we'd never had any collisions on the old IDs).

---

Going back to the typing - the most difficult part of migrating our IDs was actually converting them all to string types. With Postgres this is a little slow but ultimately fine, but with Mongo you have to actually remove and reinsert every document - you cannot (or at least could not) update IDs in place.

vsareto over 4 years ago

> Poorly formatted log statements/errors can become harder to debug
> UUIDs often need context to aid debugging.

The GUID vs Int problem is incidental to the real problem: poorly formatted logs. An integer or other key type without context is no more helpful.

> At the very least, identifiers should not be allowed to float freely as strings or integers in order to prevent a class of inconsistency bugs.

I'm not familiar with the ergonomics of GUIDs across all languages, but C#/MSSQL makes them pretty easy to handle when they have been chosen as keys.

So the answer, as far as ergonomics go, is not settled; it depends on your stack.

ivan_ah over 4 years ago

The Python package `shortuuid` makes working with UUIDs a little easier by encoding them as strings: <a href="https://github.com/skorokithakis/shortuuid#usage" rel="nofollow">https://github.com/skorokithakis/shortuuid#usage</a> (uses base-57 encoding, with an alphabet consisting of A-Za-z0-9 with potentially confusing symbols skipped).

The string representation is what you show to users, but under the hood it's still a UUID and compatible/interoperable with any other system that needs UUID-shaped identifiers.

The coolest part is you can even truncate the string encodings to get shorter IDs, which correspond to UUIDs with lots of leading zeros.
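
The idea above can be sketched with the standard library alone. This is a minimal illustration of base-57 encoding of a UUID's integer value, not `shortuuid`'s actual implementation; the alphabet, the function names `encode` and `decode`, and the digit ordering (most significant first, no padding) are assumptions made for the example.

```python
import uuid

# Assumed alphabet for illustration: digits and letters with the
# look-alike characters 0, 1, I, O and l left out (57 symbols).
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BASE = len(ALPHABET)


def encode(u: uuid.UUID) -> str:
    """Encode the UUID's 128-bit integer in base 57, most significant digit first."""
    n = u.int
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(s: str) -> uuid.UUID:
    """Inverse of encode: rebuild the integer, then wrap it in a UUID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return uuid.UUID(int=n)


original = uuid.uuid4()
short = encode(original)
restored = decode(short)  # equal to `original`
```

With this ordering, a short string decodes to a small integer, which is a UUID whose leading hex digits are all zero. That is the sense in which shortened IDs remain UUID-shaped.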

withinboredom over 4 years ago

GUID v6 is pretty nice when you need monotonically increasing numbers that are globally unique.

On another note, I worked somewhere that prefixed GUIDs with the environment the app was running on. Production, staging, and even dev machines all used the same connection string.

There was even a stored procedure to copy user accounts etc. from prod for your machine. It was hands-down the best debugging experience when a customer had an issue.

ivanb over 4 years ago

> Sometimes, we want a zero inconsistency approach to storing objects, so it might make sense to make the identifier (or part of it) the checksum of the content that is to be stored. This guarantees that the underlying content has not been modified.

No, it does not. I come across this false statement again and again. It seems that a lot of developers do not understand what checksums or hashes guarantee and what they absolutely do not guarantee.

Let's settle this once and for all:

1. Differing checksums or hashes guarantee that the content is different.
2. Identical checksums or hashes do not guarantee anything. The content could be identical or not.
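
A small sketch of that asymmetry, using a deliberately weak checksum so that a collision is easy to see. `toy_checksum` and `definitely_different` are hypothetical helpers invented for this example; a cryptographic hash makes collisions far harder to find, but any fixed-size digest still maps many inputs to the same value.

```python
def toy_checksum(data: bytes) -> int:
    """Deliberately weak 8-bit checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def definitely_different(x: bytes, y: bytes) -> bool:
    """True only when the checksums differ, which proves the contents differ."""
    return toy_checksum(x) != toy_checksum(y)


first = b"ab"
second = b"ba"

# Same checksum, different content: equality of checksums proves nothing.
collision = toy_checksum(first) == toy_checksum(second) and first != second

# Different checksums: the contents cannot be equal.
proven_different = definitely_different(b"ab", b"ac")
```

Here `first` and `second` contain the same bytes in a different order, so the byte sum is identical even though the content is not.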

bfung over 4 years ago

The main point of the article is that even when you use a UUID for database objects, backend functions that query multiple classes of objects end up with poorly written signatures, e.g.:

<pre><code> ban(id: uuid): pass </code></pre>

There's no way for the reader to know which object the ID refers to, and the wrong object's ID can be passed to the function. Their solution is to wrap the ID in a type, which is correct, but the article does not go on to say that the function signatures in the backend code should be updated as well. Something like this would close the loop in the blog:

<pre><code> ban(id: UserId): pass </code></pre>

Now the compiler can check that only a User.id, and not a Team.id, is passed into the method.
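
One way this could look in Python, as a rough sketch rather than the article's own code: each ID kind gets a small frozen wrapper class, so a static checker such as mypy can flag a `TeamId` passed where a `UserId` is expected. The class names follow the comment; the runtime `isinstance` check and the return value of `ban` are assumptions added so the behaviour is visible without a type checker.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class UserId:
    value: uuid.UUID


@dataclass(frozen=True)
class TeamId:
    value: uuid.UUID


def ban(user_id: UserId) -> str:
    # Static checkers reject ban(TeamId(...)) from the annotation alone;
    # this check is a runtime backstop for untyped callers.
    if not isinstance(user_id, UserId):
        raise TypeError(f"expected UserId, got {type(user_id).__name__}")
    return f"banned user {user_id.value}"


raw = uuid.uuid4()
message = ban(UserId(raw))

try:
    ban(TeamId(raw))  # same underlying UUID, wrong kind of ID
    rejected = False
except TypeError:
    rejected = True
```

The two wrappers hold the same underlying UUID type, yet they are not interchangeable, which is the property the comment asks for.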

hermanradtke over 4 years ago

I have stopped using UUID and GUID in favor of <a href="https://github.com/ulid/spec" rel="nofollow">https://github.com/ulid/spec</a>

winrid over 4 years ago

This is one thing I like about type systems where you can declare a primitive type as MyImportantThing. This ensures the string or what have you is explicitly defined as MyImportantThing. Rust does this pretty well. C/C++ AFAIK will let you pass in the raw string, and so will Java if you "extend String".

yrimaxi over 4 years ago

Pretty obvious stuff. Of course GUIDs are more unwieldy to read etc. compared to simple auto-incrementing integers.

I don’t see a reason to prefix an id with something like `task-`. I would rather leave it to the display logic.

mamcx over 4 years ago

I tried many things for making sync work across devices. I tried GUIDs, and partitioning ranges of ints, and several versions of it.

But what worked amazingly well?

Use NATURAL keys (or their hash) + a version field. That is all you need in most cases. It makes sync far easier, makes things easier to trace (thanks to the version), and is immune to timestamp problems (some computers have their clocks wrong). In short:

<pre><code>
struct Order {
    code: String, // natural key
    version: usize,
}

struct Location {
    code: Hash, // hash of city + country
    city: String,
    country: String,
    version: usize,
}
</code></pre>

Natural keys are global if well defined. In places where the key is not obvious, hashing the whole row and applying a nice encoding does the same job.

This will also reveal when something TRULY needs a GUID or similar. For example, for invoices in my country the law demands partitioned ranges with certain characteristics (i.e. INV-1-XXX on machine 1, INV-2-XXX on machine 2).

Adding another id:i64 becomes redundant most of the time. If your Order.code is duplicated or whatever, it will be the same problem with or without an extra id:i64, so it is better to deal with the problems of the ACTUAL data when needed and not mask them with other stuff.

The downside is that the key becomes repeated in JOINs (like in InvoiceLine), but honestly all rdbms handle triggers, and it actually becomes very nice to see the Order.code in the child relations (far easier to correlate).
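
A rough Python sketch of the natural-key-plus-version idea for the `Location` case. The normalisation step (trimming and lower-casing), the choice of SHA-256, the separator character, and the `merge` rule (highest version wins) are all assumptions made for illustration; the comment does not specify any of them.

```python
import hashlib
from dataclasses import dataclass


def location_code(city: str, country: str) -> str:
    """Derive a key from the natural fields so every device computes the same one."""
    # Assumed normalisation: trim and lower-case, joined by a unit separator
    # so that ("ab", "c") and ("a", "bc") cannot produce the same input.
    normalized = f"{city.strip().lower()}\x1f{country.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class Location:
    city: str
    country: str
    version: int = 1

    @property
    def code(self) -> str:
        return location_code(self.city, self.country)


def merge(local: Location, remote: Location) -> Location:
    """Assumed sync rule: for the same key, keep the record with the higher version."""
    if local.code != remote.code:
        raise ValueError("records describe different locations")
    return remote if remote.version > local.version else local


on_phone = Location("Lima", "Peru", version=2)
on_laptop = Location(" lima ", "PERU", version=3)
winner = merge(on_phone, on_laptop)  # the laptop's record, version 3
```

Because the key is computed from the data, two devices that create the same location independently agree on its identity without coordinating, and the version number, not a clock, decides which copy is newer.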

bob1029 over 4 years ago

The best answer is the humble integer. The only reasonable arguments I have ever seen against using integer keys universally are as follows:

#1 Integer keys have finite range.

#2 Integer keys betray the identity of other sensitive resources when exposed as a public identity.

#3 Integer keys are "difficult" to sequence in the face of multiple networked participants.

My resolutions and counter-arguments are as follows:

For many systems, #1 is not a concern, because the number of expected entities is well-bounded by a 64-bit integer. For others, #1 can be resolved by usage of more complex types such as BigInteger (C#). If utilized carefully, these can be treated just like normal integers, and quickly converted to/from byte arrays of appropriate length to satisfy the required range. In virtually all SQL implementations, blob columns containing these values can be indexed with the exact same semantics as with a 64-bit integer column. Whether this performs better or worse than GUID keys probably depends on if you can provoke a >120 bit BigInteger representation. This is quite unlikely, even for Google.

#2 is trivially solved by simply applying encryption to sensitive keys as they traverse the boundary between your system and the outside world. AES256 would do the trick here. You could also generate entirely separate keys of any appropriate type for public consumption (i.e. maybe some YT-style identifier format).

#3 is solved by anticipating the maximum possible # of nodes in your system, and then producing a key space in which identities are sharded out by a simple constant factor of that max quantity. This would certainly produce concern regarding all of the skipped identities (assuming you start with a small number of hosts on day 1), but the proposed resolution above for #1 (BigInteger) alleviates these concerns with a practically infinite range of keys. Skipping 10k identities is a non-event when you have all of infinity to pull from.

There are also other considerations with this. GUID keys are a pain to communicate. Integers, even of massive range, are easy for most humans to communicate verbally when appropriate digit grouping and other reasonable measures are undertaken.

Also consider a situation in which you decide to use 1 global integer range to key every single entity in your system. This allows for interesting database structures in which foreign keys are all referring to the same keyspace, so the specific type of a thing is no longer a hard constraint in a relational sense. Some would probably take substantial offense to this proposal, but I have found in many cases this allows for powerful optimizations. Anything can be used irresponsibly.
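
The resolution for #3 can be sketched as a strided sequence: each node starts at its own index and steps by the maximum node count, so no two nodes ever emit the same integer. `MAX_NODES`, `node_ids`, and `owner_node` are hypothetical names chosen for this example, and the value 1000 is an arbitrary assumed bound.

```python
from itertools import count, islice

MAX_NODES = 1000  # assumed upper bound on nodes, fixed for the system's lifetime


def node_ids(node_index: int, max_nodes: int = MAX_NODES):
    """Yield node_index, node_index + max_nodes, node_index + 2 * max_nodes, ..."""
    if not 0 <= node_index < max_nodes:
        raise ValueError("node_index must be in [0, max_nodes)")
    # Python integers are arbitrary precision, so the sequence never overflows.
    return count(start=node_index, step=max_nodes)


def owner_node(identifier: int, max_nodes: int = MAX_NODES) -> int:
    """Recover which node issued an identifier."""
    return identifier % max_nodes


first_three = list(islice(node_ids(3, max_nodes=10), 3))  # [3, 13, 23]
```

With only a handful of live nodes most of the key space goes unused, which is the skipped-identity concern the comment raises and dismisses on the grounds that the range is effectively unbounded.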

freeone3000 over 4 years ago

Absolutely everything in this article is true, and I wish the folks who wrote stuff on Azure would read this... But if you actually need an opaque identifier for an arbitrary resource, what actually works better?

gorgoiler over 4 years ago

I’ve always wondered: what is the history behind hyphens in UUIDs?

ff333ttee over 4 years ago

I also tried to use custom types for IDs, but in my opinion, it has more cons than pros. I have to write custom serializers and model binders for them, explicitly convert them to other types, write separate validators etc... In the end I finished with even more bugs than before.

gnulinux over 4 years ago

Why not just `f"task-{uuid.uuid4()}"`?
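
Spelled out slightly, as a sketch only: the one-liner above generates the prefixed ID, and a small parser splits it back into its parts. `new_id` and `parse_id` are hypothetical helper names, and the parser assumes the prefix itself contains no hyphen.

```python
import uuid
from typing import Tuple


def new_id(kind: str) -> str:
    """Build an ID such as 'task-<uuid4>'."""
    return f"{kind}-{uuid.uuid4()}"


def parse_id(value: str) -> Tuple[str, uuid.UUID]:
    """Split at the first hyphen; everything after it must be a valid UUID."""
    kind, _, raw = value.partition("-")
    return kind, uuid.UUID(raw)  # raises ValueError if the UUID part is malformed


task_id = new_id("task")
kind, parsed = parse_id(task_id)  # kind is "task", parsed is a uuid.UUID
```

Splitting at the first hyphen matters because the UUID's own text form contains hyphens too.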

nesarkvechnep over 4 years ago

"At the very least, identifiers should not be allowed to float freely as strings or integers in order to prevent a class of inconsistency bugs."

Tell that to almost every Typescript developer who uses `number` for identifiers.
