We chose NanoIDs for PlanetScale’s API

79 points by s4i, over 2 years ago

17 comments

If you want pseudo-random counters that are guaranteed not to collide, consider using a "linear feedback shift register". LFSRs allow you to choose the number of bits in your id, and "complete" LFSRs use a starting seed [+] value that guarantees they will exhaust the entire bit space before repeating. They are very cool.

In distributed environments, you can assign different seeds to individual nodes, include the seed in the id, and guarantee no collisions across your entire network.

[+]: edit: sorry, I meant "taps". It's been a minute since I got to do something new with an LFSR. LFSR output is determined by "taps" & "seed" & algorithm. Galois LFSR is an easy algorithm to implement. There are publicly available references and datasets for full-cycle LFSR taps for different bit sizes.

I have a PHP implementation at https://github.com/robsheldon/asinius-lfsr, but the code is absurdly simple and trivial to translate into any other language. Some references and further notes are included.
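
For readers who have not met LFSRs before, here is a minimal Python sketch of a Galois LFSR. It is not a translation of the linked PHP code. The 16-bit tap mask `0xB400` is a commonly cited full-cycle choice, and the `seed-value` ID layout in `node_ids` is a hypothetical format chosen only to illustrate "include the seed in the id". Note that a full-cycle n-bit LFSR visits every non-zero state, so it yields 2^n - 1 values before repeating, and zero is never a valid seed.

```python
from itertools import islice
from typing import Iterator


def galois_lfsr(seed: int, taps: int = 0xB400, bits: int = 16) -> Iterator[int]:
    """Yield successive states of a right-shifting Galois LFSR."""
    if not 0 < seed < (1 << bits):
        raise ValueError("seed must be non-zero and fit in `bits` bits")
    state = seed
    while True:
        lsb = state & 1      # bit that is about to be shifted out
        state >>= 1
        if lsb:
            state ^= taps    # apply the feedback taps
        yield state


def node_ids(seed: int) -> Iterator[str]:
    """Hypothetical ID format: the node's seed, a dash, then the LFSR state."""
    for value in galois_lfsr(seed):
        yield f"{seed:04x}-{value:04x}"


# Two nodes with different seeds never produce the same ID string,
# because the seed prefix differs.
node_a = list(islice(node_ids(0x0001), 3))
node_b = list(islice(node_ids(0x0002), 3))
```

Each node's sequence repeats only after 65,535 IDs, so a real deployment would pick a wider register with taps taken from a published full-cycle table.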

mnutt, over 2 years ago

I wonder how this might compare to just storing regular autoincrementing ints in the database, and converting to/from hashids (https://hashids.org/) at the edge. It eliminates the collision concern and stores more compactly at the cost of a tiny amount of encode/decode when processing requests. You’d want to push it down as close to the database layer as possible to avoid inadvertent int ID leaks; I added native hashids support to ClickHouse but I’m not sure what supporting it in other databases might entail.

hangonhn, over 2 years ago

Can someone clarify this statement from the original nanoID site (https://github.com/ai/nanoid) for me? "random % alphabet is a popular mistake to make when coding an ID generator. The distribution will not be even; there will be a lower chance for some symbols to appear compared to others."

If random is picked such that it's in the range of alphabet (i.e. 0 to 25), then the bias should not exist, right? Is that what he's alluding to? Thanks in advance.
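
A small Python sketch may help with the question above. It assumes the "random" value is a raw byte (0 to 255), which is the usual situation when reading from a secure random source. Because 256 is not a multiple of 26, `byte % 26` maps ten byte values to some letters and only nine to others. If the random value is already uniform over 0 to 25, as the comment supposes, there is no bias. The function names are illustrative and not taken from the nanoid source.

```python
import secrets
from collections import Counter


def byte_mod_counts(alphabet_size: int) -> Counter:
    """Count how many of the 256 byte values land on each index under `%`."""
    return Counter(b % alphabet_size for b in range(256))


def unbiased_index(alphabet_size: int) -> int:
    """Rejection sampling: discard bytes from the uneven tail, then reduce."""
    limit = 256 - (256 % alphabet_size)  # largest multiple of alphabet_size <= 256
    while True:
        b = secrets.randbits(8)
        if b < limit:
            return b % alphabet_size


counts = byte_mod_counts(26)
# 256 = 9 * 26 + 22, so indices 0..21 are hit by 10 byte values each
# and indices 22..25 by only 9.
over_represented = [i for i, c in counts.items() if c == 10]
under_represented = [i for i, c in counts.items() if c == 9]
```

The rejection loop throws away the last 22 byte values (234 to 255), so every index is backed by exactly nine byte values.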

ezekg, over 2 years ago

> This gives us a 1% probability of a collision in the next ~35 years if we are generating 1,000 IDs per hour.

Yes, but that's for 1,000 IDs per hour. That is not a large workload. If you generate 10 IDs per second (which, again, is not a lot), your time frame shrinks to roughly 1 year. With 100 IDs per second, a mere 36 days. With your reduced alphabet, I would at the very least bump your ID length to 16 characters.
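
The time frames in this comment can be checked with the standard birthday-bound approximation. The sketch below assumes a 36-symbol alphabet and 12-character IDs; the length is an assumption inferred from the quoted "~35 years" figure, not something stated in this thread. The function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ids_for_collision_probability(alphabet_size: int, length: int, p: float) -> float:
    """Approximate number of random IDs after which P(any collision) reaches p.

    Uses the birthday approximation n ~ sqrt(2 * N * ln(1 / (1 - p))),
    where N is the size of the ID space.
    """
    space = alphabet_size ** length
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


def seconds_until(alphabet_size: int, length: int, p: float, ids_per_second: float) -> float:
    """Time needed to generate that many IDs at a constant rate."""
    return ids_for_collision_probability(alphabet_size, length, p) / ids_per_second


# Assumed parameters: 36 symbols, 12 characters, 1% collision probability.
years_at_1000_per_hour = seconds_until(36, 12, 0.01, 1000 / 3600) / SECONDS_PER_YEAR
days_at_10_per_second = seconds_until(36, 12, 0.01, 10) / 86400
days_at_100_per_second = seconds_until(36, 12, 0.01, 100) / 86400
days_at_100_per_second_len16 = seconds_until(36, 16, 0.01, 100) / 86400
```

Because the threshold grows with the square root of the ID space, each extra character multiplies the time frame by 6 for a 36-symbol alphabet, so moving from 12 to 16 characters buys a factor of 1,296.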

mdaniel, over 2 years ago

the previous submission had good points: https://news.ycombinator.com/item?id=30856703

skeeter2020, over 2 years ago

"User friendly, clickable in the browser, API URLs" seems like... a made-up user requirement? Have they nailed their product so well that this is the highest value work outstanding? There are some neat technical details in there but to me this is the definition of engineering procrastination.

ei8ths, over 2 years ago

Couldn't you just remove the `-` dashes from the UUID and wouldn't that make it clickable? I dunno, I think I'll stick with using UUIDs.

janus, over 2 years ago

This is great, except for the additional query each time a record is instantiated in the Rails concern. That might cause some performance problems under high traffic.

Perhaps it’d be better to attempt the insertion and change the id only if a collision is detected by a uniqueness constraint.
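
The insert-first approach suggested here can be sketched as follows. This is Python with the standard `sqlite3` module rather than the article's Rails code, and the table name, column names, and helper functions are all hypothetical. The idea is to let the database's `UNIQUE` constraint detect the rare collision instead of issuing a lookup query before every insert.

```python
import secrets
import sqlite3

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def new_public_id(length: int = 12) -> str:
    """Random ID over a 36-symbol alphabet (length is an assumed default)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE records (public_id TEXT NOT NULL UNIQUE, name TEXT NOT NULL)"
    )


def insert_with_retry(conn, name, make_id=new_public_id, max_attempts=5) -> str:
    """Insert a row; regenerate the ID only if the UNIQUE constraint rejects it."""
    for _ in range(max_attempts):
        public_id = make_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO records (public_id, name) VALUES (?, ?)",
                    (public_id, name),
                )
            return public_id
        except sqlite3.IntegrityError:
            continue  # collision: try again with a fresh ID
    raise RuntimeError("could not generate a unique ID")


conn = sqlite3.connect(":memory:")
create_table(conn)
first_id = insert_with_retry(conn, "first record")
```

In the common case this costs a single write, and the retry path runs only when a collision actually happens.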

actinium226, over 2 years ago

Kind of curious as to why they published this. It's interesting, but if I had done this I think I would just see it as an implementation detail and not something particularly worthy of an article. Should I maybe change my standards for what's publishable?

vlmutolo, over 2 years ago

We should consider the following properties when evaluating ID formats and generation algorithms:

1. Private: you shouldn’t be able to gain information about the system that uses the IDs from an ID alone. E.g. document enumeration attacks like what happened with Parler (https://www.wired.com/story/parler-hack-data-public-posts-images-video/)
2. B-tree/cache friendly: newly created IDs should all exist in a narrow range of values. This is helpful for databases.
3. Stateless: ideally you shouldn’t need to know the current state of the system to create a new ID.
4. Human-friendly: IDs should be easily dictated, copied, pasted, etc. This means they should be encodable as text that is short and does not include ambiguous characters. Bonus points for error detection like with credit cards.

Some of these properties are in conflict. Statelessness is achieved by randomly generating long IDs, but people don’t like reading or typing long IDs.

Different use cases will need these properties in varying amounts. If you don’t intend to expose the IDs to users, (4) doesn’t matter. Just use long, randomly generated byte strings prepended with the date. Most databases have a UUID type that fits the bill.

If users are going to be working with IDs, that’s more complicated. If not every document has a user-facing ID, just go with the non-user-facing ID like before, and generate a shorter, random, stateful ID as needed.

I don’t think NanoID prepends the date, which means it won’t be efficient when inserting large numbers of IDs into a large index. They also default to using ambiguous characters like 1 and I and l. There is also no error-detecting code. But they are shorter than UUIDs. So it doesn’t meet property (2), and it only kind of meets property (4). NanoIDs are random, so you’re probably safe from enumeration attacks (1). NanoIDs mostly leave statelessness as a decision for the user. They have a nice tool that helps estimate how long the IDs should be (https://zelark.github.io/nano-id-cc/) for a given collision resistance.

I think we can do better overall. Bitcoin uses a good encoding scheme called base58check (https://en.bitcoin.it/wiki/Base58Check_encoding). It generates fairly short strings and uses a checksum at the end. I think it could be refined for non-bitcoin purposes, but it’s already pretty good.

A 128-bit value like the ASCII string “hackernewstestid” is encoded as “Dtajqjz5pptWcmGrNcwBx7”. It’s about 2/3 the size of the equivalent UUID, even with the (unnecessarily long for this use case) checksum. It also has no punctuation.

I’d like to see a small ID standard that meets the above requirements and has a choice for either stateless and long or stateful and short. Maybe another choice for secure random or insecure. But all options would have a binary form and a text form. The text form would use something similar to base58check, but probably with a smaller (or user-determined) length for the checksum.
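
To make the "something similar to base58check" idea concrete, here is a rough Python sketch of a Base58 text form with a checksum of configurable length. It is a simplified variant and not Bitcoin's exact Base58Check: it omits the version byte, and the `checksum_len` parameter and function names are assumptions made for illustration. The checksum is a truncated double SHA-256 of the payload, as in the Bitcoin scheme.

```python
import hashlib

# Base58 alphabet as used by Bitcoin: no 0, O, I, or l.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes become '1'
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)  # raises ValueError on an invalid character
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def _checksum(payload: bytes, checksum_len: int) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:checksum_len]


def encode_checked(payload: bytes, checksum_len: int = 4) -> str:
    """Append a truncated double-SHA-256 checksum, then Base58-encode."""
    return b58encode(payload + _checksum(payload, checksum_len))


def decode_checked(text: str, checksum_len: int = 4) -> bytes:
    """Decode and verify; raise ValueError if the checksum does not match."""
    raw = b58decode(text)
    if len(raw) < checksum_len:
        raise ValueError("input too short")
    payload, check = raw[:-checksum_len], raw[-checksum_len:]
    if check != _checksum(payload, checksum_len):
        raise ValueError("checksum mismatch")
    return payload


text_form = encode_checked(b"hackernewstestid")
round_trip = decode_checked(text_form)
```

A shorter checksum (for example `checksum_len=1` or `2`) trades error-detection strength for a shorter text form, which is the knob the comment asks for.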

Jarwain, over 2 years ago

I'm curious as to why they didn't decide on using UUIDs with the `-` stripped for URLs, and re-adding them for queries. It would accomplish the same goals, wouldn't it?
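
For reference, stripping and restoring the dashes is a lossless round trip. A minimal Python sketch using the standard `uuid` module follows; the function names are illustrative.

```python
import uuid


def to_url_form(value: uuid.UUID) -> str:
    """32 hex characters, no dashes, suitable for a URL path segment."""
    return value.hex


def from_url_form(text: str) -> uuid.UUID:
    """Parse the dash-less form back into a UUID for use in queries."""
    return uuid.UUID(hex=text)


original = uuid.UUID("12345678-1234-5678-1234-567812345678")
compact = to_url_form(original)    # "12345678123456781234567812345678"
restored = from_url_form(compact)  # str(restored) has the dashes again
```

The dash-less form is still 32 characters long, so this addresses clickability but not length.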

joshmgross, over 2 years ago

oh wow Mike is so smart

andrewstuart, over 2 years ago

Just look out for this issue: https://github.com/ai/nanoid/issues/365

danbruc, over 2 years ago

Don't expose your internal IDs, expose some identifier specifically made to be exposed.

survirtual, over 2 years ago

Why not just do a SHA2/3 hash + b58 & truncate as desired? Seems pretty simple.

For immutable data, that sort of identifier has an added benefit of deduping data.

For mutable data, just hash the output of a cryptographically secure random function.

orasis, over 2 years ago

How many bits is this?

mihaic, over 2 years ago

After reading this article, my impression of PlanetScale as a brand actually got worse. It seems to miss most of the essential bits of information:

* The basic concept is that they just want to use a bigger alphabet to encode more information in fewer characters. The efficiency ratio of NanoID over a hex-encoded UUID is log 36/log 16 ≈ 1.29, i.e. ~30% more information per character, since it has a bigger alphabet. You could get more if you went for instance with base 58 (includes uppercase, except I and O to remove ambiguity with the digits 1 and 0).
* UUID can remove those dashes, that's just cosmetic. There are multiple UUID specs though, and some include a timestamp that actually might be useful for certain purposes.

Overall the article seems to get lost in overly specific code snippets and explaining details without explaining essentials.
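
The alphabet-size arithmetic in this comment can be worked through in a few lines of Python. The 128-bit target is an assumption chosen to match the size of a UUID, and the function names are illustrative.

```python
import math


def bits_per_char(alphabet_size: int) -> float:
    """Information carried by one uniformly random symbol."""
    return math.log2(alphabet_size)


def chars_needed(bits: int, alphabet_size: int) -> int:
    """Shortest length whose ID space covers at least `bits` bits."""
    return math.ceil(bits / bits_per_char(alphabet_size))


# Ratio from the comment: log 36 / log 16, about 1.29.
ratio_base36_vs_hex = bits_per_char(36) / bits_per_char(16)

# Characters needed to carry 128 bits in each alphabet.
lengths_for_128_bits = {size: chars_needed(128, size) for size in (16, 36, 58)}
```

For 128 bits this gives 32 characters in hex, 25 in base 36, and 22 in base 58.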
