
Ask HN: How do orgs detect conflicting hashes in sub-second timing?

3 points | by adam_ellsworth | almost 3 years ago

Today I was considering how organizations such as bitly, imgur, reddit, and even GitHub handle hash collision prevention.

While for GH it (seems) easier, as commit hashes are sufficiently long/unique, for others with character hashes that only span ~6 chars or so, there are bound to be instances where hashes conflict from a statistical standpoint. (IIRC I read an article here on ycomb about a hash collision in GH a few years ago.)

To my mind these orgs have to have a suite of tools/algos requesting information from multiple services, checking whether or not a hash has been taken – and those processes have to optimize for time. (E.g., when a user makes a post, what's a reasonable time to do a lookup?)

So, what are the considerations that need to be made algorithmically to check for such collisions while keeping runtime to an acceptable minimum?
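The "statistical standpoint" in the question can be made concrete with the birthday approximation, P(collision) ≈ 1 − exp(−n²/2N), where N is the size of the slug space. A quick sketch (not from the thread; the 6-char base62 alphabet is an assumption matching the question's example):

```python
import math

def collision_probability(n_ids: int, alphabet: int = 62, length: int = 6) -> float:
    """Birthday-problem approximation: P(collision) ~ 1 - exp(-n^2 / (2 * N)),
    where N = alphabet ** length is the total slug space."""
    space = alphabet ** length  # ~5.68e10 for 6-char base62 slugs
    return 1.0 - math.exp(-(n_ids ** 2) / (2.0 * space))

# Random 6-char slugs pass 50% collision odds well before a million IDs,
# which is why services check for duplicates rather than trust randomness.
print(f"{collision_probability(100_000):.3f}")
print(f"{collision_probability(300_000):.3f}")
```

This is why random generation alone does not scale: the collision probability grows with the square of the number of issued IDs, not linearly.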

5 comments

verdverm · almost 3 years ago

Database indexes, most likely. Consider GitHub:

1. Hashes are determined outside of their context, by git.
2. They store this in a database.
3. Conflicts are so rare that they can be ignored. They must happen within the same repository; if they are in different repos, it is not a conflict.
frogger8 · almost 3 years ago
Not sure how these orgs do it but the system design book by Alex Xu uses timestamp + dataCenterID + machineId + sequence number that is reset every millisecond. This integer is then converted to base62.
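The scheme frogger8 describes can be sketched roughly as follows. The bit widths (41-bit timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence) follow the classic Twitter Snowflake layout and are an assumption here, not taken from the book:

```python
import time

class SnowflakeStyleIdGen:
    """Sketch of a Snowflake-style generator: timestamp | datacenterId |
    machineId | per-millisecond sequence, packed into one integer and
    then rendered as base62. IDs are unique without any database lookup."""

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def __init__(self, datacenter_id: int, machine_id: int):
        self.datacenter_id = datacenter_id & 0x1F  # 5 bits
        self.machine_id = machine_id & 0x1F        # 5 bits
        self.sequence = 0                          # 12 bits, resets each ms
        self.last_ms = -1

    def next_id(self) -> int:
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
        else:
            self.sequence = 0
            self.last_ms = ms
        # Real implementations block until the next millisecond if the
        # sequence wraps; this sketch omits that edge case.
        return (ms << 22) | (self.datacenter_id << 17) | (self.machine_id << 12) | self.sequence

    def next_slug(self) -> str:
        n = self.next_id()
        digits = []
        while n:
            n, r = divmod(n, 62)
            digits.append(self.ALPHABET[r])
        return "".join(reversed(digits)) or "0"
```

Because the machine identity and sequence are baked into the ID, no cross-service collision check is needed at generation time, which addresses the OP's latency concern directly.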
manx · almost 3 years ago

If you have different workers generating ids, give each of them its own range in advance. Once a worker has used up its range, it can request a new range for its next N (e.g. 100k) ids.
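The range-leasing idea above can be sketched like this (class and method names are illustrative; in production the allocator would be a database row or a coordination service, not an in-process object):

```python
import threading

class RangeAllocator:
    """Central coordinator that hands each worker a disjoint block of ids
    up front, so workers never need a per-id round trip."""

    def __init__(self, block_size: int = 100_000):
        self.block_size = block_size
        self.next_start = 0
        self.lock = threading.Lock()

    def lease_range(self) -> range:
        with self.lock:
            start = self.next_start
            self.next_start += self.block_size
        return range(start, start + self.block_size)

class Worker:
    """Consumes ids from its leased block; leases a fresh block on exhaustion."""

    def __init__(self, allocator: RangeAllocator):
        self.allocator = allocator
        self.ids = iter(())  # empty: forces a lease on first use

    def next_id(self) -> int:
        try:
            return next(self.ids)
        except StopIteration:
            self.ids = iter(self.allocator.lease_range())
            return next(self.ids)
```

The trade-off: ids from a crashed worker's unused block are simply lost, which is acceptable because the id space is cheap and uniqueness is what matters.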
compressedgas · almost 3 years ago

Do you mean "how do they do unique identifier generation"?

Most use a combination of precoordination and a database: when an insert fails, they just generate a new id and try again.
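The insert-and-retry pattern described above leans on the database's uniqueness constraint as the collision check, so no separate lookup pass is needed. A minimal sketch using SQLite (table and function names are made up for illustration):

```python
import random
import sqlite3
import string

def create_short_link(conn: sqlite3.Connection, url: str, length: int = 6) -> str:
    """Generate a random slug and insert it; a duplicate slug violates the
    PRIMARY KEY constraint, so the insert fails and we simply retry."""
    alphabet = string.ascii_letters + string.digits
    while True:
        slug = "".join(random.choices(alphabet, k=length))
        try:
            conn.execute("INSERT INTO links (slug, url) VALUES (?, ?)", (slug, url))
            conn.commit()
            return slug
        except sqlite3.IntegrityError:
            continue  # slug already taken: draw a new one and retry

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (slug TEXT PRIMARY KEY, url TEXT NOT NULL)")
```

While the slug space is sparsely populated, retries are vanishingly rare, so the expected cost stays close to a single indexed insert.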
ev1 · almost 3 years ago

What parts of those sites specifically? Most are not hashes.