String.hashCode() is plenty unique

52 points by jxub, almost 7 years ago

4 comments

theclaw, almost 7 years ago
There is no error in the original article other than how it's phrased. I think it intends to warn people, with a lot of examples, not to trust String.hashCode() to be unique.

That is good! Why criticise it? The proper use of hashCode() is to quickly check whether strings might be equal before doing a full string compare. It's meant for building hashtables.
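
To make the "quick check first, full compare second" idea concrete, here is a minimal sketch. The class and the helper name `sameString` are hypothetical, invented for illustration; the only library calls are `String.hashCode()` and `String.equals()`. A mismatch in hashes proves the strings differ, while a match proves nothing and still needs the full compare.

```java
public class HashPrecheck {
    // Hypothetical helper: cheap hash comparison first, full compare only on a hash match.
    static boolean sameString(String a, String b) {
        if (a.hashCode() != b.hashCode()) {
            return false; // different hashes guarantee different strings
        }
        return a.equals(b); // equal hashes are only a hint, so verify
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 (65*31 + 97 and 66*31 + 66).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(sameString("Aa", "BB"));             // false
        System.out.println(sameString("Aa", "Aa"));             // true
    }
}
```

This is roughly what a hash table does inside a bucket: the hash narrows the candidates, and equals() makes the final decision.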
Waterluvian, almost 7 years ago
Just my $0.02: there is nothing to be gained by nagging about someone's punctuation and grammar. That doesn't win you an argument about programming.
Someone, almost 7 years ago
“resulted in 1 collision. A ‘fair’ hash function would generate an expected 1.44 collisions over this data. String.hashCode() outperforms a fair hash function significantly”

I doubt that is significant, and you needn't even look up the confidence intervals. Think of it this way: if you ran one more experiment with similar data, and that perfectly good hash were to average about 1.44 collisions across the two experiments, it would have to have at least one experiment with zero or one collision.

Also, string hashing has a few requirements that I think are more important than having an optimal probability of collisions:

- it has to work well on typical 16-bit text strings, where most of the time half the bytes are zero and most of the other bytes have only five or six bits that vary (that's why there are so many collisions in two-character strings: they are four bytes long but, at best, differ in only about 10 bits)

- it has to be fast.
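
A rough way to put a number on the "not significant" point, under an assumption that is not in the quoted article: if the collision count of a fair hash is modelled as a Poisson variable with mean 1.44, the chance of seeing zero or one collision is e^(-1.44) * (1 + 1.44). The class and method names below are hypothetical.

```java
public class CollisionOdds {
    // Probability of at most one collision, ASSUMING the collision count
    // is Poisson-distributed with the given mean (a modelling assumption).
    static double probAtMostOne(double mean) {
        // P(0) + P(1) = e^-m + m * e^-m
        return Math.exp(-mean) * (1.0 + mean);
    }

    public static void main(String[] args) {
        System.out.printf("P(at most 1 collision) = %.3f%n", probAtMostOne(1.44));
    }
}
```

Under that model the result is a bit under 0.6, so a fair hash would land on one collision or fewer more often than not, and a single observed collision says very little either way.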
iainmerrick, almost 7 years ago
The original criticism seems perfectly valid (although maybe people are reading too much into it).

If you have a 32-bit hash and fewer than 32 bits of input, it's reasonable to hope all hashes might be unique. 10x extra collisions on short strings does seem pretty bad. And it's unfortunate that this hash function can't be changed without breaking the language spec.
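
To see the short-string effect directly, here is a small sketch that counts distinct hash codes over all two-letter strings. The choice of alphabet (A-Z plus a-z) and the class and method names are assumptions made for illustration. For a two-character string, String.hashCode() works out to 31 * c1 + c2, so different pairs can land on the same value even though the input carries far fewer than 32 bits.

```java
import java.util.HashSet;
import java.util.Set;

public class TwoCharCollisions {
    // Counts how many distinct hashCode() values occur among all
    // two-character strings drawn from the given alphabet.
    static int distinctHashes(String alphabet) {
        Set<Integer> seen = new HashSet<>();
        for (char a : alphabet.toCharArray()) {
            for (char b : alphabet.toCharArray()) {
                seen.add(("" + a + b).hashCode());
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumed alphabet for illustration: upper- and lower-case ASCII letters.
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        int total = alphabet.length() * alphabet.length();
        int distinct = distinctHashes(alphabet);
        System.out.println(total + " strings, " + distinct + " distinct hashes, "
                + (total - distinct) + " lost to collisions");
    }
}
```

A hash that used all 32 bits well could give every one of these strings its own value; the gap between the two counts is the cost of the multiply-by-31 scheme on very short inputs.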