
Overflow in consistent hashing (2018)

81 points by g0xA52A2A, about 1 year ago

6 comments

mjb, about 1 year ago
This is a very cool page. I love little simulations like this for building intuition for systems problems.

Practical systems deal with this by not caring strongly about overflow (caches), by running at low enough utilizations that overflow is very unlikely given their item counts (e.g. Dynamo), by using explicit partitioning rather than consistent hashing (e.g. DynamoDB), by taking advantage of multi-tenancy to drive up per-physical-node utilization even when per-logical-node utilization is low, or by using some additional algorithmic sophistication (e.g. Chen et al, https://arxiv.org/pdf/1908.08762).

In practice, this kind of overflow is a big deal for systems that deal with relatively small numbers of large objects, and not as big a deal for systems that deal with large numbers of small objects. Try out the numbers in the page's "Handy Calculator" to see how that plays out.

It's also worth mentioning that this isn't unique to consistent hashing; it's a problem with random load balancing more generally. "Pick a random server and send traffic to it" is an OK load-balancing strategy when requests are small and servers are large, but a terrible one when requests become relatively large or expensive. The general load-balancing/placement problem is easier than the storage case, because you don't need to find requests again after dispatching them. That makes simple algorithms like best-of-2 and best-of-k applicable.
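A rough stand-in for that kind of calculator (a sketch, not the page's actual code): if items land on nodes uniformly at random, each node's item count is approximately Poisson with mean items/nodes, so the chance that a given node exceeds its capacity is a Poisson tail. The item counts, node counts, and capacities below are illustrative.

```python
import math

def p_node_overflows(n_items: int, n_nodes: int, capacity: int) -> float:
    """Estimate P(a given node receives more than `capacity` items) when
    n_items are placed uniformly at random across n_nodes; the per-node
    count is approximately Poisson(n_items / n_nodes)."""
    mean = n_items / n_nodes
    log_pmf = -mean                      # log P(X = 0)
    total = math.exp(log_pmf)
    for k in range(1, capacity + 1):     # accumulate P(X <= capacity) in log space
        log_pmf += math.log(mean) - math.log(k)
        total += math.exp(log_pmf)
    return max(0.0, 1.0 - total)

# Few large objects: 1,000 items on 100 nodes, 15 slots per node (~67% full):
print(p_node_overflows(1_000, 100, 15))          # a few percent chance per node
# Many small objects at the same ~67% utilization: effectively zero.
print(p_node_overflows(1_000_000, 100, 15_000))
```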
User23, about 1 year ago
Anyone interested in consistent hashing should take a look at the simpler and more general rendezvous hashing [1] too.

[1] https://en.m.wikipedia.org/wiki/Rendezvous_hashing
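For anyone who wants to see how small it is, a minimal rendezvous (highest-random-weight) sketch might look like the following; the node names and the choice of SHA-256 are illustrative, not from the linked article.

```python
import hashlib

def rendezvous_node(key: str, nodes: list[str]) -> str:
    """Pick the node with the highest hash(key, node) score. Every client
    computes the same winner without coordination, and removing a node
    only remaps the keys that node owned."""
    def score(node: str) -> int:
        return int.from_bytes(
            hashlib.sha256(f"{key}:{node}".encode()).digest(), "big")
    return max(nodes, key=score)

nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
print(rendezvous_node("user:42", nodes))
```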
pjdesno, about 1 year ago
Actually, I think what a lot of real systems do is equivalent to pre-computing a "random" table that has suitable balancing properties, and then using the hash to index into it.

For example, Ceph used to have a big problem with overloaded placement groups, causing some disks to get twice as much load; maximum throughput was reached when those disks maxed out, leaving the rest half-idle. I don't recall the details of the current solution, but I think it's equivalent to generating a random assignment and then tweaking it to get rid of over-full bins.

The original Chord-style consistent hashing is easier in the P2P environments it was designed for, but today consistent hashing is typically used in much more closely-coupled systems.
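A toy version of that precomputed-table idea (my own sketch, not Ceph's actual algorithm): assign table slots to nodes at random, move slots off any node holding more than its fair share, then route keys by hashing into the fixed-size table.

```python
import hashlib
import random

def build_placement_table(nodes: list[str], slots: int = 1024, seed: int = 0) -> list[str]:
    """Random slot-to-node assignment, rebalanced so no node owns more
    than ceil(slots / len(nodes)) slots."""
    rng = random.Random(seed)
    table = [rng.choice(nodes) for _ in range(slots)]
    cap = -(-slots // len(nodes))                 # ceiling division
    counts = {n: table.count(n) for n in nodes}
    under = [n for n in nodes if counts[n] < cap]
    for i, n in enumerate(table):
        if counts[n] > cap:                       # move this slot off an over-full node
            dest = under.pop()
            table[i] = dest
            counts[n] -= 1
            counts[dest] += 1
            if counts[dest] < cap:
                under.append(dest)
    return table

def lookup(table: list[str], key: str) -> str:
    """Hash the key into the table; the table, not the hash, carries the
    balancing guarantees."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return table[h % len(table)]

table = build_placement_table(["n1", "n2", "n3"])
print(lookup(table, "object-123"))
```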
lucianbr, about 1 year ago
Aren't you supposed to use more buckets than nodes, with each node hosting a number of buckets not all adjacent on the circle? This I expect would reduce the problems described in the article, though not eliminate them of course.
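That is the usual "virtual nodes" refinement. A minimal ring sketch with several points per physical node (the vnode count and key format here are illustrative):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring where each physical node owns `vnodes` points
    scattered around the circle, which evens out the arc sizes."""
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._hashes = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._hashes, _h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("object-123"))
```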
kwillets, about 1 year ago
I believe this problem diminishes as the key count goes up.
10000truths, about 1 year ago
Load balancing solves the issue of non-uniform hashing by generating two hashes and picking the one that corresponds to the node with the lower load. Can something similar be done here?
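That is the "power of two choices" idea. A sketch of what it could look like for placement (illustrative only; as the top comment notes, the catch for storage is that you later have to find the item again, so reads must check both candidates or the choice must be recorded):

```python
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def place_two_choices(key: str, nodes: list[str], load: dict[str, int]) -> str:
    """Derive two candidate nodes from independent hashes of the key and
    place the item on whichever currently has less load."""
    a = nodes[_h(f"{key}#1") % len(nodes)]
    b = nodes[_h(f"{key}#2") % len(nodes)]
    chosen = a if load[a] <= load[b] else b
    load[chosen] += 1
    return chosen

nodes = ["n1", "n2", "n3", "n4"]
load = {n: 0 for n in nodes}
for i in range(1000):
    place_two_choices(f"obj-{i}", nodes, load)
print(load)   # noticeably more even than a single hash choice
```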