
Sieve is simpler than LRU

287 points by SerCe, over 1 year ago

30 comments

kazinator, over 1 year ago

Here is a flaw in the algorithm. The Hand pointer is essentially a roving pointer around a circular buffer. Elements that it passes over can live for another cycle (at least; more if they get visited).

But there is a bias in the algorithm. Arbitrarily, the circle is broken into a linear queue, and newly accessed items that are not present are arbitrarily inserted at the head of the queue (leftmost slot).

This does not compute: a circular roving pointer sweeping through a queue? What?

The problem is that items inserted at the head of the queue while the Hand is in a low position (close to the head, toward the left) are closer to being doomed than items added when the Hand is in the rightmost position. This is for no good reason other than luck.

How it should work is that new items should be added at a fixed distance from the Hand pointer. In other words, an Insert point moves around the circular buffer in lock step with the Hand. When the Hand obliterates an evicted item, items between that item and the Insert point shuffle over, and the new item goes to the Insert point. The next time the Hand moves, the Insert point moves with it in the same direction.

And now here is an elegant thing: we can make Insert and Hand be the same slot! When the Hand finds a victim, we just replace the Hand slot with the new item and mark it visited. Don't move any items around. On the next miss, the Hand will hop away from that item and mark it unvisited, and that's it. That item can then live until the next visit of the Hand, if it isn't touched again. This is essentially just a FIFO with protection for the visited.

The distance between Insert and Hand is a parameter; the algorithm can be measured at various values of that parameter.
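A minimal sketch of that Insert-equals-Hand variant (my own code, not from the article; as described, it amounts to the classic CLOCK / second-chance scheme over a circular buffer):

```python
class ClockCache:
    """FIFO over a circular buffer with protection for visited items.

    On a miss with a full cache, the hand sweeps forward, clearing visited
    bits, until it finds an unvisited victim; the new item then replaces
    the victim in place and is marked visited.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []    # each slot is [key, value, visited]
        self.index = {}    # key -> position in self.slots
        self.hand = 0

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.slots[pos][2] = True    # hit: mark visited
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            slot = self.slots[self.index[key]]
            slot[1], slot[2] = value, True
            return
        if len(self.slots) < self.capacity:
            self.index[key] = len(self.slots)
            self.slots.append([key, value, True])
            return
        # Sweep: skip visited items, clearing their bits (second chance).
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = False
            self.hand = (self.hand + 1) % self.capacity
        # Replace the victim in place. The hand stays here, so on the next
        # miss it clears this new item's bit and hops away, as described.
        del self.index[self.slots[self.hand][0]]
        self.slots[self.hand] = [key, value, True]
        self.index[key] = self.hand
```

Because insertion and eviction happen at the same slot, nothing ever shuffles, and there is no positional bias relative to the hand.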
layer8, over 1 year ago

> SIEVE isn't just playing the part of a cache eviction algorithm; it's stepping up as a cache design superstar. Think of it like giving a fresh spin to classics. We've plugged SIEVE into LeCaR, TwoQ, ARC, and S3-FIFO, swapping out their LRU or FIFO queue for a SIEVE one.

> Swapping LRU with SIEVE in these algorithms isn't just a minor tweak; it's more like giving them a turbo boost. Take ARC-SIEVE, for instance – it's turning heads with its slick efficiency, especially noticeable across different cache sizes. We didn't stop there. We pushed SIEVE a bit further by letting it peek into the future – well, sort of. We tested how well it could guess the next request. It turns out, with this extra bit of foresight, SIEVE is nailing it, outperforming the rest in almost all scenarios.

I find that style of language off-putting and immature.
1a1a11a, over 1 year ago

Disclaimer: I am a co-author.

Sorry for the hyped language. If my guess is correct, the blog post was "polished" by ChatGPT, and the intention was to polish the English. The paper has more content and should be taken more seriously.

We will fix the animation error and update the writing shortly. Thank you for the feedback!

Regarding some of the complaints about SIEVE's performance: here are some more results that I ran for other purposes: https://observablehq.com/@1a1a11a/sieve-miss-ratio-plots

I haven't had time to add other algorithms to the plot, but they are often similar to or worse than TinyLFU (I did not pick TinyLFU to compare deliberately; I happen to have the figures on hand). While the results do not include byte miss ratio, which is important in some use cases, they are better than the results already shown.

We acknowledge that:

* SIEVE may not work well on some block I/O workloads (this has been stated in our writing from the beginning). However, SIEVE works well on large multi-tenanted block I/O workloads (see the Tencent and Alibaba figures in the link). For web workloads, SIEVE shows consistently good results.

* As an author, when I look at this algorithm, my own feeling is that SIEVE may not be robust, because it keeps evicting new objects when the hand moves to the head, effectively behaving as MRU. However, our large-scale evaluations showed that modern workloads have certain properties that enable SIEVE to achieve a low miss ratio. We are still investigating this property and how to model it. If someone is interested, feel free to shoot me an email.

* Implementing SIEVE with a log-structured design or circular buffer is not trivial, but we have a design called Segcache (https://segcache.com), which runs a constrained version of SIEVE. In simple libraries, a linked-list implementation should be easy, and one can check the examples on the website.
vanderZwan, over 1 year ago

My first impression was that this seems related to the S3-FIFO queue data structure that was posted here last year[0], and the personal site of that algorithm's main author actually mentions it[1]:

> *S3-FIFO and SIEVE are being evaluated at several companies, if you are interested in joining the adoption army or if you look for help and want to provide help, find my email on this page.* :)

Looking into the papers of both algorithms, they basically swapped primary and secondary authors[2][3], with the main author of SIEVE being the one to write this blog post.

Just sharing this to give some context that could be useful for people who actually know about these topics to assess the claims being made. I don't deal with the problems these data structures aim to solve at my day job, so I'll refrain from commenting on that.

But I will say that from an *implementation* perspective I think the algorithms look elegant in their simplicity, and easy to grok, so at least that part of what they promise holds up.

EDIT: Marc Brooker also wrote a blog post about it[4], for those who want to read the perspective of a third party with the proper background.

[0] https://s3fifo.com/

[1] https://junchengyang.com/

[2] https://yazhuozhang.com/assets/pdf/nsdi24-sieve.pdf

[3] https://dl.acm.org/doi/10.1145/3600006.3613147

[4] https://brooker.co.za/blog/2023/12/15/sieve.html
blt, over 1 year ago

> *SIEVE isn't just playing the part of a cache eviction algorithm; it's stepping up as a cache design superstar. Think of it like giving a fresh spin to classics.*

Is this how academics write about their own work in blog posts now?

The result seems really strong. It's sad the author feels this kind of hype is necessary.
0xFEE1DEAD, over 1 year ago

> SIEVE is nailing it, outperforming the rest in almost all scenarios.

I'm curious why there's a lack of transparency regarding the scenarios where SIEVE did not surpass other algorithms.

Another concern is the emphasis on average performance and P10/P90 metrics. When evaluating a caching algorithm, metrics like P99.x are often more insightful.

The omission of these details, coupled with the sensationalist tone of the article, definitely raises some red flags for me.
Sandworm5639, over 1 year ago

Why call it a "FIFO queue" (isn't it just a stack?) when you actually need to remove elements from the middle?

Also, I think you could do away with storing "prev" for each node and make it like std::forward_list (with some additional work during eviction).

> We pushed SIEVE a bit further by letting it peek into the future – well, sort of. We tested how well it could guess the next request. It turns out, with this extra bit of foresight, SIEVE is nailing it, outperforming the rest in almost all scenarios.

No idea what they mean here.
taeric, over 1 year ago

I don't necessarily understand why caching wouldn't naturally move to the same strategies as garbage collection? Generational collectors, in particular.

I remember we had a cache for credentials in a big service at a previous gig, and we found that periodically our cache would get wiped clean. It turned out a lot of our clients were working with a third-party system that would periodically log into each client's account to check on details. To us, that was every credential getting hit in a near-linear scan every hour or so. Which... was perfectly set up to kill anyone using LRU caching.

Outside of randomness to hopefully retain enough of the items in the cache that are likely to get hit again soon, it seemed natural to have one generation for items that have only been seen once, and another generation for things that have been more active. Which, to my naive ear, sounds exactly like a lot of garbage-collection schemes.
e63f67dd-065b, over 1 year ago

A lab mate of mine is actually about to submit a paper investigating why CLOCK sometimes outperforms LRU. The other benefit of CLOCK is simplicity of implementation: you just have a pointer march around a ring buffer. I'm surprised that SIEVE is sometimes better; we might have to add a small section to the paper discussing it.
mgaunard, over 1 year ago

LRU makes sense to me, so that is what I use.

The fact that some algorithms outperform it on some distributions (which might or might not be relevant to the use cases I face) isn't good enough to convince me to switch.
sph, over 1 year ago

From the look of it, it seems like this really doesn't like frequently accessed items being close together, as they probably get evicted the first time there is a cache miss on the currently pointed-to item (though they end up further down the array, which would mitigate the problem the second time around).

Which also means this does not perform well at all with very high hit rates. A dumb LRU cache might perform better if you can fit most of your items in the cache.

In any case, it seems simple enough, but very unintuitive compared to an LRU algorithm, which makes obvious sense. It is really hard, on paper and without looking at actual numbers, to tell how this would be better than LRU.
furyofantares, over 1 year ago

I understood it eventually, but both the description and the animation were confusing at first.

I think I would have been a lot less confused if there weren't so much hype and buildup about how simple it was gonna be.

Maybe it really is easier to implement than LRU. I'll give it the benefit of the doubt on that.

But conceptually it's way more complex. I understand the LRU concept immediately from the name alone.

All the later hype was off-putting too, in a way that reduces trust. I simply do not believe the author is being candid, due to the marketing speak. Maybe those parts of the post can just be ignored, but it leaves a lingering uncertainty about even the technical points.
latch, over 1 year ago

I feel like a simple improvement to LRU is to not promote on every fetch. This adds some frequency bias into the mix and reduces the need to synchronize between threads. Is this a known/used technique?
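One way to sketch the idea (hypothetical code, not a named library): promote an entry to the MRU position only with some probability, so hot keys still bubble up over many hits while one-off scans mostly leave the order untouched:

```python
import random
from collections import OrderedDict


class LazyPromoteLRU:
    """LRU that promotes on only a fraction of hits.

    With promote_prob < 1, most reads avoid mutating the recency list,
    which adds a mild frequency bias and cuts contention on the shared
    ordering structure.
    """

    def __init__(self, capacity, promote_prob=0.1):
        self.capacity = capacity
        self.promote_prob = promote_prob
        self.data = OrderedDict()   # oldest entry first

    def get(self, key):
        if key not in self.data:
            return None
        if random.random() < self.promote_prob:
            self.data.move_to_end(key)   # promote only occasionally
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry
        self.data[key] = value
```

With promote_prob=1.0 this degenerates to plain LRU; with 0.0 it becomes FIFO.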
rix0r, over 1 year ago

There seems to be a mistake in the animation.

> On a cache miss, SIEVE checks the object pointed to by the hand. If the object has been visited, its visited bit is reset

When the animation gets to the point where the I element is checked against the cache, the hand moves off of D without resetting its visited bit.
3np, over 1 year ago

I may be missing some subtlety, but it seems like this is computationally O(N) on insertion, since in the pathological case where all existing entries are already visited, the eviction will have to visit and mark every node?

Compared to a typical LRU's O(log N) or O(1).
jandrewrogers, over 1 year ago

There are two important differences between academic and real-world cache replacement designs that should be kept in mind.

First, most competent real-world designs are considerably more complex than the academic model from which they are derived. Support for, or resistance to, many common cache patterns can often be grafted onto simpler academic designs, making them considerably more robust and flexible than academic analysis may suggest. Consequently, benchmarking an academic design against other academic designs is not all that indicative of how it would perform against a real-world implementation. In practice, thoroughly engineered real-world designs all have similar cache hit rate characteristics regardless of the algorithm they started from, because it is relatively straightforward to take ideas from other algorithms and make something that works similarly for your algorithm, albeit a bit less elegantly.

Second, one of the most important properties of cache replacement algorithms, often overlooked in the academic literature, is real-world computational cost and operation latency. It is not unusual for a disk cache to be accessed tens of millions of times per second. This cost can vary by orders of magnitude across algorithm implementations that have similar cache hit rates. Consequently, you often go with a design that has excellent memory locality, efficient concurrency, minimal CPU cache thrashing, etc., because that is what keeps cache replacement out of the profiler. Given the first point above, there is a bit of convergent evolution in real-world cache replacement designs, because this has a larger engineering impact than the specific algorithm you use. Heavy optimization of CPU throughput characteristics severely restricts the practical design space, which tends to push you toward the same couple of algorithmic starting points. It is not intuitive to most software engineers, when they first get into the space, that this is often the property you most optimize for, not things like cache hit rate.

It is a fun design problem with a very high-dimensional tradeoff space.
MrYellowP, over 1 year ago

> show that SIEVE outperforms all state-of-the-art eviction algorithms on more than 45% of the traces.

Doesn't that mean that on up to 55% of the traces, it's sometimes equal and, more likely, worse?
jstanley, over 1 year ago

My favourite simple "eviction" algorithm is just to wipe out the entire cache as soon as it gets full.

Insertion:

```perl
%hash = () if keys %hash > 100000;
$hash{$key} = $val;
```

Retrieval:

```perl
return $hash{$key};
```

Desirable properties include bounded memory usage (assuming bounded size of $val), O(1) amortised insertion/retrieval, and (most importantly) it only takes a second to write and there's almost zero chance of writing it wrong.

The downside is that the mean utilisation is only 50%.
jimberlage, over 1 year ago
This is very interesting! I really like the emphasis on simplicity in addition to performance - seems easy enough to reason about. I guess for observability’s sake, when implementing this, you’d probably want to record a distribution of distance from the pointer to the evicted item, and match it up with request latencies.
1a1a11a, over 1 year ago

In case someone is interested, there is an independent review and implementation at https://github.com/scalalang2/golang-fifo
nsajko, over 1 year ago

Context: https://en.wikipedia.org/wiki/Cache_replacement_policies
kazinator, over 1 year ago

There is obviously an LRU-like element in the algorithm, because:

1. Eviction generally shifts items to the right.

2. The Hand moves to the left.

3. Items skipped by the Hand (appearing to its right) are spared from eviction until the Hand wraps around.

You can see it in the animation. D is flipped to visited just in the nick of time, and the Hand moves over it. It is then spared until the Hand wraps around (the animation doesn't go that far). Items that are used frequently will tend to be in the visited state when the Hand passes over them, and stay cached.
tommiegannert, over 1 year ago

Replaced the linked list with a Python list, for clarity: https://gist.github.com/tommie/577b4e154731a280136295180211e953 (and I hope I got it right)

Since this uses a 1-bit visited counter, a small cache could use a bit vector and bit shifting to "remove" a value from the queue.
chmike, over 1 year ago

The effectiveness really depends on the request pattern. In some use cases, LRU is the most efficient. When the pattern is purely random, this obviously won't be the best, as every value has the same probability of being selected.

Regarding simplicity, there is a simpler algorithm where we don't need the next and prev pointers: we simply replace the non-flagged value at the hand. This is most probably what is used in CPUs.
charles_f, over 1 year ago

What does the acronym SIEVE mean? I guess SImple EViction something? I couldn't find it in the paper.
sebnukem2, over 1 year ago

This is presented as the best cache algorithm ever invented, but it's not simpler than LRU.
Dwedit, over 1 year ago

I always heard of this algorithm under the name "Not Recently Used" cache.
Hrun0, over 1 year ago

It's a bit off-topic, but how are the animations made?
the_common_man, over 1 year ago

Is there a term for such hyped-up language?
hknmtt, over 1 year ago

> On a cache hit, SIEVE marks the object as visited. On a cache miss, SIEVE checks the object pointed to by the hand. If the object has been visited, its visited bit is reset, and the hand moves to the next position, keeping the retained object in its original position in the queue. This continues until an unvisited object is found and evicted. After eviction, the hand moves to the next position.

Dude, what?
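For anyone else parsing that paragraph, here is a minimal Python sketch of my reading of the quoted behavior (new objects enter at the head; the hand sweeps from tail toward head, giving visited objects a second chance). It uses a plain list for clarity, where a real implementation would use a linked list to avoid the O(n) insert/pop:

```python
class SieveCache:
    """Sketch of SIEVE as quoted above; index 0 is the queue head (newest)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.values = {}
        self.visited = {}
        self.hand = None     # index into self.keys; None means start at tail

    def get(self, key):
        if key not in self.values:
            return None
        self.visited[key] = True     # cache hit: mark visited
        return self.values[key]

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.visited[key] = True
            return
        if len(self.keys) >= self.capacity:
            self._evict()
        self.keys.insert(0, key)     # new objects enter at the head
        if self.hand is not None:
            self.hand += 1           # head insert shifts existing indices
        self.values[key] = value
        self.visited[key] = False

    def _evict(self):
        # Hand starts at the tail and moves toward the head, wrapping around.
        i = self.hand if self.hand is not None else len(self.keys) - 1
        while self.visited[self.keys[i]]:
            self.visited[self.keys[i]] = False   # reset bit, keep in place
            i = i - 1 if i > 0 else len(self.keys) - 1
        victim = self.keys.pop(i)
        del self.values[victim]
        del self.visited[victim]
        # After eviction, the hand moves to the next position (toward head).
        self.hand = i - 1 if i > 0 else None
```

The key difference from CLOCK is that new objects are inserted at the head while the hand sweeps the existing queue, so recently inserted, never-visited objects are the first eviction candidates once the hand reaches them.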