Analyzing the codebase of Caffeine, a high performance caching library

268 points by synthc, 3 months ago

11 comments

jedberg, 3 months ago
It would be interesting to see this on reddit's workload. The entire system was designed around the cache getting a 95%+ hit rate, because basically anything on the front page of the top 1000 subreddits will get the overwhelming majority of traffic, so the cache is mostly filled with that.

In other words, this solves the problem of "one hit wonders" getting out of the cache quickly, but that basically already happened with the reddit workload.

The exception to that was Google, which would scrape old pages, and which is why we shunted them to their own infrastructure and didn't cache their requests. Maybe with this algo, we wouldn't have had to do that.
jbellis, 3 months ago
Caffeine is a gem. Does what it claims, no drama, no scope creep, just works. I've used it in anger multiple times, most notably in Apache Cassandra and DataStax Astra, where it handles massive workloads invisibly, just like you'd want.

Shoutout to author Ben Manes if he sees this -- thanks for the great work!
hinkley, 3 months ago
Years ago I encountered a caching system that I misremembered as being a plugin for nginx and thus was never able to track down again.

It had a clever caching algorithm that favored latency over bandwidth. It weighted hit count versus size, so that given limited space, it would rather keep two small records that had more hits than a large record, so that it could serve more records from cache overall.

For some workloads the payload size is relatively proportional to the cost of the request - for the system of record. But latency and request setup costs do tend to shift that a bit.

But the bigger problem with LRU is that some workloads eventually resemble table scans, and the moment the data set no longer fits into cache, performance falls off a very tall cliff. And not just for that query but now for all subsequent ones as it causes cache misses for everyone else by evicting large quantities of recently used records. So you need to count frequency not just recency.
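
Editor's sketch: the "hit count versus size" weighting described above can be illustrated with a toy cache that evicts whichever entry has the fewest hits per byte. This is not the system the commenter remembers and not Caffeine's policy; the class name `HitDensityCache`, its methods, and the hits-per-byte score are all assumptions made for illustration.

```python
class HitDensityCache:
    """Toy cache that evicts the entry with the fewest hits per byte.

    Hypothetical policy for illustration only; the linear scan for a
    victim is O(n) and would not suit a production cache.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = {}  # key -> [value, size_in_bytes, hit_count]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry[2] += 1  # record the hit
        return entry[0]

    def put(self, key, value, size):
        # Reject entries that could never fit (or have a nonsensical size).
        if size <= 0 or size > self.capacity_bytes:
            return False
        # Overwriting an existing key resets its hit count in this sketch.
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        # Evict the lowest hits-per-byte entries until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            victim = min(
                self.entries,
                key=lambda k: self.entries[k][2] / self.entries[k][1],
            )
            self.used_bytes -= self.entries.pop(victim)[1]
        self.entries[key] = [value, size, 0]
        self.used_bytes += size
        return True


cache = HitDensityCache(capacity_bytes=100)
cache.put("big", "x" * 80, size=80)
cache.put("a", "x" * 10, size=10)
cache.put("b", "x" * 10, size=10)
cache.get("big")                      # 1 hit over 80 bytes
for _ in range(2):
    cache.get("a")                    # 2 hits over 10 bytes
    cache.get("b")                    # 2 hits over 10 bytes
cache.put("c", "x" * 10, size=10)     # needs room: "big" has the lowest density
```

With these numbers the large record is evicted and the two small, more frequently hit records stay, which is the trade-off the comment describes.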
thomastay, 3 months ago
> However, diving into a new caching approach without a deep understanding of our current system seemed premature

Love love love this - I really enjoy reading articles where people analyze existing high performance systems instead of just going for the new and shiny thing
dan-robertson, 3 months ago
Near the beginning, the author writes:

> Caching is all about maximizing the hit ratio

A thing I worry about a lot is discontinuities in cache behaviour. A simple example: let's say a client polls a list of entries, and downloads each entry from the list one at a time to see if it is different. Obviously this feels like a bit of a silly way for a client to behave. If you have a small LRU cache (e.g. maybe it is partitioned such that partitions are small and all the requests from this client go to the same partition), then there is some threshold size where the client transitions from ~all requests hitting the cache to ~none hitting the cache.

This is a bit different from some behaviours always being bad for cache (e.g. a search crawler fetches lots of entries once).

Am I wrong to worry about these kinds of 'phase transitions'? Should the focus just be on optimising hit rate in the average case?
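
Editor's sketch: the threshold effect described above is easy to reproduce with a plain LRU cache and a client that cycles through a fixed list of keys. The function name `lru_hit_rate`, the capacity of 100, and the 50 polling rounds are assumptions chosen for the illustration, not figures from the thread.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, working_set, rounds=50):
    """Hit rate of a plain LRU cache under repeated sequential polling.

    The simulated client requests keys 0..working_set-1 in order,
    `rounds` times over, which mimics polling a list entry by entry.
    """
    cache = OrderedDict()
    hits = total = 0
    for _ in range(rounds):
        for key in range(working_set):
            total += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)         # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


for n in (90, 100, 101, 110):
    print(n, lru_hit_rate(100, n))
```

With a capacity of 100, lists of 90 and 100 entries both give a 0.98 hit rate (only the first round misses), while 101 and 110 entries give 0.0: each request evicts exactly the key that will be needed soonest, so one extra entry moves the client from nearly all hits to none.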
quotemstr, 3 months ago
Huh. Their segmented LRU setup is similar to the Linux kernel's active and inactive lists for pages. Convergent evolution in action.
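
Editor's sketch: for readers unfamiliar with the idea, a segmented LRU in its textbook form keeps new entries in a probation segment and promotes them to a protected segment on a second access. The code below is a generic illustration under that assumption; it is not Caffeine's implementation or the kernel's page lists, and the class name `SegmentedLRU` and its segment sizes are hypothetical.

```python
from collections import OrderedDict


class SegmentedLRU:
    """Textbook two-segment LRU: probation for new keys, protected for reused ones."""

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Second access: promote from probation to protected.
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_size:
                # Demote the coldest protected entry instead of dropping it.
                old_key, old_value = self.protected.popitem(last=False)
                self._add_probation(old_key, old_value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        else:
            self.probation.pop(key, None)
            self._add_probation(key, value)

    def _add_probation(self, key, value):
        self.probation[key] = value
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # only probation entries are evicted


slru = SegmentedLRU(probation_size=2, protected_size=2)
slru.put("hot1", 1)
slru.put("hot2", 2)
slru.get("hot1")           # promoted to protected
slru.get("hot2")           # promoted to protected
for i in range(10):        # a scan of keys that are each touched once
    slru.put(f"scan{i}", i)
```

After the scan, `hot1` and `hot2` are still cached because the one-off keys only ever competed for the probation segment.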
nighthawk454, 3 months ago
Seems to be hugged, so here's a cached view

https://archive.is/w8yFG

https://web.archive.org/web/20250202094451/https://adriacabeza.github.io/2024/07/12/caffeine-cache.html (images are cached better here)
dstroot, 3 months ago
Codebase has >16k stars on GitHub and only 1 open issue, and 3 open PRs. Never seen that before on a highly used codebase. Kudos to the maintainer(s).
jupiterroom, 3 months ago
Really random question - but what is used to create the images in this blog post? I see this style quite often but have never been able to track down what is used.
synthc, 3 months ago
Interesting deep dive on the internals of Caffeine, a widely used JVM caching library.
urbandw311er, 3 months ago
Caffeine is also the name of a macOS utility to stop the screen going to sleep. It would be great if whichever came second could consider a name change.