
Show HN: File-based cache for slow Python functions

73 points by williamzeng0 about 1 year ago

10 comments

mpeg about 1 year ago
I recently wrote a version of this that I use in my projects. Some things I do differently that you may or may not care about:

- from your code it seems you're not sorting kwargs; I would strongly recommend sorting them so that whether you call f(a=1, b=2) or f(b=2, a=1) the cache key is the same

- I use inspect.signature to convert all args to kwargs, so it doesn't matter how a function gets called, the cache logic is always consistent. I know this is relatively slow but it only gets called once per function (I call it outside the wrapper) and the DX benefits are nice (on the same note, you could probably move the inspect.getsource call outside your wrapper fn for a speed boost)

I also took the opposite approach to ignore_params, and made the __dict__ params that get hashed opt-in, which works well when caching instance methods.
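A minimal sketch of the normalization mpeg describes (the names make_key and cached are illustrative, not from the posted project): bind every call to named parameters once per function, then sort by name before hashing.

    import hashlib
    import inspect
    import json
    from functools import wraps

    def make_key(fn, sig, args, kwargs):
        # Bind positionals and keywords against the signature so that
        # f(1, b=2), f(a=1, b=2) and f(b=2, a=1) normalize identically.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Sort by parameter name so argument order never changes the key.
        payload = json.dumps(sorted(bound.arguments.items()), default=repr)
        return hashlib.md5(f"{fn.__qualname__}:{payload}".encode()).hexdigest()

    def cached(fn):
        sig = inspect.signature(fn)  # computed once, outside the wrapper
        store = {}

        @wraps(fn)
        def wrapper(*args, **kwargs):
            key = make_key(fn, sig, args, kwargs)
            if key not in store:
                store[key] = fn(*args, **kwargs)
            return store[key]
        return wrapper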
rmholt about 1 year ago
I have extensively used https://pypi.org/project/diskcache/. Is there a reason you decided to make an in-house solution?
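For context, typical diskcache usage looks roughly like this (a sketch; see the library's docs for the full API):

    import time
    from diskcache import Cache

    cache = Cache("./.cache")  # on-disk store, survives interpreter restarts

    @cache.memoize(expire=3600)  # entries expire after an hour
    def slow_square(x):
        time.sleep(2)  # stand-in for the expensive work
        return x * x

    print(slow_square(7))  # slow on the first call, instant afterwards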
wildermuthn about 1 year ago
If you aren't caching LLM functions during development, then you're an even greater glutton for punishment than the normal engineer.

My local file cache Python decorator also allows the decorator to define the hash manually, either by a decorator parameter (a function that plucks a value from the cached function's params), or by calling a global function from anywhere with any arbitrary value.

What's cool about caching results locally to files during development is the ease of invalidating caches: just delete the file named after the function and key you want.
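A rough sketch of that pattern (the names file_cached and ask_llm are hypothetical, not wildermuthn's code): the decorator accepts an optional key function that plucks a stable value from the call, and each result lands in its own file, so invalidation is just deleting that file.

    import hashlib
    import json
    import os
    from functools import wraps

    CACHE_DIR = ".dev_cache"  # illustrative location

    def file_cached(key_fn=None):
        def decorate(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                # Use the caller-supplied key function if given; otherwise
                # fall back to hashing the raw arguments.
                raw = key_fn(*args, **kwargs) if key_fn else repr((args, kwargs))
                digest = hashlib.md5(raw.encode()).hexdigest()
                path = os.path.join(CACHE_DIR, f"{fn.__name__}-{digest}.json")
                if os.path.exists(path):  # delete this file to invalidate
                    with open(path) as f:
                        return json.load(f)
                result = fn(*args, **kwargs)
                os.makedirs(CACHE_DIR, exist_ok=True)
                with open(path, "w") as f:
                    json.dump(result, f)
                return result
            return wrapper
        return decorate

    @file_cached(key_fn=lambda prompt, **_: prompt)  # key on the prompt only
    def ask_llm(prompt, temperature=0.0):
        return f"pretend response to {prompt!r}"  # stand-in for an API call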
rthnbgrredf about 1 year ago
Recently, I experimented with various techniques to cache some JSON responses from FastAPI, using Python decorators for both in-memory and disk caching on a single machine. After benchmarking the performance, I found the results somewhat disappointing (500 req/s vs 5k req/s). While caching did lead to a tenfold improvement in speed compared to no caching, I believe the primary bottleneck was Python's inherent performance limitations, which made it X times slower than a comparable program written in C. Consequently, I decided to remove the cache decorator and instead put a simple nginx caching reverse proxy in front of FastAPI. This resulted in performance gains that were an order of magnitude better (60k req/s) than those achieved with Python-based caching.
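The in-process variant being benchmarked looks something like this sketch (ttl_cache and report are illustrative names; the commenter's point is that an nginx cache in front of the app beat every version of this by an order of magnitude):

    import time
    from functools import wraps
    from fastapi import FastAPI

    app = FastAPI()

    def ttl_cache(seconds):
        # In-memory TTL cache for a no-argument JSON endpoint.
        def decorate(fn):
            state = {"expires": 0.0, "value": None}

            @wraps(fn)
            def wrapper():
                now = time.monotonic()
                if now >= state["expires"]:
                    state["value"] = fn()
                    state["expires"] = now + seconds
                return state["value"]
            return wrapper
        return decorate

    @app.get("/report")
    @ttl_cache(seconds=30)
    def report():
        return {"rows": list(range(1000))}  # stand-in for real work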
andrewgazelka about 1 year ago
This looks cool :). A while ago, I wrote something similar that analyzes bytecode and invalidates the cache if the bytecode changes.

https://github.com/andrewgazelka/smart-cache
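The bytecode idea in miniature (a toy fingerprint, not smart-cache itself): hashing the compiled code object rather than the source text ignores comments and reformatting, though like a source hash it still misses changes in called functions unless dependencies are traversed too.

    import hashlib

    def code_fingerprint(fn):
        # Hash bytecode plus constants: editing the function's logic
        # changes the fingerprint, while reformatting the source or
        # rewriting its comments does not.
        code = fn.__code__
        payload = code.co_code + repr(code.co_consts).encode()
        return hashlib.md5(payload).hexdigest()

Mixing code_fingerprint(fn) into the cache key makes entries written by an older version of the function stop matching.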
rassibassi about 1 year ago
What's the difference to using joblib's Memory class, similar to this implementation:

https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/cache_utils.py
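For comparison, the joblib pattern referenced there is roughly this (a sketch; see joblib's docs for the details):

    from joblib import Memory

    memory = Memory("./joblib_cache", verbose=0)  # results pickled to disk

    @memory.cache
    def fetch(query):
        return {"query": query}  # stand-in for the expensive call

    print(fetch("hello"))  # recomputed only when absent from the cache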
emilehere about 1 year ago
Reminds me of a little prototype I wrote a while ago that tried to do something similar with JavaScript's Proxy class: https://github.com/emileindik/cashola

The main difference is that it stores the state of an object, not a function.

If your data is JSON-serializable then it could be a cool way to save and resume application state.
skp1995 about 1 year ago
This is a pretty good implementation. I like the simplicity of it; it reminds me of the SQLite-backed storage decorators we used to have, where the data was persisted to a DB instead of the file system (although that's just a different storage engine).

Does this also take care of the thundering herd problem? That was one of the cases where lru_cache really blows.
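On the thundering-herd point: when a hot key is missing, many concurrent callers can all recompute it at once. A common mitigation (a sketch of the general technique, not from the posted project) is a per-key lock so only one caller does the work:

    import threading
    from functools import wraps

    _store = {}
    _locks = {}
    _meta = threading.Lock()  # guards creation of per-key locks

    def _lock_for(key):
        with _meta:
            return _locks.setdefault(key, threading.Lock())

    def herd_safe_cache(fn):
        @wraps(fn)
        def wrapper(*args):
            key = (fn.__qualname__, args)
            if key in _store:
                return _store[key]
            # One thread per key computes; the rest block on the same lock,
            # then find the value already stored and skip the work.
            with _lock_for(key):
                if key not in _store:
                    _store[key] = fn(*args)
            return _store[key]
        return wrapper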
epr about 1 year ago

    def hash_code(code):
        return hashlib.md5(code.encode()).hexdigest()

Be warned. The above function is used as part of the hash. The ostensible purpose is to prevent using cached values of functions whose code has changed, but it does not handle dependencies of that function.
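Concretely (a contrived illustration; file_cache stands in for the decorator under discussion): only outer's own source feeds the hash, so editing its helper never invalidates outer's cached results.

    def helper(x):
        return x + 1       # change this to x + 2 ...

    @file_cache            # hypothetical decorator name
    def outer(x):
        return helper(x)   # ... and outer's source, hence its hash, is
                           # unchanged, so stale results keep being served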
kapilsinha about 1 year ago
I like the simplicity. I definitely get the payoff for standalone Python scripts, where once the script errors out the memory is cleared. But do you see a similar payoff for Jupyter notebooks (or similar)?