I recently wrote a version of this that I use in my projects. A few things I do differently that you may or may not care about:<p>- From your code it seems you're not sorting kwargs. I'd strongly recommend sorting them, so that whether you call f(a=1, b=2) or f(b=2, a=1) the cache key is the same.<p>- I use inspect.signature to convert all args to kwargs, so the cache logic is consistent no matter how a function gets called. I know this is relatively slow, but it only runs once per function (I call it outside the wrapper) and the DX benefits are nice. (On the same note, you could probably move the inspect.getsource call outside your wrapper fn for a speed boost.) Both points are sketched below.<p>I also took the opposite approach to ignore_params and made the __dict__ params that get hashed opt-in, which works well when caching instance methods.
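A minimal sketch of both points, with an in-memory dict standing in for the real store (the names here are made up, not from the article):<p><pre><code>import functools
import inspect

_cache = {}  # stand-in for the real on-disk store

def cached(func):
    # resolve the signature once per function, outside the wrapper,
    # so the slow inspect call is not paid on every invocation
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bind positionals to parameter names, then sort, so that
        # f(a=1, b=2), f(b=2, a=1) and f(1, 2) share one cache key
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = (func.__qualname__, tuple(sorted(bound.arguments.items())))
        if key not in _cache:
            _cache[key] = func(*args, **kwargs)
        return _cache[key]

    return wrapper
</code></pre>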
I have extensively used <a href="https://pypi.org/project/diskcache/" rel="nofollow">https://pypi.org/project/diskcache/</a>. Is there a reason you decided to build an in-house solution?
If you aren’t caching LLM functions during development, then you’re an even greater glutton for punishment than the average engineer.<p>My local file-cache Python decorator also lets the caller define the hash key manually, either through a decorator parameter (a function that plucks a value from the cached function's params) or by calling a global function from anywhere with an arbitrary value; roughly the shape sketched below.<p>What’s cool about caching results to local files during development is the ease of invalidating caches: just delete the file named after the function and key you want.
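A hedged sketch of that shape (the file_cache name and key_fn parameter are mine, not a real library's):<p><pre><code>import hashlib
import json
import os

def file_cache(key_fn=None):
    # key_fn, if given, plucks the cache key from the call's arguments
    # instead of hashing all of them (hypothetical parameter)
    def decorate(func):
        def wrapper(*args, **kwargs):
            raw = key_fn(*args, **kwargs) if key_fn else [args, sorted(kwargs.items())]
            digest = hashlib.md5(json.dumps(raw, default=str).encode()).hexdigest()
            path = f"{func.__name__}-{digest}.json"  # delete this file to invalidate
            if os.path.exists(path):
                with open(path) as fh:
                    return json.load(fh)
            result = func(*args, **kwargs)
            with open(path, "w") as fh:
                json.dump(result, fh)
            return result
        return wrapper
    return decorate
</code></pre>
Keying the cache on a single argument is then e.g. @file_cache(key_fn=lambda prompt, **_: prompt), and invalidation is the same as above: delete the file.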
Recently, I experimented with various techniques for caching JSON responses from FastAPI, using Python decorators for both in-memory and disk caching on a single machine. After benchmarking, I found the results somewhat disappointing (500 req/s uncached vs 5k req/s cached). While caching did give a tenfold speedup over no caching, I believe the primary bottleneck was Python's inherent performance limitations, which made it X times slower than a comparable program written in C. Consequently, I removed the cache decorator and instead put a simple nginx caching reverse proxy in front of FastAPI, which yielded another order of magnitude (60k req/s) over Python-based caching.
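The in-memory decorator was roughly this shape (a sketch, not the exact code benchmarked):<p><pre><code>import functools
import time

def ttl_cache(seconds=60):
    # simple in-memory TTL cache; per-process, single-machine only
    def decorate(func):
        store = {}  # maps key to (expiry, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            hit = store.get(key)
            if hit and hit[0] > time.monotonic():
                return hit[1]
            value = func(*args, **kwargs)
            store[key] = (time.monotonic() + seconds, value)
            return value

        return wrapper
    return decorate
</code></pre>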
This looks cool :). A while ago, I wrote something similar that analyzes bytecode and invalidates the cache if the bytecode changes.<p><a href="https://github.com/andrewgazelka/smart-cache">https://github.com/andrewgazelka/smart-cache</a>
How does this differ from using joblib's Memory class, as in this implementation:<p><a href="https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/cache_utils.py">https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/ca...</a>
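For reference, minimal joblib usage looks like this (the cache directory is arbitrary):<p><pre><code>from joblib import Memory

memory = Memory("./cachedir", verbose=0)  # results persist to disk here

@memory.cache
def expensive(x):
    return x ** 2  # recomputed only on a cache miss
</code></pre>
If I remember right, joblib also invalidates entries when the decorated function's source changes.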
Reminds me of a little prototype I wrote a while ago that tried to do something similar with JavaScript's Proxy class. <a href="https://github.com/emileindik/cashola">https://github.com/emileindik/cashola</a><p>The main difference is that it stores the state of an object, not a function.<p>If your data is JSON-serializable, it could be a cool way to save and resume application state.
This is a pretty good implementation. I like the simplicity of it; it reminds me of the SQLite-backed storage decorators we used to have, where the data was persisted to a DB instead of the file system (although that's just a different storage engine).<p>Does this also take care of the thundering herd problem? That was one of the cases where lru_cache really blows.
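(For the thundering herd, the usual fix is double-checked locking per key; a minimal in-memory sketch with made-up names:)<p><pre><code>import threading
from collections import defaultdict

_cache = {}
_locks = defaultdict(threading.Lock)  # per-key locks; fine under the GIL for a sketch

def get_or_compute(key, compute):
    if key in _cache:
        return _cache[key]
    with _locks[key]:
        if key not in _cache:  # re-check: another thread may have filled it
            _cache[key] = compute()
    return _cache[key]
</code></pre>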
<p><pre><code>import hashlib

def hash_code(code):
    # md5 digest of the function's source; mixed into the cache key
    return hashlib.md5(code.encode()).hexdigest()
</code></pre>
Be warned: the above function is used as part of the cache key. Its ostensible purpose is to prevent reusing cached values of functions whose code has changed, but it does not handle the dependencies of that function.
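A hypothetical two-function example of the pitfall:<p><pre><code>def helper(x):
    return x + 1          # edit this line...

def main(x):
    return helper(x) * 2  # ...and main's own source, hence its md5, is
                          # unchanged, so stale cached results keep being served
</code></pre>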
I like the simplicity. I definitely see the payoff for standalone Python scripts, where any in-memory cache is lost as soon as the script errors out.
But do you see a similar payoff for Jupyter notebooks (or similar)?