TechEcho

5 comments

justin_vanw over 13 years ago

Or you can use a database where join performance isn't pathological, and scale out by adding read slaves.

Caching is hard. Caching as an automated 'layer' is much harder. If it were possible to cache in a general way, databases would do it already. Adding a 'caching layer' is opening a gate into hell. The thing is, at first it will be fine. Thousands of engineer hours and hundreds of subtle bugs later, you'll (if you're wise) realize that opening that door let out many demons; you just didn't have the eyes to see them at the time.

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

fdintino over 13 years ago

The timing of this post is funny, as I just got finished reworking our fork of django-cache-machine. As the post points out, the limitation of Cache Machine as it is currently built is that only objects which are already within a queryset can invalidate that queryset. This is fine for selects on primary keys, but beyond that the invalidation logic is incomplete.

My changes ( https://github.com/theatlantic/django-cache-machine/ ) inspect the ORDER BY clauses (if there's a limit or offset) and WHERE constraints in the query and save the list of "model-invalidating columns" (i.e., columns which, when changed, should broadly invalidate queries on the model) in a key in the cache. They also associate these queries with a model flush list. Then, in a pre_save signal, the code checks whether any of those columns have changed and marks the instance to invalidate the associated model flush list when it gets passed into the post_save signal.

We have these changes up on a few of our sites and, if all goes well, we're looking to move the invalidation to the column level to make the cache-hit ratio even higher.
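
A minimal sketch of the column-based invalidation described above, in plain Python rather than Django so it runs on its own. It is not the fork's code: the class `ColumnAwareInvalidator`, its method names, the dict standing in for the shared cache, and the rule that a brand-new row always flushes are all assumptions made for illustration.

```python
from collections import defaultdict


class ColumnAwareInvalidator:
    """Toy stand-in for a shared cache plus per-model bookkeeping (hypothetical)."""

    def __init__(self):
        self.cache = {}                                # query key -> cached rows
        self.invalidating_columns = defaultdict(set)   # model -> columns seen in WHERE / ORDER BY
        self.model_flush_list = defaultdict(set)       # model -> query keys dropped together

    def cache_query(self, model, query_key, rows, columns):
        # Store the result and remember which columns the query depends on.
        self.cache[query_key] = rows
        self.invalidating_columns[model].update(columns)
        self.model_flush_list[model].add(query_key)

    def pre_save(self, model, old_row, new_row):
        # Return True if the save should flush the model's cached queries.
        if old_row is None:
            return True  # assumption: a new row may match any cached query
        watched = self.invalidating_columns[model]
        return any(old_row.get(c) != new_row.get(c) for c in watched)

    def post_save(self, model, needs_flush):
        # Drop every cached query on the model's flush list if it was marked.
        if needs_flush:
            for key in self.model_flush_list.pop(model, set()):
                self.cache.pop(key, None)


inv = ColumnAwareInvalidator()
inv.cache_query("Article", "published-by-date", [1, 2], {"status", "pub_date"})

# Changing only the title touches no watched column, so the cache survives.
title_edit = inv.pre_save(
    "Article",
    {"id": 1, "title": "a", "status": "draft"},
    {"id": 1, "title": "b", "status": "draft"},
)
inv.post_save("Article", title_edit)
survived_title_edit = "published-by-date" in inv.cache

# Changing status touches a watched column, so the flush list is emptied.
status_edit = inv.pre_save(
    "Article",
    {"id": 1, "title": "b", "status": "draft"},
    {"id": 1, "title": "b", "status": "published"},
)
inv.post_save("Article", status_edit)
```

In a real Django project the two methods would be wired to the `pre_save` and `post_save` signals, and the bookkeeping would live in the shared cache so every node sees it.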

zzzeek over 13 years ago

Took a look at the Django Cache Machine they mention, at the invalidation scheme ( http://jbalogh.me/projects/cache-machine/#cache-manager ). It stores a "flush list", linking objects to their originating, cached queries. Interesting, though it looks like something that could get out of hand quickly - are they storing the "flush list" itself in the cache (else how do all nodes learn of the invalidation)? That's interesting, though a little creepy (the list gets very large, as they appear to be keying on the SQL itself?). Then they have the invalidation flow along all the relationships - maybe that's OK, but maybe it leads to a lot of excessive invalidation. Also, they have a notion of how to avoid a certain kind of race condition there, caching ``None`` instead of just deleting, but it's not clear to me how that helps - you really need a version ID there if you want to prevent that particular condition (else thread 1 puts None; thread 2 sees None and puts a new value there; thread 3, which started before all of them, doesn't see the None and puts stale data in).

Really, if you're caching SQL queries and such, you really, really should be doing little to no modification of cached data - this library makes it seem "easy", which it's not.
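
The version ID idea can be sketched roughly as follows. This is an illustration under assumptions, not Cache Machine's behavior: `VersionedCache` and its methods are hypothetical names, the cache is an in-process dict, and the interleaving of "threads" is simulated sequentially. A real shared cache would need the version check and the write to happen atomically (for example with a compare-and-set operation).

```python
class VersionedCache:
    """In-process stand-in for a shared cache; not thread-safe as written."""

    def __init__(self):
        self._versions = {}  # key -> int, bumped on every invalidation
        self._entries = {}   # key -> (version_when_loaded, value)

    def version(self, key):
        return self._versions.get(key, 0)

    def invalidate(self, key):
        # Bump the version as well as deleting, so late writers can be detected.
        self._versions[key] = self.version(key) + 1
        self._entries.pop(key, None)

    def put(self, key, value, seen_version):
        # Reject values that were loaded before the most recent invalidation.
        if seen_version != self.version(key):
            return False
        self._entries[key] = (seen_version, value)
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return entry[1] if entry is not None else None


cache = VersionedCache()
database = {"user:1": "old name"}

# Slow reader: notes the version, reads the row, then stalls before caching it.
slow_version = cache.version("user:1")
slow_value = database["user:1"]

# Writer: updates the row and invalidates the cached copy.
database["user:1"] = "new name"
cache.invalidate("user:1")

# Fast reader: misses, reloads, and caches the fresh row.
fast_version = cache.version("user:1")
fast_stored = cache.put("user:1", database["user:1"], fast_version)

# Slow reader wakes up and tries to cache its stale row; the put is refused.
slow_stored = cache.put("user:1", slow_value, slow_version)
```

The slow reader plays the role of "thread 3" above: it never sees the invalidation marker, but its write is still rejected because the version it recorded is out of date.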

jmoiron over 13 years ago

Full disclosure: johnny-cache author.

The top commenter in OP gave a great rundown of these projects and their evaluation of them at YCharts at a Django NYC meetup a month-ish ago; I'm sure his slides are available on the NYC Django site somewhere.

All of these projects "automatically" manage cache for querysets, but they do it in different ways, and can be susceptible to poor performance under different usage patterns.

From what I can tell, JC adds the lowest amount of overhead to cache misses and hits, and uses the simplest (it's mildly sophisticated, but still straightforward) management algorithm. It's the only one that works fine when using UPDATE queries that do not mention row ids, and (as a result) is the one that most greedily invalidates on writes.

The others are fine projects run by smart people, and depending on your site's situation, I'd recommend some of them over johnny-cache. It's a good idea to evaluate them all, as they certainly did at YCharts (his section on JC was very accurate), and as OP seems to have done.

ceol over 13 years ago

Thanks so much for linking to this post. I was just thinking how I would go about caching QuerySets for a Django project.

How to Add Django Database Caching in 5 Minutes
