Or you can use a database where join performance isn't pathological, and scale out by adding read slaves.<p>Caching is hard. Caching as an automated 'layer' is much harder. If it were possible to cache in a general way, <i>databases would do it already</i>. Adding a 'caching layer' is opening a gate into hell. The thing is, at first it will be fine. Thousands of engineer hours and hundreds of subtle bugs later, you'll (if you're wise) realize that opening that gate let out many demons; you just didn't have the eyes to see them at the time.<p>"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
The timing of this post is funny, as I just finished reworking our fork of django-cache-machine. As the post points out, the limitation of Cache Machine as it is currently built is that only objects which are already within a queryset can invalidate that queryset. This is fine for selects on primary keys, but beyond that the invalidation logic is incomplete.<p>My changes (<a href="https://github.com/theatlantic/django-cache-machine/" rel="nofollow">https://github.com/theatlantic/django-cache-machine/</a>) inspect the ORDER BY clauses (if there's a limit or offset) and the WHERE constraints in the query, and save the list of "model-invalidating columns" (i.e., columns which, when changed, should broadly invalidate queries on the model) in a key in the cache. They also associate these queries with a model flush list. Then a pre_save signal handler checks whether any of those columns have changed and, if so, marks the instance so that the associated model flush list is invalidated when the instance reaches the post_save signal. We have these changes up on a few of our sites and, if all goes well, we're looking to move the invalidation to the column level to make the cache-hit ratio even higher.
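A minimal sketch of the column-change check described in the comment above, written in plain Python with in-memory dicts standing in for the cache. Everything here (`register_query`, `on_save`, the dict names) is hypothetical and for illustration only; it is not the fork's code or the Cache Machine API, and a real version would hang off Django's pre_save/post_save signals and a shared cache.

```python
# Hypothetical in-memory stand-ins for a shared cache (assumption, not Cache Machine's API).
cache = {}                  # query key -> cached result
invalidating_columns = {}   # model name -> columns seen in WHERE / ORDER BY clauses
model_flush_list = {}       # model name -> query keys to drop on a broad invalidation


def register_query(model, query_key, columns, result):
    """Cache a query result and remember which columns it filtered or ordered on."""
    cache[query_key] = result
    invalidating_columns.setdefault(model, set()).update(columns)
    model_flush_list.setdefault(model, set()).add(query_key)


def changed_columns(old_row, new_row):
    """Return the names of columns whose values differ between two row dicts."""
    keys = set(old_row) | set(new_row)
    return {k for k in keys if old_row.get(k) != new_row.get(k)}


def on_save(model, old_row, new_row):
    """Flush the model's cached queries if an invalidating column changed.

    Returns True when the flush list was invalidated, False otherwise.
    For a newly inserted row, pass an empty dict as old_row.
    """
    watched = invalidating_columns.get(model, set())
    if changed_columns(old_row, new_row) & watched:
        for key in model_flush_list.pop(model, set()):
            cache.pop(key, None)
        return True
    return False
```

With a query registered on a `published` column, a save that only changes `title` leaves the cached result alone, while a save that flips `published` drops every query on the model's flush list. That is the trade-off the comment describes: fewer flushes than invalidating on every write, at the cost of tracking which columns matter.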
Took a look at the invalidation scheme of the Django Cache Machine they mention (<a href="http://jbalogh.me/projects/cache-machine/#cache-manager" rel="nofollow">http://jbalogh.me/projects/cache-machine/#cache-manager</a>). It stores a "flush list" linking objects to the cached queries they originated from. Interesting, though it looks like something that could get out of hand quickly: are they storing the "flush list" itself in the cache (else how do all nodes learn of the invalidation)? That's a little creepy (the list gets very large, as they appear to be keying on the SQL itself?). Then they have the invalidation flow along all the relationships. Maybe that's OK, but maybe it leads to a lot of excessive invalidation. They also have a notion of how to avoid a certain kind of race condition, caching ``None`` instead of just deleting, but it's not clear to me how that helps. You really need a version ID there if you want to prevent that particular condition (otherwise thread 1 puts None; thread 2 sees the None and puts a new value there; thread 3, which started before both of them, never saw the None and puts stale data in).<p>If you're caching SQL queries and such, you really should be doing little to no modification of cached data. This library makes it seem "easy", which it's not.
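The version-ID idea from the comment above can be sketched like this. It is a single-process illustration, not thread-safe and not Cache Machine's implementation; the `VersionedCache` class and its method names are made up for the example. A shared cache would need an atomic check-and-set (memcached's `gets`/`cas` commands are one such primitive) for the same guarantee.

```python
class VersionedCache:
    """Toy cache where a write only succeeds if no invalidation happened since the read."""

    def __init__(self):
        self._data = {}      # key -> cached value
        self._versions = {}  # key -> invalidation counter

    def version(self, key):
        """Return the current version of a key (0 if it was never invalidated)."""
        return self._versions.get(key, 0)

    def invalidate(self, key):
        """Drop the cached value and bump the version so in-flight writers are rejected."""
        self._versions[key] = self.version(key) + 1
        self._data.pop(key, None)

    def put_if_current(self, key, value, seen_version):
        """Store value only if the key's version still equals the one the caller saw."""
        if seen_version != self.version(key):
            return False  # something invalidated the key after the caller read the database
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)
```

The intended usage is that a reader records `version(key)` before querying the database and passes it to `put_if_current` afterwards. The slow "thread 3" from the comment would have recorded the old version, so its stale write is refused instead of silently overwriting fresh data.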
Full disclosure: johnny-cache author.<p>The top commenter in OP gave a great rundown of these projects and their evaluation of them at YCharts at a Django NYC meetup a month-ish ago; I'm sure his slides are available on the NYC Django site somewhere.<p>All of these projects "automatically" manage cache for querysets, but they do it in different ways, and can be susceptible to poor performance under different usage patterns.<p>From what I can tell, JC adds the lowest amount of overhead to cache misses and hits, and uses the simplest management algorithm (it's mildly sophisticated, but still straightforward). It's the only one that works fine with UPDATE queries that do not mention row ids, and (as a result) it is the one that invalidates most greedily on writes.<p>The others are fine projects run by smart people, and depending on your site's situation, I'd recommend some of them over johnny-cache. It's a good idea to evaluate them all, as they certainly did at YCharts (his section on JC was very accurate), and as OP seems to have done.