Here's my take.<p>If you could cache <i>all</i> the inputs that go into a <i>pure</i> function, you'd never need to invalidate the cache. Your cache becomes just a lazily-created tabulation of your function - which is fixed, as a map is just another representation of a mathematical function.<p>So cache invalidation happens only where your cache key isn't containing all the parameters of the cached computation - your cache isn't a mathematical function. Now the problem is, in real world, it's almost impossible to fully capture <i>all</i> the parameters of a computation in a way that gives you a usable cache (one that trades time/CPU for memory). So we need to track the possible changes in the "hidden state", and invalidate a cache entry if we detect such state. That's where all the complexity sits.<p>For instance, take a problem of a function countWords(file), that counts words in a file. How would we cache its results? If we key by file name, the cache will be wrong when the file gets changed. Keying by the hash of file contents is out of the question, because hashing is at least O(n) with respect to file size, just as countWords(), so we gain nothing. Keying by file name + modification date will fail if the clock changes, or file gets updated faster than the time resolution we're using. Not to mention, we'll have false positives if the date changes but the contents stay the same.<p>But assuming that the filesystem is sane and nobody messes with the clock, what else could happen? The definition of what a "word" is can change, making the entire cache bad. We may have found a bug in countWords(), updated the code, and now the cached results are also bad. The contents of the cache may have been changed externally (e.g. if shared by multiple functions, or multiple processes using countWords(), with different definition of "word" or with partially deployed bugfix). The code of the cache itself may have been updated to fix a storage bug there. Etc.<p>At some point you have to decide, which of those issues you care about and which you ignore - and for those you care about, how to monitor for changes, propagate them to the cache, and do it fast enough that none of the cache users will see invalid data in between detection and propagation. That's how cache invalidation is hard.