I know that Linus hates unit tests, but these kinds of scenarios are perfect for regression tests. The investigation of the issue took so long that you'd really want to spend a few extra hours writing a test to save yourself the same investigation in the future. The patch doesn't include any comment about the race condition in the actual code, so let's assume that all of this knowledge will be more or less lost and forgotten in 12 months or so.
The saying I heard in my embedded programming class was:<p>“There are two hard things in programming: naming things, cache invalidation, and off-by-one errors.”
Great read, although I don't think I have enough knowledge to fully appreciate the details of the article.<p>I had a similar problem before. I didn't even notice that the cache had not been cleared, and I kept working on pointless hypotheses until a coworker pointed out that there was a case where the kernel didn't evict the page cache. That kind of problem is very hard to even detect.<p>Twitter's Engineering blog has had several interesting posts recently, btw. Kudos to them.
I tried to debug a very similar-sounding issue a couple of years back: many GB used in dentries, not shrinking when asked, no obvious cause.<p>Sadly, I have no kernel hacking skills and don’t even know what a dentry is. Kudos to the author.