I always thought 404 responses would occupy a trivial amount of memory in the cache since they don't have a body. OK, there is some per-entry overhead, which might cause issues if there are too many 404s. However, I think some guy trying to crawl your entire web site is a bigger problem if the site is big. You can throttle him based on his IP and user agent, but how do you differentiate people who share the same IP because they are behind a corporate proxy?

I also thought cache poisoning meant putting a bad response into a cache by sending a specially crafted request, so that innocent users who hit the cache are served the malicious response. I think the article uses the term in a different sense.
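To make the throttling point concrete, here is a minimal sketch of a per-client token bucket keyed on (IP, User-Agent). The class names, the rate/capacity values, and keying on exactly those two fields are my own assumptions for illustration, not anything Varnish or the article provides. It also shows the problem I'm asking about: two different people behind the same corporate proxy with the same browser land in the same bucket.

```python
import time
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ClientThrottle:
    """Hypothetical throttle: one bucket per (client IP, User-Agent) pair."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        key = (ip, user_agent)
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

Everyone behind `10.0.0.1` running the same browser shares one budget, so a single crawler there also throttles their coworkers; adding more request attributes to the key only narrows that problem, it doesn't remove it.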
tl;dr: it turns out that OS-native caching policies aren't really a great fit for edge caches. Varnish should probably have a specific VCL module for intelligently managing cache poisoning attacks (even something simple for compactly managing 4XX errors would go a long way).
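As a rough illustration of what "compactly managing 4XX errors" could mean, here is a sketch of a separate, bounded negative cache that stores only a status code and an expiry per URL and evicts least-recently-used entries once full. This is plain Python, not VCL and not any existing Varnish module; the `NegativeCache` name, the TTL, and the size cap are hypothetical. The point is that a flood of unique 404 URLs then churns a small, capped table instead of crowding out real objects.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class NegativeCache:
    """Hypothetical bounded LRU cache holding only (status, expiry) for 4XX URLs."""

    def __init__(self, max_entries: int, ttl: float) -> None:
        self.max_entries = max_entries
        self.ttl = ttl
        self._entries: "OrderedDict[str, Tuple[int, float]]" = OrderedDict()

    def put(self, url: str, status: int, now: float) -> None:
        if not 400 <= status < 500:
            raise ValueError("only 4XX responses belong in the negative cache")
        self._entries[url] = (status, now + self.ttl)
        self._entries.move_to_end(url)  # mark as most recently used
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, url: str, now: float) -> Optional[int]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        status, expires = entry
        if now >= expires:
            del self._entries[url]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(url)
        return status

    def __len__(self) -> int:
        return len(self._entries)
```

Keeping negative entries in their own capped structure (rather than mixed into the main object store under the same eviction policy) is the design choice here; whether Varnish could express this in VCL alone is an open question.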