It's great to see the wealth of data that open source gives us being used for studies like this, and we should have more of them.
More concretely, I can see why the actual benefit could be either greater or smaller than the reported result:<p><i>Greater</i>: One of the advantages of types is that they impose a certain discipline and organization on the code. This is a global quality, and the study focuses on a very local effect (individual bugs), so it may have a significant impact that the study doesn't measure. The actual effect may therefore be significantly greater.<p><i>Smaller</i>: Not all bugs are created equal. There's huge variability both in the effort bugs take to fix and in their impact on total product quality. If the 15% reduction is mostly in "cheap" bugs, then the actual effect, measured in cost rather than in bug count, may be significantly smaller.<p>We should try to create a taxonomy of bugs, classified by kind, domain, project size, cost to fix and impact. This would help us get a better picture of the overall effect of various techniques.
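<p>To make the "smaller" case concrete, here is a rough sketch of how a reduction measured by bug count can shrink once bugs are weighted by cost to fix. Everything in it is an assumption for illustration: the two bug classes, their counts, costs and prevention rates are made up (chosen only so that the count-based figure comes out at 15%), and the names <code>BugClass</code>, <code>count_reduction</code> and <code>cost_reduction</code> are hypothetical, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugClass:
    kind: str
    count: int          # bugs observed in this class (made-up number)
    cost_to_fix: float  # average cost per bug, arbitrary units (made-up)
    prevented: float    # fraction of this class that types would catch (made-up)


def count_reduction(classes):
    """Fraction of all bugs prevented, counting every bug equally."""
    total = sum(c.count for c in classes)
    return sum(c.count * c.prevented for c in classes) / total


def cost_reduction(classes):
    """Fraction of total fixing cost prevented, weighting bugs by cost."""
    total = sum(c.count * c.cost_to_fix for c in classes)
    return sum(c.count * c.cost_to_fix * c.prevented for c in classes) / total


# Hypothetical data: types catch a quarter of the cheap bugs and none of the
# expensive ones.
bugs = [
    BugClass("typo / wrong property name", count=60, cost_to_fix=1.0, prevented=0.25),
    BugClass("logic / spec misunderstanding", count=40, cost_to_fix=10.0, prevented=0.0),
]

print(f"reduction by count: {count_reduction(bugs):.1%}")
print(f"reduction by cost:  {cost_reduction(bugs):.1%}")
```

With these invented numbers the count-based reduction is 15%, while the cost-weighted reduction is 15/460, or roughly 3%. Flip the prevention rates between the two classes and the cost-weighted figure becomes much larger than the count-based one, which is why a taxonomy that records cost and impact would matter.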