One of the most important lessons I've learned in my career is that if a common problem has existed for a long time, a simple solution has probably been hiding in plain sight somewhere you haven't thought to look. I don't know anything about productivity research, but the fundamentals of defect density were figured out decades ago [1].<p>> (a) there’s no difference between test-first and test-after but (b) interleaving short bursts of coding and testing is more effective than working in longer iterations<p>I'm glad you quoted this study, because it's a perfect example of a conclusion that should have been the starting point of a more interesting experiment. Interleaving coding and testing is known to be the most impactful factor, and test-first and test-after don't differ in any of the ways that account for the vast majority of defects. So of course the difference between them should be small, and of course shorter iterations should be better.<p>[1] <a href="https://www.slideshare.net/AnnMarieNeufelder/the-top-ten-things-that-have-been-proven-to-effect-software-reliability" rel="nofollow">https://www.slideshare.net/AnnMarieNeufelder/the-top-ten-thi...</a>
> <i>Assume for a second that a study of deep relevance to practitioners is replicated, and its conclusions accepted by the research community. It has large sample sizes, a beautiful research plan, a statistically sound system for controlling for various other explanatory factors</i><p>As a multi-decade practitioner and manager of software engineering teams, I would certainly be interested in what the best of the empirical research has to say. I think it's important to always remain open to good new ideas wherever they may come from: strong opinions, loosely held, as they say.<p>That said, I don't believe statistically significant results can be found that would overturn my own instincts and judgement on any specific project to which I am dedicated. The reason is threefold: 1) the universe of software, and of the goals we pursue with it, is astronomically large; 2) competence in software engineering depends on personal aptitudes and mindsets combined with years of practice; and 3) measuring outcomes in software engineering across diverse projects is all but impossible. In other words, you can't equate tools, you can't equate projects, and most of all you can't equate people.<p>At the end of the day, success in software engineering comes from relentless focus on the specific goals at hand. One must be inherently curious and have a craftsperson's mentality about acquiring technical skill, but never become religious about methodology. That requires continuous first-principles thinking targeted at <i>specifics</i>. Two expert practitioners could propose unorthodox and diametrically opposed approaches to the same problem, and both would still dramatically outperform a less skilled journeyman who attempted to follow every best practice.<p>Empirical studies and the scientific method in general work fantastically well for uncovering the rules and inner workings of the natural world, but software is the creation of logical systems purely by human minds, which is an entirely different challenge: there's just not enough evidence to draw on. I suspect the results will be at least a couple of orders of magnitude softer than sociology's, and that probably won't sit well with the type of personality attracted to software in the first place.
> Assume for a second that a study of deep relevance to practitioners is replicated, and its conclusions accepted by the research community. It has large sample sizes, a beautiful research plan, a statistically sound system for controlling for various other explanatory factors; whatever it is that we need to proclaim it to be a good study.<p>Has there ever been an empirical software study with a beautiful research plan, sound statistical analysis, and a large sample size that has also been replicated? Even one?
For personal software development projects there are of course other factors that matter beyond finding the theoretically optimal X and Y. From a management perspective in business, those factors might also matter: you want to get the best performance out of your team, but you're not going to do that if people keep quitting due to an unpleasant work environment.<p>As far as advocacy goes, though - when someone is recommending what <i>other people</i> should do - I think it's very different if there is relevant evidence and it doesn't back up the advocated position, and more different still if the relevant evidence positively undermines that position. There are snake oil salesmen in this industry, and some of them will call you names if you don't follow their pet process. But if what they're peddling isn't backed by the evidence, or even contradicts it, they should be called out, and their audience should probably be sceptical about anything else those same salesmen are selling as well. The old joke about how hard it is to get someone to believe something when their continued employment depends on its falsehood is as relevant as ever.
"Empirical software research" could mean a bunch of different things. This article is about studying people writing software, not about software used for research in empirical sciences, and not about research into computer science.<p>I'm confident the answer to what follows from that is "nothing yet" based on various conference talks. Studying developers (or in a worse case students) writing software doesn't seem to be an effective way of working out how to write software better/faster/whatever.
I obviously have not looked at all (or really much) of this research, but I have always felt that software development is so context dependent that drawing generalised conclusions is an impossible ask in the first place. Even if you could, there would be so many exceptions that in practice it would come down to "use your experience to assess the context and then decide".<p>Take TDD as an example: I've done significant pieces of work both with and without it. In some scenarios it's a huge impediment: the actual complexity of the internal software is fairly low, but the testing complexity is high (many complex stateful dependencies that are hard to control). In those cases I spent 80% of my time writing the tests, and far more bugs surfaced in the tests than in the code itself.<p>In other scenarios there's high internal complexity and low external dependency/complexity, and it's pretty much a no-brainer: TDD is almost the only tractable way to write the code, let alone an improvement. (A sketch of that kind of case follows below.)<p>Then it's very personal as well. One person will work well with TDD and another will struggle. Mundane things, like whether your preferred development environment makes it easy to rapidly run and iterate on tests, are probably going to dominate.<p>The end result is that I think these studies just can't possibly control all the variables, which is why they end up with either invalid conclusions or overly specialised ones - or, as Jimmy says, the more rigorous the study, the less significant the results.
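To make that second scenario concrete, here is a minimal sketch of the kind of code I mean - a pure function with fiddly edge cases and no external state. The merge_intervals name and module layout are purely illustrative, nothing from the article; the tests are the sort you'd write first (pytest-style, plain asserts):

    # A pure function with tricky edge cases and no external dependencies:
    # the kind of code where writing the tests first is cheap and pays off.

    def merge_intervals(intervals):
        """Merge overlapping or touching (start, end) intervals."""
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged

    # Under TDD these tests exist before the function body above does;
    # each one pins down a behaviour that is easy to get wrong.

    def test_empty_input():
        assert merge_intervals([]) == []

    def test_disjoint_intervals_are_untouched():
        assert merge_intervals([(1, 2), (4, 5)]) == [(1, 2), (4, 5)]

    def test_overlapping_and_touching_intervals_merge():
        assert merge_intervals([(1, 3), (2, 6), (6, 7)]) == [(1, 7)]

    def test_unsorted_input_is_handled():
        assert merge_intervals([(5, 7), (1, 3), (2, 4)]) == [(1, 4), (5, 7)]

Contrast that with the first scenario: the same handful of tests against code wrapped in stateful dependencies would be mostly mock setup, which is where the 80% of my time went.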
I don't know whether TDD is effective. It seems effective on my single-person projects, less so on multi-person projects.<p>However, TDD is pretty effective when working with ChatGPT: I always tell it to write the tests first. (Roughly what that looks like is sketched below.)
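To be clear about what I mean, here is a rough, hypothetical sketch of the shape that workflow produces - the slugify example is mine, not anything ChatGPT specifically wrote; the point is only that the tests come first and act as the specification, and the implementation is then written against them (pytest-style):

    import re

    # Step 1: the tests, written (or generated) before any implementation,
    # serve as the specification for the code that follows.

    def test_slugify_lowercases_and_hyphenates():
        assert slugify("Hello, World!") == "hello-world"

    def test_slugify_collapses_whitespace():
        assert slugify("  many   spaces  ") == "many-spaces"

    def test_slugify_drops_punctuation():
        assert slugify("C++ & Rust?") == "c-rust"

    # Step 2: only then is the implementation written to make them pass.

    def slugify(text: str) -> str:
        """Lowercase, keep alphanumeric runs, and join them with hyphens."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        return "-".join(words)

The function itself is deliberately trivial; the only thing the sketch is meant to show is the ordering.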
The author's overthinking it. He cares about productivity—it's just that the effect sizes are too small in these studies to overcome his prior beliefs.
What a lot of words to justify continuing to preach TDD despite no evidence that it's better. (Guess what: it's not worse either, so if you want to use it personally, go for it. Just stop insisting other people should "convert".)<p>Of course if something has a huge impact on my productivity I want to practice it, even if it's not fun. There's a lot of denial embedded in this article.