Disclaimer: I don't have a lot of comments on TDD as a whole, other than that most software I write is very exploratory (I'll throw away the first 3-10 drafts, and by no means does that mean it takes 3-10x as long to write), and the best language for me for that exploration is often the language of the actual software I'm writing. TDD, in that environment, doesn't seem very applicable, since the whole point is that we don't know what's actually possible (or, when it is possible, whether the tradeoffs are worth it).

The author has a lot of opinions about testing, though, which conflict with what I've found to work in even that sort of dynamic environment. Their rationale makes sense on the surface (e.g., I've never seen a "mock"-heavy [0] codebase reap positive net value from its tests), but the prescription for those observed problems seems sub-optimal.

I'll pick on one of those complaints to start with, IMO the most egregious:

> Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases.

If changing one little thing results in a day of rewriting tests, then either (a) the repo is structured such that small functional changes affect lots of code (which is bad, but it's correct that you'd therefore have to inspect all the tests/code to see if it actually works correctly afterward), or (b) the tests add coupling that doesn't otherwise exist in the code itself.

I'll ignore (a), since I think we can all agree that's bad (or at least orthogonal to testing concerns). For (b), though, that's definitely a consequence of "mock"-heavy frameworks.

Why?

The author's proposal is to just test observable behavior of the system. That's an easy way to isolate yourself from implementation details. I don't disagree with it, and I think the industry (as I've seen it) undervalues a robust integration test suite.

What is it about "unit" tests that causes problems, though? It's that the things you're testing aren't very well thought through or very well abstracted in the middle layers. Hear me out. TFA argues for integration tests at a high level, but if you (e.g.) actually had to implement a custom sorting function at your job, would you leave it untested? Absolutely not. It'd be crammed to the gills with empty inputs, brute-force checks of every permutation of small lengths, a smattering of large inputs, something involving MaxInt, random fuzzing against known-working sorting algorithms, and who knows what else the kids are cooking up these days.

Moreover, almost no conceivable change to the program would invalidate those tests incorrectly. The point of a sorting algorithm is to sort, and it should have some performance characteristics (the reason you choose one sort over another). Your tests capture that behavior. As your program changes, you either decide you don't need that sort any more (in which case you just delete the tests, which is O(other_code_deleted)), or you might need a new performance profile. In that latter case, the only tests that break are associated with that one sorting function, and they break _because_ the requirements actually changed. You still satisfy O(test_changes) <= O(code_changes), which is exactly the property the author argues breaks down once mocks are involved.
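To make that concrete, here's a rough sketch of what those leaf-level tests might look like in Python; my_sort is a hypothetical stand-in for whatever custom sort you actually wrote, and the built-in sorted() serves as the reference oracle:

    # Sketch only: my_sort stands in for the custom sort under test.
    import itertools
    import random
    import sys

    def my_sort(xs):
        return sorted(xs)  # replace with the real implementation

    def test_empty_input():
        assert my_sort([]) == []

    def test_all_small_permutations():
        # Exhaustive up to a small length; factorial growth rules out big n.
        for n in range(1, 7):
            for perm in itertools.permutations(range(n)):
                assert my_sort(list(perm)) == list(range(n))

    def test_extreme_values():
        xs = [sys.maxsize, -sys.maxsize - 1, 0, sys.maxsize]
        assert my_sort(xs) == sorted(xs)

    def test_fuzz_against_reference():
        # Random fuzzing against a known-working sort.
        rng = random.Random(0)
        for _ in range(1000):
            xs = [rng.randint(-10**9, 10**9) for _ in range(rng.randint(0, 300))]
            assert my_sort(xs) == sorted(xs)

None of those assertions know anything about how my_sort works internally, so refactoring its implementation never breaks them; only changing what "sorted" means would.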
Let's go back to the heavily mocked monstrosities TFA references. The problem isn't "unit" testing. Integration tests (the top of a DAG) and unit tests (like our sorting example, the bottom of a DAG) are easy. It's the code in between that gets complicated, and there might be a lot of it.

What do we do then?

At a minimum, I'd personally consider testing the top and bottom of your DAG of code. Even without any thought leadership or whatever garbage we're currently selling, it's easy to argue that tests at those levels are both O(other_code_written) in cost and very valuable. At a high level (TFA's recommendation), the tests are much cheaper than the composite product, and you'd be silly not to include them. At a low level (truly independent units, like the "sorting" case study), you'd also be silly not to include them: your developers are already writing those tests to check that the code works as they implement the feature in the first place, the maintenance cost of the tests is proportional to the maintenance cost of the code being tested, and they're extremely valuable for detecting defects in that code (recall that bugs are exponentially more expensive to fix the further down the pipeline they propagate before being triaged).

Addressing the bottom of your DAG is something the article, in some sense, explicitly argues against; they're arguing against the inverted pyramid model you've seen for testing. That seems short-sighted. Your developers are already paying approximately the cost of writing a good test when they personally test a sorting function they're writing, and that test is likely to be long-lived and useful; why throw that away? More importantly, building on shaky foundations is much more expensive than most people give it credit for. If your IDE's autocomplete suggests a function whose name says it does the right thing and which accepts the arguments you're giving it, and that suggestion can always be trusted, you get an immediate 10x in productivity. Wizards in a particular codebase (I've been that wizard in a few, including my current role; that isn't a derogatory assessment of "other" people) can always internalize the whole thing and immediately know the right patterns, but for everyone else with <2yrs of experience at your company in particular (keep in mind that average Silicon Valley attrition is 2-3yrs), a function doing what it says it's going to do is a godsend to productivity.

Back to the problem at hand, though. TFA says to integration test, and so do I. I also say to test the "leaf" code in your code DAG, since it's about the same cost and benefit. What about the shit in between?

In a lot of codebases I've seen, I'd say to chalk it up as a lost cause and test both the integration stuff (which TFA suggests) and any low-level details (the extra thing I'm arguing is important). Early in my career, I was implementing some CRUD feature or another and was explicitly coached (on finding that the reason the implementation was hard was a broken function deep in the call stack) to do the one-liner fix to make my use case work instead of the ten-liner to make the function actually correct and the 1000-liner to then correct every caller. I don't think they were wrong to give that advice. I'm sad that the code was in a state where that was reasonable advice.

If you're working on newer projects, though (or plan to be at a place for a while and have the liberty to do some cleanup with every new feature, a pattern I wholly endorse and which has served me very well personally), it's worth looking at that middling code and figuring out why it's so hard to work with. 99% of the time, the reason mocks look attractive isn't because they're the only solution. It's because they're the only solution that makes sense once you've already tied your hands. You don't need something to "unit" test the shutdown handler; you need something to test the total function which processes inputs and outputs and is called by the shutdown handler.
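As a rough illustration of that split (every name here is hypothetical, not something from TFA): the decision logic becomes a total, mock-free function you can test directly, and the handler shrinks to a thin shell around it.

    # Hypothetical sketch: the shutdown decision is a total function over
    # plain data; the handler itself is just glue around it.

    def partition_pending_jobs(jobs, now, grace_seconds):
        # For any list of (job_id, deadline) pairs, flush the jobs whose
        # deadline falls within the grace window and abandon the rest.
        flush, abandon = [], []
        for job_id, deadline in jobs:
            (flush if deadline - now <= grace_seconds else abandon).append(job_id)
        return flush, abandon

    def on_shutdown(queue, clock, grace_seconds=30):
        # Thin shell: gather inputs, apply the decision, perform side effects.
        # This is the part you cover (if at all) with an integration test.
        flush, abandon = partition_pending_jobs(queue.pending(), clock(), grace_seconds)
        for job_id in flush:
            queue.flush(job_id)
        for job_id in abandon:
            queue.abandon(job_id)

    def test_partition_pending_jobs():
        jobs = [("a", 100.0), ("b", 500.0)]
        assert partition_pending_jobs(jobs, now=90.0, grace_seconds=30) == (["a"], ["b"])
        assert partition_pending_jobs([], now=0.0, grace_seconds=30) == ([], [])

No mock library enters the picture: partition_pending_jobs takes plain data, and on_shutdown is small and boring enough to leave to an integration test.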
Similarly, you don't need to "unit" test a UI page that requires 3 different databases to produce any output; you need to unit test the functions which turn that database output into the page (ideally without mocks, since although mocks ostensibly do the same thing, they usually add an extra layer of complexity and somehow break all your tests). For something that messy, you might even just want an "integration" test around the page asserting that it renders approximately correctly.

What else? People sell all kinds of solutions: "Functional Programming" or "OOP" or whatever. Programming is imperative when you execute it, and the right representation for the human reader varies from problem to problem. I don't have any classes to sell or methodologies to recommend. I do strongly recommend taking a very close look at the abstractions you've chosen, though. I've had no problem deleting 90% of them at new jobs, making the code faster, more correct, and easier to modify (I usually do so as part of a "coup," fixing things slowly with each new feature). When every new feature deletes code, the benefits tend to snowball. I see my colleagues doing that now to code I recently wrote, and I'd personally do it again.

[0] People typically mean one of two things when they say they're "mocking" a dependency. The first is that they want a function to be "total" and have reasonable outputs for all possible inputs. They'll hand-write many different implementations of the relevant interface (or the equivalent blah blah blah in your favorite language) to probe that behavior and ensure that your exponential backoff routine behaves reasonably when the clock runs backward, when 1000 of them are executed simultaneously, and whatnot. That tends to make for expensive tests, so I usually see it reserved for risky code on risk-averse teams, but it's otherwise very good at its job. The other case is using some sort of "mock" library which lets you treat hard dependencies as soft dependencies and modify class instantiation, method return values, and all sorts of other things to fit the test you're trying to write. This latter case is much more common, so it's what I'm referring to in a "heavily mocked" codebase. It's a powerful tool which could be used for good, but IME it's always overused enough that it would be better if it didn't exist.
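For what that first sense can look like in practice, here's a hypothetical sketch (the backoff routine, the fake clock, and all the numbers are mine, purely for illustration):

    # Footnote [0], first sense: a hand-written fake implementation of a
    # dependency (a clock), with no mock library involved.

    class FakeClock:
        # Deterministic test double we fully control.
        def __init__(self, times):
            self._times = list(times)

        def now(self):
            return self._times.pop(0)

    def backoff_delay(attempt, base=1.0, cap=60.0):
        # Total over all inputs: negative or enormous attempt counts are
        # clamped so the delay is always positive and never exceeds the cap.
        exponent = min(max(attempt, 0), 30)
        return min(cap, base * (2 ** exponent))

    def next_retry_time(clock, attempt):
        return clock.now() + backoff_delay(attempt)

    def test_backoff_tolerates_a_clock_running_backward():
        clock = FakeClock([100.0, 90.0])  # time appears to move backward
        assert next_retry_time(clock, attempt=0) == 101.0
        assert next_retry_time(clock, attempt=1) == 92.0
        # Delays stay bounded even for absurd attempt counts.
        assert 0 < backoff_delay(-5) <= 60.0
        assert 0 < backoff_delay(10**6) <= 60.0

Contrast that with the second sense, where a mock library patches the real clock in place; the test above doesn't need to know or care how the production clock is constructed.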