On a more serious note, the author is describing a scenario where mocks are generally not useful, ML or not: never mock the code that is under your control if you can help it.<p>Also, any test that calls out to another "function" (not necessarily a programming language function) is more than a unit test, and is usually considered an "integration" test (it tests that the code that calls out to something else is written properly).<p>In general, an integration point is sufficiently well covered if the logic for the integration is tested. If you properly apply DI (Dependency Inversion/Injection), replacing the external function with a fake/mock/stub implementation allows the integration point to be sufficiently tested, depending on the quality of the fake/mock/stub.<p>If you really want to test unpredictable output (this also applies to e.g. performance testing), you want to introduce an acceptable range (error deltas) and limit the test to exactly the point that's unpredictable by structuring the code appropriately. All the other code and tests can then trust that this bit of unpredictable behaviour is tested elsewhere, and are free to test against whatever fixed outputs they need.
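To make that concrete, here's a minimal sketch (all names are made up, not from the article): the model is injected so a test can swap in a stub, and the genuinely unpredictable part gets its own test with an error delta.

  class Pipeline:
      def __init__(self, predict):
          self.predict = predict  # any callable: features -> score

      def label(self, features):
          return "positive" if self.predict(features) >= 0.5 else "negative"

  def test_label_threshold_with_stub():
      # Integration point exercised, model replaced by a stub
      pipeline = Pipeline(predict=lambda features: 0.7)
      assert pipeline.label({"x": 1}) == "positive"

  def test_real_model_within_tolerance(real_model):
      # The unpredictable bit tested once, separately, with an acceptable error delta
      assert abs(real_model.predict({"x": 1}) - 0.7) < 0.1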
A better phrasing would be that ML models are better suited to integration testing than unit testing, since the test is no longer running in isolation.
I don't do any fancy research, but for my simple stuff I've mostly given up on the idea of unit tests. I still use them for some things, and they totally help in places where the logic is wonky or unintuitive, but I see my unit tests as living documentation of requirements more than actual tests. Things like making sure you get new tokens if your current ones will expire in five minutes or less.<p>> Don’t test external libraries. We can assume that external libraries work. Thus, no need to test data loaders, tokenizers, optimizers, etc.<p>I disagree with this. At $work I don't have all day to write perfect code. Neither does anyone else. I don't mock/substitute HTTP anymore. I directly call my dependencies. If they fail, I try things out manually. If something goes wrong, I send them a message or go through their code if necessary.<p>Life is too short to be dogmatic about tests. Do what works for your (dysfunctional) organization.
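For what it's worth, that token example is easy to pin down as one of those "living documentation" tests; a rough sketch (the function name and window are made up for illustration):

  from datetime import datetime, timedelta

  REFRESH_WINDOW = timedelta(minutes=5)

  def needs_refresh(expires_at, now):
      # refresh if the current token expires within the window
      return expires_at - now <= REFRESH_WINDOW

  def test_refreshes_when_expiry_is_five_minutes_or_less():
      now = datetime(2024, 1, 1, 12, 0)
      assert needs_refresh(now + timedelta(minutes=5), now)
      assert not needs_refresh(now + timedelta(minutes=6), now)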
>> <i>Software : Input Data + Handcrafted Logic = Expected Output<p>Machine Learning : Input Data + Expected Output = Learned Logic</i><p>Let me stop you right there. No logic is learned in this process.<p>[edit] Also, the LLM is <i>inductive</i>, not <i>deductive</i>. That is, it can only generalize based on observable facts, not universalize based on logical conditions. This also goes to the question of whether a logical statement itself can ever be arrived at by induction, such as whether the absence of life in the observable universe is a problem of our ability to observe or a generally applicable phenomenon. But for the purpose of LLMs we have to conclude that no, it can't <i>find logic</i> by reducing a set of outcomes, regardless of the size of the set. All it can do is find a set of incomprehensible equations that <i>seem to fit the set in every example you throw at it</i>. That's not logic, it's a lens.
> Avoid loading CSVs or Parquet files as sample data. (It’s fine for evals but not unit tests.) Define sample data directly in unit test code to test key functionality<p>How does it matter whether I inline my test data inside the unit test code, or have my unit test code load that same data from a checked-in file instead?
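For reference, a minimal sketch of what the article means by defining sample data directly in the test (normalize_income is a made-up function; the point is that the fixture is visible next to the assertion):

  import pandas as pd

  def normalize_income(df):
      # hypothetical transform under test
      return df.assign(income=df["income"] / df["income"].max())

  def test_normalize_income_with_inline_data():
      df = pd.DataFrame({"income": [10.0, 20.0, 40.0]})
      out = normalize_income(df)
      assert out["income"].tolist() == [0.25, 0.5, 1.0]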
The problem with mocks happens when all your unit tests pass and the program fails on integration.<p>The mocks are a pristine place where your library's unit tests work like a champ. Bad mocks or bad library? Or both.<p>Developers are then sent to debug the unit tests... overhead.
I don't know much about ML, but I would think that they should follow some rules resembling judicial rules of precedence and witness cross-examination techniques.
Depends on the model, honestly. If you include a GPT model in your unit tests, be prepared to run them over and over again until you get a pass, or to chase your own shadow debugging non-errors.
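One way to reduce (not eliminate) the flakiness, sketched under the assumption of an OpenAI-style chat client: pin the temperature and assert on properties of the output rather than exact strings.

  def classify_sentiment(client, text):
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          temperature=0,  # less nondeterminism, not none
          messages=[{"role": "user",
                     "content": f"Answer only 'positive' or 'negative': {text}"}],
      )
      return response.choices[0].message.content.strip().lower()

  def test_classify_sentiment_returns_a_valid_label(client):
      # property-style assertion: membership in an allowed set, not exact equality
      assert classify_sentiment(client, "I love this!") in {"positive", "negative"}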
Prediction for the future:<p>- Algebraic Effects will land in mainstream languages, in the same way that anonymous lambda functions have<p>- This will render "mocks" pointless
Not intended as a comment on the current TFA, but based on observing many conversations on the topic of unit testing in the past, I believe this to be a true statement:<p><i>"If you're ever lost in a wilderness setting, far from civilization, and need to be rescued, just start talking about unit testing. Somebody will immediately show up to tell you that you're doing it wrong."</i>
> never mock the code that is under your control if you can help it.<p>This is just nonsense. It'd effectively mean you only had integration tests. While they are absolutely fantastic, they are too slow during development.
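For the speed argument, a throwaway sketch (score_user and the store are hypothetical): mocking a component you own keeps the test in memory and in milliseconds.

  from unittest.mock import Mock

  def score_user(store, user_id):
      # hypothetical code under your control that normally hits a slow feature store
      features = store.fetch(user_id)
      return 1.0 if features["age"] >= 18 else 0.0

  def test_score_user_without_the_real_store():
      store = Mock()
      store.fetch.return_value = {"age": 30}
      assert score_user(store, user_id=1) == 1.0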