Don't mock machine learning models in unit tests

76 points by 7d7n about 1 year ago

14 comments

dangrossman about 1 year ago

I was expecting an article about side effects of hurting an LLM's feelings in tests.

necovek about 1 year ago

On a more serious note, the author is describing a scenario where mocks are generally not useful, ML or not: never mock code that is under your control if you can help it.

Also, any test that calls out to another "function" (not necessarily a programming-language function) is more than a unit test, and is usually considered an "integration" test (it tests that the code that calls out to something else is written properly).

In general, an integration point is sufficiently well covered if the logic for the integration is tested. If you properly apply DI (Dependency Inversion/Injection), replacing the external function with a fake/mock/stub implementation allows the integration point to be sufficiently tested, depending on the quality of the fake/mock/stub.

If you really want to test unpredictable output (this also applies to e.g. performance testing), you want to introduce an acceptable range (error deltas), and limit the test to exactly the point that's unpredictable by structuring the code appropriately. All the other code and tests should be able to trust that this bit of unpredictable behaviour is tested elsewhere, and be able to test different outputs.
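A minimal sketch of the two ideas above, under stated assumptions: `label_texts`, `fake_score` and `noisy_score` are hypothetical names invented for illustration, and `noisy_score` only simulates an unpredictable model with random noise. The integration logic gets its scorer injected, so it can be tested exactly with a fake, while the unpredictable part is tested on its own against an error delta.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical type for anything that scores a text (a real model or a fake).
Scorer = Callable[[str], float]


def label_texts(texts: Sequence[str], score: Scorer, threshold: float = 0.5) -> List[str]:
    """Integration logic under test: turns injected scores into labels."""
    return ["positive" if score(t) >= threshold else "negative" for t in texts]


def fake_score(text: str) -> float:
    """Deterministic stand-in for the model (assumption, not a real model)."""
    return 0.9 if "good" in text else 0.1


def noisy_score(text: str, rng: random.Random) -> float:
    """Simulates unpredictable output: a nominal 0.8 plus noise of at most 0.05."""
    return 0.8 + rng.uniform(-0.05, 0.05)


def test_integration_logic_with_fake() -> None:
    # Exact assertions are fine here because the fake is deterministic.
    assert label_texts(["good film", "bad film"], fake_score) == ["positive", "negative"]


def test_unpredictable_part_within_delta() -> None:
    # Only this test deals with the unpredictable bit, using an accepted range.
    rng = random.Random()
    for _ in range(100):
        assert math.isclose(noisy_score("good film", rng), 0.8, abs_tol=0.1)
```

Everything outside `test_unpredictable_part_within_delta` can then rely on fakes and assert exact outputs.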

politelemon about 1 year ago

A better phrasing would be: ML models are better suited to integration testing than to unit testing, since the test is no longer running in isolation.

pooper about 1 year ago

I don't do any fancy research, but for my simple stuff I've mostly given up on the idea of unit tests. I still use them for some things, and they totally help in places where the logic is wonky or unintuitive, but I see my unit tests as living documentation of requirements more than actual tests. Things like: make sure you get new tokens if your current ones will expire in five minutes or less.

> Don’t test external libraries. We can assume that external libraries work. Thus, no need to test data loaders, tokenizers, optimizers, etc.

I disagree with this. At $work I don't have all day to write perfect code. Neither does anyone else. I don't mock/substitute HTTP anymore. I directly call my dependencies. If they fail, I try things out manually. If something goes wrong, I send them a message or go through their code if necessary.

Life is too short to be dogmatic about tests. Do what works for your (dysfunctional) organization.
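As a rough illustration of a unit test that doubles as living documentation, here is a sketch of the token-expiry requirement mentioned above. The names `needs_refresh` and `REFRESH_WINDOW` are hypothetical, and the sketch assumes "five minutes or less" includes exactly five minutes and already-expired tokens.

```python
from datetime import datetime, timedelta, timezone

# Assumption: refresh when five minutes or less remain (the rule from the comment).
REFRESH_WINDOW = timedelta(minutes=5)


def needs_refresh(expires_at: datetime, now: datetime) -> bool:
    """Return True when the token expires within the refresh window."""
    return expires_at - now <= REFRESH_WINDOW


def test_refresh_requirement() -> None:
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    assert needs_refresh(now + timedelta(minutes=4), now)       # inside the window
    assert needs_refresh(now + timedelta(minutes=5), now)       # boundary counts
    assert needs_refresh(now - timedelta(minutes=1), now)       # already expired
    assert not needs_refresh(now + timedelta(minutes=6), now)   # still fresh
```

Passing `now` in explicitly keeps the test independent of the wall clock.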

hiddencost about 1 year ago

The author is not describing unit tests. The concepts the author is looking for are integration tests and release evals.

sarusso about 1 year ago

You might also want to fix all random seeds so that you can check for exact numerical values and not “convergence” or similar concepts.
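A small sketch of the seeding idea, using only Python's standard `random` module. `init_weights` is a hypothetical stand-in for any randomized step; ML frameworks have their own seeding calls, which are not shown here.

```python
import random
from typing import List


def init_weights(n: int, seed: int) -> List[float]:
    """Hypothetical randomized step: draws n Gaussian values from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def test_same_seed_gives_identical_values() -> None:
    # With a fixed seed, exact equality is a valid assertion.
    assert init_weights(4, seed=42) == init_weights(4, seed=42)


def test_different_seed_gives_different_values() -> None:
    assert init_weights(4, seed=42) != init_weights(4, seed=43)
```

Once the seed is pinned, a test could also compare against stored reference values instead of checking for "convergence".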

noduerme about 1 year ago

>> Software : Input Data + Handcrafted Logic = Expected Output
>> Machine Learning : Input Data + Expected Output = Learned Logic

Let me stop you right there. No logic is learned in this process.

[edit] Also, the LLM is inductive, not deductive. That is, it can only generalize based on observable facts, not universalize based on logical conditions. This also goes to the question of whether a logical statement itself can ever be arrived at by induction, such as whether the absence of life in the observable universe is a problem of our ability to observe or a generally applicable phenomenon. But for the purpose of LLMs we have to conclude that no, it can't find logic by reducing a set of outcomes, regardless of the size of the set. All it can do is find a set of incomprehensible equations that seem to fit the set in every example you throw at it. That's not logic, it's a lens.

Hackbraten about 1 year ago

> Avoid loading CSVs or Parquet files as sample data. (It’s fine for evals but not unit tests.) Define sample data directly in unit test code to test key functionality

How does it matter whether I inline my test data inside the unit test code, or have my unit test code load that same data from a checked-in file instead?
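To make the two options in the question concrete, here is a sketch that holds the same sample data both ways. `INLINE_ROWS`, `CSV_TEXT` and `load_rows` are hypothetical names, and an in-memory string stands in for the checked-in file so the example stays self-contained.

```python
import csv
import io
from typing import Dict, List, TextIO

# Option 1: sample data defined directly in the test code.
INLINE_ROWS = [
    {"text": "good film", "label": "1"},
    {"text": "bad film", "label": "0"},
]

# Option 2: the same data as CSV content (stand-in for a checked-in file).
CSV_TEXT = "text,label\ngood film,1\nbad film,0\n"


def load_rows(fileobj: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows into dictionaries; all values come back as strings."""
    return [dict(row) for row in csv.DictReader(fileobj)]


def test_both_sources_hold_the_same_data() -> None:
    assert load_rows(io.StringIO(CSV_TEXT)) == INLINE_ROWS
```

Both forms yield the same rows here; the visible difference is that the inline version keeps the data next to the assertions that use it.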

javier_e06 about 1 year ago

The problem with mocks happens when all your unit tests pass and the program fails on integration. The mocks are a pristine place where your library's unit tests work like a champ. Bad mocks or bad library? Or both. Developers are then sent to debug the unit tests... overhead. I don't know much about ML, but I would think that they should follow some rules resembling judicial rules of precedence and witness cross-examination techniques.

elif about 1 year ago

Depends on the model, honestly. If you include a GPT model in your unit tests, be prepared to run them over and over again until you get a pass, or chase your own shadow debugging non-errors.

posix_monad about 1 year ago

Prediction for the future:
- Algebraic Effects will land in mainstream languages, in the same way that anonymous lambda functions have
- This will render "mocks" pointless

yawpitch about 1 year ago

Oh dear (possibly artificial) god, have they developed _feelings_?!? Sorry… with a title like that, I couldn’t help myself.

mindcrime about 1 year ago

Not intended as a comment on the current TFA, but based on observing many conversations on the topic of unit testing in the past, I believe this to be a true statement: "If you're ever lost in a wilderness setting, far from civilization, and need to be rescued, just start talking about unit testing. Somebody will immediately show up to tell you that you're doing it wrong."

mellutussa about 1 year ago

> never mock the code that is under your control if you can help it.

This is just nonsense. It'd effectively mean you only had integration tests. While they are absolutely fantastic, they are too slow during development.
