On a more serious note, the author is describing a scenario where mocks are generally not useful, ML or not: never mock the code that is under your control if you can help it.<p>Also, any test that calls out to another "function" (not necessarily a programming language function) is more than a unit test, and is usually considered an "integration" test (it tests that the code that calls out to something else is written properly).<p>In general, an integration point is sufficiently well covered if the logic for the integration is tested. If you properly apply DI (Dependency Inversion/Injection), replacing the external function with a fake/mock/stub implementation allows the integration point to be sufficiently tested, depending on the quality of the fake/mock/stub.<p>If you really want to test unpredictable output (this also applies to e.g. performance testing), you want to introduce an acceptable range (error deltas) and limit the test to exactly the point that's unpredictable by structuring the code appropriately. All the other code and tests can then trust that this bit of unpredictable behaviour is tested elsewhere, and are free to test against whatever fixed outputs they need.
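To make that concrete, here's a minimal sketch (all names are made up, not from the article): the model is injected so a test can swap in a stub, and the genuinely unpredictable part gets its own test with an error delta.

  class Pipeline:
      def __init__(self, predict):
          self.predict = predict  # any callable: features -> score

      def label(self, features):
          return "positive" if self.predict(features) >= 0.5 else "negative"

  def test_label_threshold_with_stub():
      # Integration point exercised, model replaced by a stub
      pipeline = Pipeline(predict=lambda features: 0.7)
      assert pipeline.label({"x": 1}) == "positive"

  def test_real_model_within_tolerance(real_model):
      # The unpredictable bit tested once, separately, with an acceptable error delta
      assert abs(real_model.predict({"x": 1}) - 0.7) < 0.1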
A better phrasing would be that ML models are better suited to integration testing than unit testing, since the test is no longer running in isolation.
I don't do any fancy research, but for my simple stuff I've mostly given up on the idea of unit tests. I still use them for some things, and they totally help in places where the logic is wonky or unintuitive, but I see my unit tests as living documentation of requirements more than actual tests. Things like making sure you get new tokens if your current ones will expire in five minutes or less.<p>> Don’t test external libraries. We can assume that external libraries work. Thus, no need to test data loaders, tokenizers, optimizers, etc.<p>I disagree with this. At $work I don't have all day to write perfect code. Neither does anyone else. I don't mock/substitute HTTP anymore. I directly call my dependencies. If they fail, I try things out manually. If something goes wrong, I send them a message or go through their code if necessary.<p>Life is too short to be dogmatic about tests. Do what works for your (dysfunctional) organization.
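For what it's worth, that token example is easy to pin down as one of those "living documentation" tests; a rough sketch (the function name and window are made up for illustration):

  from datetime import datetime, timedelta

  REFRESH_WINDOW = timedelta(minutes=5)

  def needs_refresh(expires_at, now):
      # refresh if the current token expires within the window
      return expires_at - now <= REFRESH_WINDOW

  def test_refreshes_when_expiry_is_five_minutes_or_less():
      now = datetime(2024, 1, 1, 12, 0)
      assert needs_refresh(now + timedelta(minutes=5), now)
      assert not needs_refresh(now + timedelta(minutes=6), now)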
>> <i>Software : Input Data + Handcrafted Logic = Expected Output<p>Machine Learning : Input Data + Expected Output = Learned Logic</i><p>Let me stop you right there. No logic is learned in this process.<p>[edit] Also, the LLM is <i>inductive</i>, not <i>deductive</i>. That is, it can only generalize based on observable facts, not universalize based on logical conditions. This also goes to the question of whether a logical statement itself can ever be arrived at by induction, such as whether the absence of life in the observable universe is a problem of our ability to observe or a generally applicable phenomenon. But for the purpose of LLMs we have to conclude that no, it can't <i>find logic</i> by reducing a set of outcomes, regardless of the size of the set. All it can do is find a set of incomprehensible equations that <i>seem to fit the set in every example you throw at it</i>. That's not logic, it's a lens.
> Avoid loading CSVs or Parquet files as sample data. (It’s fine for evals but not unit tests.) Define sample data directly in unit test code to test key functionality<p>How does it matter whether I inline my test data inside the unit test code, or have my unit test code load that same data from a checked-in file instead?
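For reference, a minimal sketch of what the article means by defining sample data directly in the test (normalize_income is a made-up function; the point is that the fixture is visible next to the assertion):

  import pandas as pd

  def normalize_income(df):
      # hypothetical transform under test
      return df.assign(income=df["income"] / df["income"].max())

  def test_normalize_income_with_inline_data():
      df = pd.DataFrame({"income": [10.0, 20.0, 40.0]})
      out = normalize_income(df)
      assert out["income"].tolist() == [0.25, 0.5, 1.0]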
The problem with mocks happens when all your unit tests pass and the program fails on integration.<p>The mocks are a pristine place where your library's unit tests work like a champ. Bad mocks or bad library? Or both.<p>Developers are then sent to debug the unit tests... overhead.
I don't know much about ML, but I would think that they should follow some rules resembling judicial rules of precedence and witness cross-examination techniques.
Depends on the model, honestly. If you include a GPT model in your unit tests, be prepared to run them over and over again until you get a pass, or to chase your own shadow debugging non-errors.
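One way to reduce (not eliminate) the flakiness, sketched under the assumption of an OpenAI-style chat client: pin the temperature and assert on properties of the output rather than exact strings.

  def classify_sentiment(client, text):
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          temperature=0,  # less nondeterminism, not none
          messages=[{"role": "user",
                     "content": f"Answer only 'positive' or 'negative': {text}"}],
      )
      return response.choices[0].message.content.strip().lower()

  def test_classify_sentiment_returns_a_valid_label(client):
      # property-style assertion: membership in an allowed set, not exact equality
      assert classify_sentiment(client, "I love this!") in {"positive", "negative"}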
Prediction for the future:<p>- Algebraic Effects will land in mainstream languages, in the same way that anonymous lambda functions have<p>- This will render "mocks" pointless
Not intended as a comment on the current TFA, but based on observing many conversations on the topic of unit testing in the past, I believe this to be a true statement:<p><i>"If you're ever lost in a wilderness setting, far from civilization, and need to be rescued, just start talking about unit testing. Somebody will immediately show up to tell you that you're doing it wrong."</i>
> never mock the code that is under your control if you can help it.<p>This is just nonsense. It'd effectively mean you only had integration tests. While they are absolutely fantastic, they are too slow during development.
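For the speed argument, a throwaway sketch (score_user and the store are hypothetical): mocking a component you own keeps the test in memory and in milliseconds.

  from unittest.mock import Mock

  def score_user(store, user_id):
      # hypothetical code under your control that normally hits a slow feature store
      features = store.fetch(user_id)
      return 1.0 if features["age"] >= 18 else 0.0

  def test_score_user_without_the_real_store():
      store = Mock()
      store.fetch.return_value = {"age": 30}
      assert score_user(store, user_id=1) == 1.0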