I'm looking to learn about best practices for testing in the context of training large machine learning models.

For example, before launching a model to prod, I've commonly seen teams maintain a suite of datasets/evals, and a new model has to be at least as good as the existing production model before it can replace it.

But what about the training pipelines themselves? You can have unit tests for all the individual components, but the analogue of an "integration test" would be training a full model, which can be very expensive for large neural networks.

One option would be a special tiny dataset and tiny model that can be trained quickly on CPU (something like the sketch at the end of this post). But that would be an imperfect test of the real system, and I've also never seen it done in practice, so maybe it wouldn't actually be useful.

Curious if any practitioners could share their wisdom.
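
To make the tiny-dataset/tiny-model idea concrete, here's a minimal sketch of what I have in mind, assuming a PyTorch pipeline; the model, data shapes, and step counts are all placeholders, not anyone's actual setup:

    # Smoke test: train a tiny model on tiny synthetic data for a few
    # steps on CPU and assert the mechanics work end to end.
    import torch
    from torch import nn

    def test_training_smoke():
        torch.manual_seed(0)  # keep the test deterministic

        # Tiny synthetic dataset: 64 examples, 8 features, 2 classes.
        X = torch.randn(64, 8)
        y = torch.randint(0, 2, (64,))

        # Tiny stand-in model; in a real pipeline you'd instantiate the
        # actual model class with a scaled-down config so the same code
        # paths are exercised.
        model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.CrossEntropyLoss()

        initial_loss = loss_fn(model(X), y).item()
        for _ in range(20):  # a handful of steps, seconds on CPU
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        final_loss = loss_fn(model(X), y).item()

        # These assertions check mechanics, not model quality: the loop
        # runs, gradients flow to every parameter, and the loss goes down.
        assert final_loss < initial_loss, "loss did not decrease"
        assert all(p.grad is not None for p in model.parameters())

The point of a test like this wouldn't be model quality; it would be catching wiring bugs (shape mismatches, frozen parameters, a broken data loader) in seconds rather than hours into a real run. I'm just not sure how much signal it gives you about the full-scale system.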