As an ML engineer, I've found MLflow to be a disastrously bad way to approach the problem. It's something that managers or executives buy into without understanding it, and my team of engineers (myself included) has hated it.

There are many feature-specific reasons, but the biggest one is that reproduction of experiments needs to be synonymous with code review and the same version control system you use for all your other code and projects.

That way reproducibility is a genuine constraint on deployment: deploying an experiment, whether training a toy model, incorporating new data, or actually launching a live experiment, is conditional on reproducibility and code review of the code, settings, runtime configs, etc., that fully embody it.

This is much better solved with containers, so that runtime details and software details live in the same branch / change set, and a full runtime artifact like a container can be built from them.

Then deployment is just whatever production deployment already is, usually some CI tool that determines where a container (built, for example, from a PR of your experiment branch) is deployed to run, along with whatever monitoring or probe-tracking tools you already use.

You can treat experiments just like any other deployable artifact, and monitor their health and progress in exactly the same way.

Once you think of it this way, you realize that tools like MLflow are *categorically* the wrong tool for the job, almost by definition, and they exist mostly to foster vendor lock-in or support reliance on some commercial entity, in this case Databricks.
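For concreteness, here is a minimal sketch of what "the experiment is fully defined by the branch" can look like in the training entrypoint itself. All the names here (experiment.json, the "seed" and "steps" keys) are hypothetical, and this is an illustration of the general idea rather than anyone's actual setup: the run's settings are a file committed next to the code, and metrics go to stdout as structured logs so the monitoring you already use for other services can pick them up.

```python
# Sketch: an experiment entrypoint whose entire configuration lives in the
# same branch as the code. Build a container from that branch and re-running
# it reproduces the experiment; no external tracking service is involved.
import json
import random
from pathlib import Path

DEFAULTS = {"seed": 0, "steps": 5}  # hypothetical settings


def load_config(path: str = "experiment.json") -> dict:
    """Read the run's settings from a file committed alongside the code;
    fall back to defaults so this sketch runs stand-alone."""
    p = Path(path)
    return {**DEFAULTS, **json.loads(p.read_text())} if p.exists() else dict(DEFAULTS)


def train(config: dict) -> None:
    random.seed(config["seed"])  # the seed comes from version control, not a UI
    for step in range(config["steps"]):
        # Stand-in for a real training step.
        loss = 1.0 / (step + 1) + random.random() * 0.01
        # Emit metrics as plain structured logs so the existing log/monitoring
        # stack sees the experiment exactly like any other deployed service.
        print(json.dumps({"step": step, "loss": round(loss, 4)}))


if __name__ == "__main__":
    train(load_config())
```

Wrap that in whatever container image and CI pipeline you already deploy services with, and the experiment is reviewed, versioned, deployed, and monitored like any other artifact.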