Biologist here.

I'm trying to learn a bit about machine learning just to keep my general knowledge fresh.

Reading about machine-learning model/package development, I've realised that it must be easy to let such models degenerate: to patch and glue and create hidden dependencies that can be hard to reverse-engineer.

This reminds me very much of a genome, where functionality has been added over billions of years using whatever inputs were available at the time, producing something that was good enough at the time.

I'm not sure how relevant such analogies are to ML, but it feels like this must be the natural way of things: the code wants to degenerate (path of least resistance), but for the model to be clear, and generalisable, this must be resisted.

Do you feel this is fair/accurate?

Again, I'm a biologist, not a technical expert. I just found this similarity intriguing, and it would be interesting to hear your thoughts on the challenges/opportunities of letting code be more like code (and less like a genome, since genomes are notoriously hard to understand or reverse-engineer).
When applied to the models themselves, it's not a particularly helpful analogy. Machine learning models are largely curve fitting in high-dimensional spaces. Overfitting, overspecialization, and the like are problems, and you could relate them to ecological notions and selection, but it's not terribly helpful in practice.

Where it *does* apply is to the cascade of dependencies among data sets, the code that generates them, and the sources of signal that you see in large data platforms.
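To make the "curve fitting" point concrete, here's a minimal sketch of overfitting. The sine curve, noise level, and polynomial degrees are my own illustrative choices, not anything from the thread: a high-degree polynomial fitted to a handful of noisy points drives training error towards zero while error on held-out points grows.

    # Illustrative sketch of overfitting with plain numpy polynomial fits.
    import numpy as np

    rng = np.random.default_rng(0)

    # A few noisy training samples from a simple underlying curve.
    x_train = np.linspace(0, 1, 10)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.shape)

    # Held-out points from the true (noise-free) curve.
    x_test = np.linspace(0, 1, 100)
    y_test = np.sin(2 * np.pi * x_test)

    for degree in (1, 3, 9):
        # Fit a polynomial of the given degree to the training points.
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        # The degree-9 fit typically drives training error toward zero while
        # held-out error grows: it memorises the noise instead of generalising.
        print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")

On the model side, that tension is usually managed with held-out evaluation and regularisation rather than by keeping the code tidy, which is part of why the genome analogy doesn't buy you much there.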