“This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable.”

I am by no means an expert and I can’t verify the authors’ claims about reduced speed and untrainability, but this matches an impression I’ve formed from the papers I read and review. The field of ML research is moving so fast that people no longer take the time to explain the design decisions behind their architectures. It’s basically “we got nice results, and here is the architecture of the model” (followed by a figure with a hundred coloured blocks connected together in some seemingly arbitrary, complex way).

It used to be that such a thing would get pushback from reviewers, who would require you to actually justify the design. I don’t see that anymore. The problem, for me, is that we fail to build a crisp understanding of the effect of each design decision on the final outcome, which hurts the actual “science” of it. It also leaves the field open to bogus and unreproducible claims.

But at least other people are picking up the thread and doing that analysis in follow-up papers, which is good.