This is a useful checklist. It reminded me of a recent topic on Babbage from Economist Radio, 'Whether AI is the end of the Scientific Method' [1]. The argument was that in ML/DL, experiments are run at large scale without hypotheses, with radical empiricism, in a trial-and-error fashion, which goes against the scientific method, i.e. hypothesis, experiment, observation, theory.

[1] https://soundcloud.com/theeconomist/babbage-ai-the-end-of-the
This checklist has some flaws. Most interesting results in ML have no proof.

For example, can you give a proof of superconvergence? What's the exact learning rate that causes it, and why? Did you know that you can often get away with a high learning rate for a while before divergence sets in? What's the proof of that?

Give a proof that under all circumstances and wind conditions, lowering your airplane's flaps by 5 degrees will help you land safely.

Also, what about datasets that you're not allowed to release? I personally despise such datasets, but I found myself in the ironic position of having a 10GB dataset dropped in my lap that was a perfect fit for my current project. Unfortunately, it wasn't until training was mostly complete that we realized we hadn't asked whether the author was comfortable releasing it, and the answer turned out to be no. So what to do? Just not talk about it?

I guess the list is good as a set of ideals to aim for. I just wish some consideration were given to the fact that you often can't meet all of those goals.

Most of OpenAI's work would be excluded by this checklist. I don't think anyone would argue that OpenAI doesn't do important work, or that its results aren't in some sense reproducible.
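As a point of reference for the superconvergence remark above: the term usually refers to training with a one-cycle learning-rate schedule that ramps up to an unusually high peak and then anneals back down. Below is a minimal, dependency-free sketch of such a schedule; the function name, peak value, and warm-up split are illustrative assumptions, not anything taken from the checklist or the comment.

    # Minimal sketch of a one-cycle learning-rate schedule (assumed values,
    # not a reference implementation): linear warm-up to a high peak for the
    # first half of training, cosine decay back toward the floor afterwards.
    import math

    def one_cycle_lr(step: int, total_steps: int,
                     max_lr: float = 1.0, min_lr: float = 0.04) -> float:
        half = total_steps // 2
        if step < half:
            # warm-up: interpolate linearly from min_lr up to max_lr
            return min_lr + (max_lr - min_lr) * step / half
        # anneal: cosine decay from max_lr back down toward min_lr
        progress = (step - half) / max(total_steps - half, 1)
        return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

    if __name__ == "__main__":
        total = 1000
        for s in (0, 250, 500, 750, 999):
            print(s, round(one_cycle_lr(s, total), 3))

Whether a given peak learning rate produces superconvergence or divergence on a given model is, as the comment notes, an empirical question rather than something with a proof.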
This gives a little more context:

https://www.nature.com/articles/d41586-019-03895-5
This is aimed at production or critical applications, though, not at forefront or blue-sky research. In the former case, we need a shared, agreed-upon framework so that everyone, everywhere, gets statistically comparable results, and this checklist helps in that sense. In the latter case, it is an open field: we are looking for approximate, agreeable results before there is a method, and the method is devised later to fit the concordant results.