Can anyone recommend any good resources on double descent (DD; the phenomenon whereby test error, after peaking near the interpolation threshold, decreases again as models become even more overparameterized, so that heavily overparameterized estimators can generalize better than underparameterized ones)?

DD seems like voodoo to me. Research has shown that deep learning models can fit even randomly labeled training data perfectly (Zhang et al., "Understanding Deep Learning Requires Rethinking Generalization"). So DD sounds like a claim that an overparameterized model can somehow contain more information about the source distribution than the training sample itself carries, like "free information." It seems like that would violate the data processing inequality or something.

Is this phenomenon just generally poorly understood, or are there good resources that clear it up?
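In case it helps frame answers, here is what I understand the claim to be. The second descent reportedly shows up even in plain linear regression, not just deep nets. Below is a minimal NumPy sketch (the target function, noise level, and all parameter values are just illustrative assumptions on my part) that fits minimum-norm least squares on random ReLU features and sweeps the feature count p; as I understand it, test error should spike near the interpolation threshold p ≈ n_train and fall again beyond it:

    # Minimal double-descent sketch: min-norm least squares on random
    # ReLU features. All constants below are illustrative, not canonical.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, d = 40, 1000, 5        # sample sizes, input dimension

    def target(X):
        # Assumed ground-truth function the features must approximate.
        return np.sin(X @ np.ones(d))

    X_train = rng.normal(size=(n_train, d))
    X_test = rng.normal(size=(n_test, d))
    y_train = target(X_train) + 0.1 * rng.normal(size=n_train)
    y_test = target(X_test)

    for p in [5, 10, 20, 35, 40, 45, 80, 320, 1280]:  # feature counts
        W = rng.normal(size=(d, p)) / np.sqrt(d)      # fixed random projection
        phi_train = np.maximum(X_train @ W, 0)        # ReLU random features
        phi_test = np.maximum(X_test @ W, 0)
        # lstsq returns the minimum-norm solution once p > n_train,
        # i.e. the system is underdetermined.
        beta, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
        test_mse = np.mean((phi_test @ beta - y_test) ** 2)
        print(f"p = {p:5d}  test MSE = {test_mse:.3f}")

If that is right, the resolution would be that the fit past the threshold is the minimum-norm interpolator, so the extra capacity acts as an implicit regularizer rather than "free information." Belkin et al.'s "Reconciling modern machine-learning practice and the classical bias-variance trade-off" (PNAS 2019) seems to be the standard starting point, but I would still appreciate other pointers.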