My initial intuition is that there are limitations in the test samples used, in the sense that they only contain so much information. At some point overfitting is likely to manifest not in test risk per se, but in random variation across alternate test samples. E.g., overfitting would show up as susceptibility to adversarial regimes, not in cross-validation risk.<p>I've always been skeptical of cross-validation-based inference, though, and I admit the phenomenon in the paper is fascinating.<p>It just seems, informationally speaking, to be proposing something akin to free energy: that more data is worse, and that if you just increase your model complexity you can magically infer truth. It seems more likely to be an error in the inferential paradigm.