
Understanding “Deep Double Descent”

108 points · by alexcnwy · over 5 years ago

8 comments

jackson1372 · over 5 years ago
The reason you want to over-parameterize your model is that it protects you from "bad bounce" learning trajectories. You effectively spread out your overfitting risk until it's pretty close to zero.

Or at least that's the way I like to think of it.

The next step is to compress the resulting model into a simpler, less computationally costly network.
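(To make that compression step concrete: a minimal PyTorch sketch, not taken from the article, that magnitude-prunes a deliberately over-parameterized MLP after training. The layer sizes, the omitted training step, and the 90% pruning amount are all placeholders.)

```python
# Hypothetical illustration of "over-parameterize, then compress":
# train a deliberately wide MLP, then zero out the smallest-magnitude weights.
import torch
import torch.nn.utils.prune as prune

wide = torch.nn.Sequential(
    torch.nn.Linear(784, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
)

# ... train `wide` to (near-)zero training error on your data here ...

# Compress: remove the 90% smallest-magnitude weights in each linear layer.
for module in wide:
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # fold the pruning mask into the tensor

zeroed = sum((m.weight == 0).sum().item() for m in wide
             if isinstance(m, torch.nn.Linear))
print("zeroed weights:", zeroed)
```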
liaukovv · over 5 years ago
The "Understanding" part seems rather optimistic.
lostmsu · over 5 years ago
Can someone explain what the "interpolation threshold" is? Both articles talk about it, but neither defines it.
gyuserbti · over 5 years ago
My initial intuition is that there are limitations in the test samples that are used, in the sense that they only carry so much information. At some point overfitting is likely to manifest not in test risk per se, but in random variation across alternate test samples. E.g., overfitting would show up as susceptibility to adversarial regimes rather than in cross-validation risk.

I've always been skeptical of cross-validation-based inference, though, and I admit it's a fascinating phenomenon in the paper.

It just seems, informationally speaking, to be proposing something akin to free energy: that more data is worse, and that if you just increase your model complexity you can magically infer the truth. It seems more likely to be an error in the inferential paradigm.
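(One way to probe the distinction drawn above between held-out risk and adversarial susceptibility: a small PyTorch sketch, assuming a trained classifier `model` and a held-out batch `(x, y)`, that reports clean test error alongside error under a single FGSM perturbation of size `eps`. The names and the epsilon are placeholders.)

```python
import torch
import torch.nn.functional as F

def clean_error(model, x, y):
    # Fraction of held-out examples the model gets wrong.
    with torch.no_grad():
        return (model(x).argmax(dim=1) != y).float().mean().item()

def fgsm_error(model, x, y, eps=0.1):
    # Fraction wrong after one signed-gradient perturbation of size eps.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = (x + eps * x.grad.sign()).detach()
    with torch.no_grad():
        return (model(x_adv).argmax(dim=1) != y).float().mean().item()

# Two models with similar clean error can differ sharply on the second number.
# print(clean_error(model, x, y), fgsm_error(model, x, y))
```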
ganzuul · over 5 years ago
This observation seems to come up in discussions regarding the hot topic of the neural tangent kernel: https://en.wikipedia.org/wiki/Neural_tangent_kernel

Please correct me if I'm wrong, but I think it means to say that you can in theory conjure specialized kernel methods out of 'infinitely' over-parametrized neural networks. At the moment this all gives unimpressive performance, but it is theoretically promising and could give statisticians interpretable NN-derived models.
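(For readers who want to poke at the kernel view: a minimal sketch of the *empirical* NTK at finite width, K[i, j] = ⟨∇θ f(x_i), ∇θ f(x_j)⟩, followed by kernel ridge regression with it. The architecture, data, and ridge term are invented for illustration; the infinite-width kernel the Wikipedia article describes is a limit of this object, not this code.)

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
)
params = list(net.parameters())

def grad_features(x):
    # Gradient of the scalar network output w.r.t. all parameters, flattened.
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

def empirical_ntk(xs):
    # K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> at the current weights.
    feats = torch.stack([grad_features(x) for x in xs])
    return feats @ feats.T

# Kernel ridge regression using the empirical NTK as the kernel.
x_train, y_train = torch.randn(32, 2), torch.randn(32, 1)
K = empirical_ntk(x_train)
alpha = torch.linalg.solve(K + 1e-3 * torch.eye(K.shape[0]), y_train)

def predict(x_new):
    # Cross-kernel between new points and the training points.
    f_new = torch.stack([grad_features(x) for x in x_new])
    f_tr = torch.stack([grad_features(x) for x in x_train])
    return (f_new @ f_tr.T) @ alpha
```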
_0ffh · over 5 years ago
Wow, that implies that given a large enough model, early stopping is actually a mistake! Who’d a thunk it?
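(A sketch of how you might check that claim on your own setup, with `train_one_epoch` and `test_error` standing in for your training step and held-out evaluation: keep training far past zero training error, then compare where patience-based early stopping would have halted against the final checkpoint.)

```python
def epochwise_curve(model, train_one_epoch, test_error, epochs=1000, patience=20):
    # Record test error after every epoch, well past the interpolation point.
    curve = []
    for _ in range(epochs):
        train_one_epoch(model)
        curve.append(test_error(model))

    # Epoch where classic early stopping would have halted: the first epoch
    # after `patience` consecutive epochs without improvement.
    best, since_best, stop = float("inf"), 0, len(curve) - 1
    for i, err in enumerate(curve):
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                stop = i
                break

    return {
        "early_stop_epoch": stop,
        "early_stopped_error": min(curve[: stop + 1]),
        "final_error": curve[-1],
    }
```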
dzdt · over 5 years ago
This is a really good read! I am not deep in machine learning research, but the exposition and diagrams make the point clearly. I really feel like this is advancing deep learning as a science.
eutectic · over 5 years ago
What happens if you take a large model and regularize it to have the same training error as a smaller model? Do you get the same benefits?
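(A sketch of one way to run that experiment, with `train_and_eval(config, weight_decay)` standing in for a training loop that returns `(train_error, test_error)`; the configs and the weight-decay grid are placeholders. The idea: sweep the large model's regularization until its training error matches the small model's, then compare test errors.)

```python
import numpy as np

def matched_comparison(train_and_eval, small_cfg, large_cfg,
                       decays=(0.0, 1e-4, 1e-3, 1e-2, 1e-1)):
    # Reference point: the small model, unregularized.
    small_train, small_test = train_and_eval(small_cfg, weight_decay=0.0)

    # Sweep weight decay on the large model and pick the run whose
    # training error is closest to the small model's.
    runs = [train_and_eval(large_cfg, weight_decay=wd) for wd in decays]
    gaps = np.abs(np.array([tr for tr, _ in runs]) - small_train)
    best = int(gaps.argmin())

    return {
        "small_model": {"train": small_train, "test": small_test},
        "large_model_matched": {"train": runs[best][0], "test": runs[best][1],
                                "weight_decay": decays[best]},
    }
```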