
Understanding “Deep Double Descent”

108 points · by alexcnwy · over 5 years ago

8 comments

jackson1372 · over 5 years ago
The reason you want to over-parameterize your model is that it protects you from "bad bounce" learning trajectories. You effectively spread out your overfitting risk until it's pretty close to zero.

Or at least that's the way I like to think of it.

The next step is to compress the resulting model into a simpler, less computationally costly network.
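(To make that compression step concrete: a minimal PyTorch sketch, not taken from the article, that magnitude-prunes a deliberately over-parameterized MLP after training. The layer sizes, the omitted training step, and the 90% pruning amount are all placeholders.)

```python
# Hypothetical illustration of "over-parameterize, then compress":
# train a deliberately wide MLP, then zero out the smallest-magnitude weights.
import torch
import torch.nn.utils.prune as prune

wide = torch.nn.Sequential(
    torch.nn.Linear(784, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
)

# ... train `wide` to (near-)zero training error on your data here ...

# Compress: remove the 90% smallest-magnitude weights in each linear layer.
for module in wide:
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # fold the pruning mask into the tensor

zeroed = sum((m.weight == 0).sum().item() for m in wide
             if isinstance(m, torch.nn.Linear))
print("zeroed weights:", zeroed)
```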
liaukovv · over 5 years ago
The "Understanding" part seems rather optimistic.
lostmsu · over 5 years ago
Can someone explain what the "interpolation threshold" is? Both articles talk about it, but neither defines it.
gyuserbti · over 5 years ago
My initial intuition is that there are limitations in the test samples that are used, in the sense that they only carry so much information. At some point overfitting is likely to manifest not in test risk per se, but in random variation across alternate test samples. E.g., overfitting would show up as susceptibility to adversarial regimes rather than in cross-validation risk.

I've always been skeptical of cross-validation-based inference, though, and I admit it's a fascinating phenomenon in the paper.

It just seems, informationally speaking, to be proposing something akin to free energy: that more data is worse, and that if you just increase your model complexity you can magically infer the truth. It seems more likely to be an error in the inferential paradigm.
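(One way to probe the distinction drawn above between held-out risk and adversarial susceptibility: a small PyTorch sketch, assuming a trained classifier `model` and a held-out batch `(x, y)`, that reports clean test error alongside error under a single FGSM perturbation of size `eps`. The names and the epsilon are placeholders.)

```python
import torch
import torch.nn.functional as F

def clean_error(model, x, y):
    # Fraction of held-out examples the model gets wrong.
    with torch.no_grad():
        return (model(x).argmax(dim=1) != y).float().mean().item()

def fgsm_error(model, x, y, eps=0.1):
    # Fraction wrong after one signed-gradient perturbation of size eps.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = (x + eps * x.grad.sign()).detach()
    with torch.no_grad():
        return (model(x_adv).argmax(dim=1) != y).float().mean().item()

# Two models with similar clean error can differ sharply on the second number.
# print(clean_error(model, x, y), fgsm_error(model, x, y))
```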
ganzuul · over 5 years ago
This observation seems to come up in discussions regarding the hot topic of the neural tangent kernel: https://en.wikipedia.org/wiki/Neural_tangent_kernel

Please correct me if I'm wrong, but I think it means to say that you can in theory conjure specialized kernel methods out of 'infinitely' over-parametrized neural networks. At the moment this all gives unimpressive performance, but it is theoretically promising and could give statisticians interpretable NN-derived models.
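(For readers who want to poke at the kernel view: a minimal sketch of the *empirical* NTK at finite width, K[i, j] = ⟨∇θ f(x_i), ∇θ f(x_j)⟩, followed by kernel ridge regression with it. The architecture, data, and ridge term are invented for illustration; the infinite-width kernel the Wikipedia article describes is a limit of this object, not this code.)

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
)
params = list(net.parameters())

def grad_features(x):
    # Gradient of the scalar network output w.r.t. all parameters, flattened.
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

def empirical_ntk(xs):
    # K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> at the current weights.
    feats = torch.stack([grad_features(x) for x in xs])
    return feats @ feats.T

# Kernel ridge regression using the empirical NTK as the kernel.
x_train, y_train = torch.randn(32, 2), torch.randn(32, 1)
K = empirical_ntk(x_train)
alpha = torch.linalg.solve(K + 1e-3 * torch.eye(K.shape[0]), y_train)

def predict(x_new):
    # Cross-kernel between new points and the training points.
    f_new = torch.stack([grad_features(x) for x in x_new])
    f_tr = torch.stack([grad_features(x) for x in x_train])
    return (f_new @ f_tr.T) @ alpha
```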
_0ffh · over 5 years ago
Wow, that implies that given a large enough model, early stopping is actually a mistake! Who’d a thunk it?
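(A sketch of how you might check that claim on your own setup, with `train_one_epoch` and `test_error` standing in for your training step and held-out evaluation: keep training far past zero training error, then compare where patience-based early stopping would have halted against the final checkpoint.)

```python
def epochwise_curve(model, train_one_epoch, test_error, epochs=1000, patience=20):
    # Record test error after every epoch, well past the interpolation point.
    curve = []
    for _ in range(epochs):
        train_one_epoch(model)
        curve.append(test_error(model))

    # Epoch where classic early stopping would have halted: the first epoch
    # after `patience` consecutive epochs without improvement.
    best, since_best, stop = float("inf"), 0, len(curve) - 1
    for i, err in enumerate(curve):
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                stop = i
                break

    return {
        "early_stop_epoch": stop,
        "early_stopped_error": min(curve[: stop + 1]),
        "final_error": curve[-1],
    }
```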
dzdt · over 5 years ago
This is a really good read! I am not deep in machine learning research, but the exposition and diagrams make the point clearly. I really feel like this is advancing deep learning as a science.
eutectic · over 5 years ago
What happens if you take a large model and regularize it to have the same training error as a smaller model? Do you get the same benefits?
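(A sketch of one way to run that experiment, with `train_and_eval(config, weight_decay)` standing in for a training loop that returns `(train_error, test_error)`; the configs and the weight-decay grid are placeholders. The idea: sweep the large model's regularization until its training error matches the small model's, then compare test errors.)

```python
import numpy as np

def matched_comparison(train_and_eval, small_cfg, large_cfg,
                       decays=(0.0, 1e-4, 1e-3, 1e-2, 1e-1)):
    # Reference point: the small model, unregularized.
    small_train, small_test = train_and_eval(small_cfg, weight_decay=0.0)

    # Sweep weight decay on the large model and pick the run whose
    # training error is closest to the small model's.
    runs = [train_and_eval(large_cfg, weight_decay=wd) for wd in decays]
    gaps = np.abs(np.array([tr for tr, _ in runs]) - small_train)
    best = int(gaps.argmin())

    return {
        "small_model": {"train": small_train, "test": small_test},
        "large_model_matched": {"train": runs[best][0], "test": runs[best][1],
                                "weight_decay": decays[best]},
    }
```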