
Deep Networks with Stochastic Depth

70 points, by nicklo, about 9 years ago

5 comments

nl, about 9 years ago
I suspect this was posted because of Delip Rao's write-up[1] (which I suggest might be a better link).

It's a nice - if somewhat controversial - summary.

40% speedup on DNN training with state-of-the-art results.

[1] http://deliprao.com/archives/134
sdenton4, about 9 years ago
It's kind of bonkers that this works. It suggests that the whole belief that layers are learning different representations is completely wrong: if layer three is expecting a certain kind of intermediate representation from layer two, and is then given the raw input, one would expect layer three to choke.

Instead, the depth seems to be giving something like a progressive unwinding of the feature space.

It would be interesting to compare the trained networks to networks trained in the usual way, to see if they're coming up with similar coefficients in spite of the different training methods, or if this is producing something completely different.
radarsat1, about 9 years ago
Having not read the paper, something I find unclear: is it only the feedback path that is skipped, or is the feedforward path also skipped? The abstract mentions replacing the layer with an identity function. I'm not sure how this would work; wouldn't it change the result (i.e. the encoding used by the following layer would be corrupted) if you just multiply the inputs by 1 and add them?

Otherwise, how precisely do you "skip" a layer without corrupting the training of lower layers?

Edit: the answer is in the definition of "skip layers", introduced in a previous paper, http://arxiv.org/abs/1512.03385, which introduces identity functions into the layer equation. I guess I have more reading to do on this topic.
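[Editor's note] The mechanism these comments circle around is the residual formulation from that earlier paper: each block adds a learned update on top of an identity shortcut, and stochastic depth randomly drops the update (never the shortcut) during training, then scales it by its survival probability at test time. Below is a minimal PyTorch-style sketch of that idea; the class name, `fn`, and `survival_prob` are illustrative choices, not names from the paper, and the exact placement of the nonlinearity may differ from the authors' architecture.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block whose transform branch is randomly dropped during training."""

    def __init__(self, fn: nn.Module, survival_prob: float = 0.8):
        super().__init__()
        self.fn = fn                          # residual branch; must preserve the input shape
        self.survival_prob = survival_prob    # probability the branch is kept in a training pass

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Keep the branch with probability survival_prob; otherwise the block
            # reduces to the identity, so downstream layers still receive the
            # unmodified input rather than a corrupted encoding.
            if torch.rand(1).item() < self.survival_prob:
                return torch.relu(x + self.fn(x))
            return x
        # At test time the branch is always evaluated but scaled by its survival
        # probability, mirroring the expectation over the training-time coin flips.
        return torch.relu(x + self.survival_prob * self.fn(x))

# Illustrative use: a shape-preserving convolution as the residual branch.
block = StochasticDepthBlock(nn.Conv2d(64, 64, kernel_size=3, padding=1))
out = block(torch.randn(2, 64, 32, 32))
```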
romaniv, about 9 years ago
I'm reading Delip's follow-up post[1] and it reminds me how much of ANN stuff is still pretty much alchemy.

[1] http://deliprao.com/archives/137
karterk, about 9 years ago
This is literally one of the most exciting papers I have read recently, and it will have quite some impact on deep learning models. The major drawback of deep architectures today is training time, and any improvement to that will have a drastic effect on my productivity.

Right now I basically run N architectures on N GPUs at the same time to speed things up. And that's a luxury.