It's kind of bonkers that this works. It suggests that the usual belief that each layer learns a distinct intermediate representation is wrong: if layer three expects a particular kind of representation from layer two and is instead handed the raw input, you'd expect layer three to choke.

Instead, the depth seems to be giving something like a progressive unwinding of the feature space.

It would be interesting to compare these networks to networks trained in the usual way, to see whether they end up with similar coefficients despite the different training methods, or whether this produces something completely different.
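For concreteness, here's a rough sketch of how that comparison might look, assuming you can pull the same layer's weight matrix out of both networks as numpy arrays. The greedy neuron-matching step and all the names here are my own assumptions, not anything from the work being discussed; a naive element-wise comparison would be misleading because the neurons in one network can be a permutation of the neurons in the other.

    # Sketch: compare one layer's weights from a layer-wise-trained network
    # against the same layer from an end-to-end-trained network.
    # Greedily pairs neurons (rows) by cosine similarity, since neuron order
    # is arbitrary, then reports the mean similarity of the matched pairs.
    import numpy as np

    def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Pairwise cosine similarity between rows of a and rows of b.
        a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
        b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
        return a_n @ b_n.T

    def matched_similarity(w_layerwise: np.ndarray, w_endtoend: np.ndarray) -> float:
        sims = cosine_matrix(w_layerwise, w_endtoend)
        total, used = 0.0, set()
        for i in range(sims.shape[0]):
            # Best not-yet-claimed partner for neuron i.
            order = np.argsort(-sims[i])
            j = next(j for j in order if j not in used)
            used.add(j)
            total += sims[i, j]
        return total / sims.shape[0]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        w_a = rng.normal(size=(64, 128))  # stand-in for training scheme A
        # Stand-in for scheme B: same neurons, shuffled and slightly perturbed.
        w_b = w_a[rng.permutation(64)] + 0.05 * rng.normal(size=(64, 128))
        print(matched_similarity(w_a, w_b))  # near 1.0 means similar coefficients

A fancier version would compare activations on held-out data (something like CKA) rather than raw weights, which sidesteps the matching problem entirely.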