Stacked Approximated Regression Machine: A Simple Deep Learning Approach

123 points by kartikkumar over 8 years ago

4 comments

imurray over 8 years ago
I'm going to check back later to see if anyone manages to reproduce it. Perhaps by the time it's presented at NIPS.

A twitter conversation reflecting some scepticism, but agreeing it would be interesting if it all checks out: https://twitter.com/fchollet/status/771862837819867136
Comment #12431304 not loaded
nl over 8 years ago
Update on this: it has been withdrawn: https://arxiv.org/abs/1608.04062
cs702 over 8 years ago
As far as I understand this, these guys claim they can train convolutional and many other types of deep neural nets faster by pretraining each layer with a new unsupervised technique via which the layer sort of learns to compress its inputs (a local optimization problem), and then they fine-tune the whole network end-to-end with supervised SGD and backpropagation as usual. They have not released code, so no one else has replicated this yet -- as far as I know.

If the claim holds, the implication is that layers can *quickly* learn much of what they need to learn *locally*, that is, without requiring backpropagation of gradients from potentially very distant layers. I can't help but wonder if this opens the door to more efficient asynchronous/parallel/distributed training of layers, potentially leading to models that update themselves continuously (i.e., "online" instead of in a batch process).

I wouldn't be surprised if the claim holds. There is mounting evidence that standard end-to-end backpropagation is a rather inefficient learning mechanism. For example, we now know that deep neural nets can be trained with *approximate gradients* obtained by shifting bits to get the sign and order of magnitude of the gradient roughly right.[1] In some cases it's even possible to restrict learning to use binary weights.[2] More recently, we have learned that it's possible to use "helper" linear models during training *to predict what the gradients will be* for each layer, in between true-gradient updates, allowing layers to update their parameters locally during backpropagation.[3] Finally, don't forget that in the late 2000s, AI researchers were doing a lot of interesting work with unsupervised layer-wise training (e.g., DBNs composed of RBMs, stacked autoencoders).[4]

This is a fascinating area of research with potentially huge payoffs. For example, it would be really neat if we find there's a "general" algorithm via which layers can learn locally from inputs continuously ("online"), allowing us to combine layers into deep neural nets for specific tasks as needed.

[1] https://arxiv.org/abs/1510.03009

[2] https://arxiv.org/abs/1602.02830

[3] https://deepmind.com/blog#decoupled-neural-interfaces-using-synthetic-gradients

[4] https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf

EDITS: Expanded the original comment so it conveys better what I actually meant to write, while keeping the language as casual and informal as possible. Also, I softened the tone of my more speculative observations.
Comment #12459770 not loaded
Comment #12433160 not loaded
Comment #12436949 not loaded
Comment #12431375 not loaded
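
For readers who want a concrete picture of the recipe cs702 describes above, here is a minimal PyTorch sketch of greedy layer-wise unsupervised pretraining followed by end-to-end supervised fine-tuning. It uses ordinary autoencoder layers as a stand-in, not the paper's actual Stacked Approximated Regression Machine procedure, and the layer widths, learning rates, class count, and the raw_loader / labelled_loader data loaders are hypothetical placeholders.

    # Sketch only: generic layer-wise pretraining + fine-tuning, not the SARM
    # algorithm from the paper. All dimensions and hyperparameters are made up.
    import torch
    import torch.nn as nn

    layer_dims = [784, 256, 64]   # hypothetical widths (e.g. flattened 28x28 inputs)
    encoders = [nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
                for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:])]

    def pretrain_layer(encoder, d_in, d_out, data_loader, epochs=1, lr=0.01):
        """Local, unsupervised step: train this layer to reconstruct its own input."""
        decoder = nn.Linear(d_out, d_in)   # throwaway decoder used only for this layer
        params = list(encoder.parameters()) + list(decoder.parameters())
        opt = torch.optim.SGD(params, lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for x in data_loader:          # x: (batch, d_in) activations
                loss = loss_fn(decoder(encoder(x)), x)
                opt.zero_grad()
                loss.backward()            # gradients never leave this layer pair
                opt.step()

    # Greedy layer-wise pretraining: layer i trains on the outputs of layers 0..i-1.
    # (`raw_loader` is a hypothetical iterable of flattened input batches.)
    # for i, enc in enumerate(encoders):
    #     loader_i = [nn.Sequential(*encoders[:i])(x).detach() for x in raw_loader]
    #     pretrain_layer(enc, layer_dims[i], layer_dims[i + 1], loader_i)

    # End-to-end supervised fine-tuning with SGD and backprop, as usual.
    model = nn.Sequential(*encoders, nn.Linear(layer_dims[-1], 10))  # 10 classes, hypothetical
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    # for x, y in labelled_loader:         # hypothetical labelled DataLoader
    #     optimizer.zero_grad()
    #     criterion(model(x), y).backward()
    #     optimizer.step()

The only point of the sketch is to show where the local, per-layer objective ends (no gradients cross layer boundaries during pretraining) and where the usual global backpropagation pass begins.
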
billconan over 8 years ago
Is there a book you can recommend about the fundamentals (like sparse coding) to understand papers like this?
Comment #12431494 not loaded
Comment #12440576 not loaded
Comment #12432143 not loaded