I'm going to check back later to see if anyone manages to reproduce it. Perhaps by the time it's presented at NIPS.<p>A Twitter conversation reflecting some scepticism, but agreeing it would be interesting if it all checks out:
<a href="https://twitter.com/fchollet/status/771862837819867136" rel="nofollow">https://twitter.com/fchollet/status/771862837819867136</a>
Update on this: it has been withdrawn: <a href="https://arxiv.org/abs/1608.04062" rel="nofollow">https://arxiv.org/abs/1608.04062</a>
As far as I understand this, the authors claim they can train convolutional and many other types of deep neural nets faster by pretraining each layer with a new unsupervised technique via which the layer sort of learns to compress its inputs (a local optimization problem), and then they fine-tune the whole network end-to-end with supervised SGD and backpropagation as usual. They have not released code, so no one else has replicated this yet -- as far as I know.<p>If the claim holds, the implication is that layers can <i>quickly</i> learn much of what they need to learn <i>locally</i>, that is, without requiring backpropagation of gradients from potentially very distant layers. I can't help but wonder if this opens the door to more efficient asynchronous/parallel/distributed training of layers, potentially leading to models that update themselves continuously (i.e., "online" instead of in a batch process).<p>I wouldn't be surprised if the claim holds. There is mounting evidence that standard end-to-end backpropagation is a rather inefficient learning mechanism. For example, we now know that deep neural nets can be trained with <i>approximate gradients</i> obtained by shifting bits to get the sign and order of magnitude of the gradient roughly right.[1] In some cases it's even possible to restrict learning to binary weights.[2] More recently, we have learned that it's possible to use "helper" linear models during training <i>to predict what the gradients will be</i> for each layer, in between true-gradient updates, allowing layers to update their parameters locally without waiting for the full backward pass.[3] Finally, don't forget that in the late 2000s, AI researchers were doing a lot of interesting work with unsupervised layer-wise training (e.g., DBNs composed of RBMs, stacked autoencoders).[4] (Rough sketches of the generic layer-wise recipe and of the approximate-gradient trick are appended at the end of this comment.)<p>This is a fascinating area of research with potentially huge payoffs. For example, it would be really neat if we found there's a "general" algorithm via which layers can learn locally from their inputs continuously ("online"), allowing us to combine layers into deep neural nets for specific tasks as needed.<p>[1] <a href="https://arxiv.org/abs/1510.03009" rel="nofollow">https://arxiv.org/abs/1510.03009</a><p>[2] <a href="https://arxiv.org/abs/1602.02830" rel="nofollow">https://arxiv.org/abs/1602.02830</a><p>[3] <a href="https://deepmind.com/blog#decoupled-neural-interfaces-using-synthetic-gradients" rel="nofollow">https://deepmind.com/blog#decoupled-neural-interfaces-using-...</a><p>[4] <a href="https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf" rel="nofollow">https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf</a><p>EDITS: Expanded the original comment so it better conveys what I actually meant to write, while keeping the language as casual and informal as possible. Also, I softened the tone of my more speculative observations.
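<p>To make the layer-wise idea above more concrete, here's a rough sketch of the <i>generic</i> greedy layer-wise pretraining recipe (stacked autoencoders, in the spirit of [4]) followed by supervised fine-tuning. It is emphatically <i>not</i> the paper's method -- they haven't released code -- and the layer sizes, learning rates, epoch counts, and data variables below are made-up placeholders:

<pre><code>
# Sketch only: generic stacked-autoencoder pretraining + supervised fine-tuning,
# NOT the paper's (unreleased) technique. All hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer_sizes = [784, 256, 64]   # e.g. MNIST-sized input -> two hidden layers
encoders = [nn.Linear(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

def pretrain_layer(enc, data, epochs=5, lr=1e-3):
    """Train one layer as an autoencoder on its own inputs (a purely local problem)."""
    dec = nn.Linear(enc.out_features, enc.in_features)   # throwaway decoder
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        for x in data:                                    # data: iterable of minibatches
            recon = dec(torch.sigmoid(enc(x)))
            loss = F.mse_loss(recon, x)                   # "learn to compress its inputs"
            opt.zero_grad(); loss.backward(); opt.step()

def encode_through(encoders_done, data):
    """Push minibatches through already-pretrained layers to feed the next one."""
    out = []
    with torch.no_grad():
        for x in data:
            for enc in encoders_done:
                x = torch.sigmoid(enc(x))
            out.append(x)
    return out

# 1) Unsupervised, layer by layer: each layer only ever sees its own inputs.
#    (unlabeled_batches is assumed to be a list of float tensors of shape [B, 784].)
def pretrain_stack(encoders, unlabeled_batches):
    for i, enc in enumerate(encoders):
        local_data = encode_through(encoders[:i], unlabeled_batches)
        pretrain_layer(enc, local_data)

# 2) Supervised fine-tuning of the whole stack end-to-end, as usual.
def fine_tune(encoders, labeled_batches, n_classes=10, epochs=5, lr=1e-3):
    head = nn.Linear(encoders[-1].out_features, n_classes)
    model = nn.Sequential(*[nn.Sequential(e, nn.Sigmoid()) for e in encoders], head)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in labeled_batches:
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return model
</code></pre>

The point of the sketch is the division of labor: step 1 is entirely local (no gradients flow between layers), and only step 2 uses end-to-end backpropagation.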
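<p>And here's an equally rough illustration of the approximate-gradient idea I alluded to with [1]: keep only the sign and the nearest power of two of each gradient entry, so weight updates could in principle be done with bit shifts instead of full multiplications. Again, this is the gist, not the paper's exact scheme:

<pre><code>
# Sketch only: quantize each gradient entry to sign * nearest power of two.
import torch

def power_of_two_quantize(grad, eps=1e-8):
    sign = torch.sign(grad)
    exponent = torch.round(torch.log2(grad.abs() + eps))   # nearest power of two
    return sign * torch.pow(2.0, exponent)

g = torch.tensor([0.013, -0.19, 0.0007])
print(power_of_two_quantize(g))   # roughly [0.0156, -0.25, 0.00098]
</code></pre>

The surprising empirical finding is that updates this coarse are often enough for training to work, which is part of why I suspect exact end-to-end gradients aren't as essential as we tend to assume.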