
Tips for Better Deep Learning Models

56 points by lauradhamilton almost 11 years ago

6 comments

gamegoblin almost 11 years ago
A note on dropout:

If your layer size is relatively small (not hundreds or thousands of nodes), dropout is usually detrimental, and a more traditional regularization method such as weight decay is superior.

For the size of networks Hinton et al. are playing with nowadays (with thousands of nodes in a layer), dropout is good, though.
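For readers who want the distinction concretely, here is a minimal PyTorch sketch (my own illustration, not from the comment; the layer sizes and coefficients are arbitrary):

    import torch
    import torch.nn as nn

    # Large layers: the regime where dropout tends to help.
    big_net = nn.Sequential(
        nn.Linear(784, 2048),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # randomly zero half the activations each step
        nn.Linear(2048, 10),
    )
    opt_big = torch.optim.SGD(big_net.parameters(), lr=0.01)

    # Small layers: skip dropout and lean on weight decay (an L2 penalty) instead.
    small_net = nn.Sequential(
        nn.Linear(784, 64),
        nn.ReLU(),
        nn.Linear(64, 10),
    )
    opt_small = torch.optim.SGD(small_net.parameters(), lr=0.01, weight_decay=1e-4)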
vundervul almost 11 years ago
Who is Arno Candel and why should we pay attention to his tips on training neural networks? Anyone who suggests grid search for metaparameter tuning is out of touch with the consensus among experts in deep learning. A lot of people are coming out of the woodwork and presenting themselves as experts in this exciting area because it has had so much success recently, but most of them seem to be beginners. Having lots of beginners learning is fine and healthy, but a lot of these people act as if they are experts.
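(The consensus the commenter alludes to favors random search over grid search for hyperparameter tuning, following Bergstra and Bengio's 2012 result. A rough plain-Python sketch of the idea; the parameter names, ranges, and toy objective below are all invented for illustration:)

    import random

    def sample_config():
        # Draw each hyperparameter independently; log-uniform for the learning rate.
        return {
            "learning_rate": 10 ** random.uniform(-5, -1),
            "hidden_units": random.choice([128, 256, 512, 1024]),
            "dropout": random.uniform(0.0, 0.7),
        }

    def evaluate(cfg):
        # Stand-in for training a model and returning validation accuracy.
        return -abs(cfg["learning_rate"] - 1e-3)

    # Unlike a grid, every trial explores a fresh value of every dimension,
    # which matters when only a few dimensions actually drive performance.
    trials = [sample_config() for _ in range(60)]
    best = max(trials, key=evaluate)
    print(best)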
agibsonccc almost 11 years ago
I would just like to link to my earlier comments for people who may be curious: https://news.ycombinator.com/item?id=7803101

I will also add that looking into Hessian-free training over conjugate gradient/LBFGS/SGD for feed-forward nets has proven to be amazing [1].

I'm still playing with recursive nets, but based on the work by Socher [2], they used LBFGS just fine.

[1]: http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf

[2]: http://socher.org/
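(Hessian-free training itself is involved; see the Kiros paper in [1]. But the LBFGS route mentioned above is easy to sketch. A minimal, illustrative PyTorch example with made-up data:)

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(20, 50), nn.Tanh(), nn.Linear(50, 1))
    x, y = torch.randn(128, 20), torch.randn(128, 1)
    loss_fn = nn.MSELoss()

    # LBFGS in PyTorch requires a closure that re-evaluates the loss,
    # because the optimizer may probe the function several times per step.
    opt = torch.optim.LBFGS(net.parameters(), lr=0.1, max_iter=20)

    def closure():
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        return loss

    for _ in range(10):
        opt.step(closure)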
prajit almost 11 years ago
A question about the actual slides: why don't they use unsupervised pretraining (i.e. a sparse autoencoder) for predicting MNIST? Is it just to show that they don't need pretraining to achieve good results, or is there something deeper?
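(For context, a sparse autoencoder pretrains a layer to reconstruct its input while keeping the hidden activations sparse. A rough PyTorch sketch; the L1 activation penalty here stands in for the usual KL-divergence sparsity term, and all sizes and coefficients are illustrative:)

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
    decoder = nn.Linear(128, 784)
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)

    x = torch.rand(64, 784)    # stand-in for a batch of MNIST pixels
    sparsity_weight = 1e-3     # illustrative coefficient

    for _ in range(100):
        opt.zero_grad()
        h = encoder(x)
        recon = decoder(h)
        # Reconstruction loss plus a sparsity penalty on the hidden code.
        loss = nn.functional.mse_loss(recon, x) + sparsity_weight * h.abs().mean()
        loss.backward()
        opt.step()
    # The trained encoder weights would then initialize the supervised network.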
TrainedMonkey almost 11 years ago
Direct link to slides: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
ivan_ah almost 11 years ago
Direct link to slides, anyone?