
Tips for Better Deep Learning Models

56 points by lauradhamilton almost 11 years ago

6 comments

gamegoblin almost 11 years ago

A note on dropout:

If your layer size is relatively small (not hundreds or thousands of nodes), dropout is usually detrimental, and a more traditional regularization method such as weight decay is superior.

For the size of networks Hinton et al. are playing with nowadays (thousands of nodes per layer), dropout is good, though.
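To make the contrast concrete, here is a minimal NumPy sketch of both regularizers; the layer sizes, learning rate, and decay strength are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small hidden layer (64 units), the case where weight decay
# tends to beat dropout per the comment above.
W = rng.normal(scale=0.1, size=(64, 32))
grad_W = rng.normal(size=W.shape)      # stand-in for a backprop gradient

# L2 weight decay: shrink weights by adding lambda * W to the gradient.
lr, lam = 0.01, 1e-4
W -= lr * (grad_W + lam * W)

# Inverted dropout (training time only): zero each unit with probability p
# and scale survivors by 1/(1-p), so the test-time forward pass is unchanged.
def dropout(h, p=0.5):
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

h = rng.normal(size=(16, 64))          # a batch of hidden activations
h_train = dropout(h)                   # at test time, use h as-is
```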
vundervul almost 11 years ago
Who is Arno Candel and why should we pay attention to his tips on training neural networks? Anyone who suggests grid search for metaparameter tuning is out of touch with the consensus among experts in deep learning. A lot of people are coming out of the woodwork and presenting themselves as experts in this exciting area because it has had so much success recently, but most of them seem to be beginners. Having lots of beginners learning is fine and healthy, but a lot of these people act as if they are experts.
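For context, the consensus the commenter alludes to favored random search over grid search for hyperparameter tuning (Bergstra and Bengio, 2012). A minimal sketch of the alternative, where train_and_score is a hypothetical stand-in for an actual training run:

```python
import math
import random

random.seed(0)

def train_and_score(lr, dropout_p):
    """Hypothetical: train a model with these settings, return a val score."""
    # Toy peaked surface so the search has something to find.
    return -((math.log10(lr) + 2.5) ** 2) - (dropout_p - 0.4) ** 2

# Random search: each trial draws fresh values for every hyperparameter,
# so 50 trials probe 50 distinct learning rates (a 7x7 grid probes only 7).
best_score, best_cfg = -float("inf"), None
for _ in range(50):
    lr = 10 ** random.uniform(-4, -1)   # log-uniform over [1e-4, 1e-1]
    p = random.uniform(0.0, 0.8)
    score = train_and_score(lr, p)
    if score > best_score:
        best_score, best_cfg = score, (lr, p)

print("best score %.3f at lr=%.1e, dropout=%.2f" % ((best_score,) + best_cfg))
```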
agibsonccc almost 11 years ago

I would just like to link to my comments from before for people who may be curious: https://news.ycombinator.com/item?id=7803101

I will also add that looking into Hessian-free training (over conjugate gradient/LBFGS/SGD) for feed-forward nets has proven to be amazing [1].

Recursive nets I'm still playing with, but based on the work by Socher [2], they used LBFGS just fine.

[1]: http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf

[2]: http://socher.org/
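For readers unfamiliar with the method [1] builds on: the heart of Hessian-free optimization is solving each Newton step with conjugate gradient, using only Hessian-vector products rather than an explicit Hessian. A minimal NumPy sketch on a toy quadratic; the finite-difference Hv product and the test problem are illustrative assumptions, not the paper's implementation (which adds damping, mini-batches, and exact Gauss-Newton products).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(10, 10))
A = M @ M.T + 10 * np.eye(10)            # positive definite toy Hessian
b = rng.normal(size=10)

def grad(w):
    """Gradient of the toy quadratic loss 0.5 * w.T A w - b.T w."""
    return A @ w - b

def hvp(grad_fn, w, v, eps=1e-6):
    # Hessian-vector product via central finite differences of the gradient.
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def conjugate_gradient(Av, rhs, iters=50, tol=1e-10):
    """Solve H x = rhs using only matrix-vector products Av(x) = H x."""
    x = np.zeros_like(rhs)
    r = rhs - Av(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = Av(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

w = np.zeros(10)
step = conjugate_gradient(lambda v: hvp(grad, w, v), grad(w))
w -= step                                 # one Newton step: w - H^{-1} g
print("gradient norm after step:", np.linalg.norm(grad(w)))
```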
prajit almost 11 years ago

A question about the actual slides: why don't they use unsupervised pretraining (i.e. a sparse autoencoder) for predicting MNIST? Is it just to show that they don't need pretraining to achieve good results, or is there something deeper?
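For readers who haven't met the term: a sparse autoencoder learns to reconstruct its input while a KL penalty pushes each hidden unit's mean activation toward a small target rate. A minimal NumPy sketch of that objective; the shapes and penalty weights are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_loss(x, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Squared reconstruction error plus KL(rho || rho_hat) sparsity penalty."""
    h = sigmoid(x @ W1 + b1)               # encoder
    x_hat = sigmoid(h @ W2 + b2)           # decoder
    recon = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
    rho_hat = h.mean(axis=0)               # mean activation per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl

# Toy shapes standing in for MNIST: 784 inputs, 64 hidden units.
x = rng.random((32, 784))
W1 = rng.normal(scale=0.01, size=(784, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.01, size=(64, 784)); b2 = np.zeros(784)
print("loss:", sparse_autoencoder_loss(x, W1, b1, W2, b2))
```

Pretraining would minimize this objective layer by layer before supervised fine-tuning, the step the question notes the slides skip.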
TrainedMonkey almost 11 years ago

Direct link to slides: http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-candel-071614
ivan_ah almost 11 years ago

Direct link to slides, anyone?