Deep Forest: Towards an Alternative to Deep Neural Networks

350 points by kercker about 8 years ago

15 comments

rkaplan about 8 years ago
"In contrast to deep neural networks which require great effort in hyper-parameter tuning, gcForest is much easier to train."

Hyperparameter tuning is not as much of an issue with deep neural networks anymore. Thanks to BatchNorm and more robust optimization algorithms, most of the time you can simply use Adam with a default learning rate of 0.001 and do pretty well. Dropout is not even necessary with many models that use BatchNorm nowadays, so generally tuning there is not an issue either. Many layers of 3x3 conv with stride 1 is still magical.

Basically: deep NNs can work pretty well with little to no tuning these days. The defaults just work.
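A minimal sketch of the "defaults just work" recipe described above, assuming PyTorch (the comment names no framework); the layer widths and depth are illustrative, not taken from the thread:

```python
import torch
import torch.nn as nn

# Stacked 3x3, stride-1 convolutions with BatchNorm and no dropout.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# Adam at the untuned default learning rate of 0.001.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```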
throw_away_777 about 8 years ago
I've always found it curious that Neural Networks get so much hype when xgboost (gradient boosted decision trees) is by far the most popular and accurate algorithm for most Kaggle competitions. While neural networks are better for image processing types of problems, there are a wide variety of machine learning problems where decision tree methods perform better and are much easier to implement.
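To illustrate the "much easier to implement" point, a hedged example using xgboost's scikit-learn wrapper with default settings; the dataset and split are placeholders, not anything benchmarked in this thread:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Load a small tabular dataset and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default hyperparameters are often a strong baseline for tabular data.
clf = XGBClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```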
jre about 8 years ago
I don't know about the others, but the two vision datasets they compare on (MNIST and the face recognition one) are small, and the CNN they compare to doesn't seem very state of the art.

It also seems each layer of random forest just concatenates a class distribution to the original feature vector. So this doesn't seem to get the same "hierarchy of features" benefit that you get in large-scale CNNs and DNNs.
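A rough sketch of the cascade structure as this comment reads it, assuming scikit-learn; this is not the authors' code (the paper's gcForest additionally uses cross-validated class vectors and multi-grained scanning):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

def cascade_predict(X_train, y_train, X_test, n_levels=3):
    # Each level's forests emit class distributions that are concatenated
    # back onto the ORIGINAL feature vector before the next level.
    aug_train, aug_test = X_train, X_test
    for _ in range(n_levels):
        train_probas, test_probas = [], []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            forest = Forest(n_estimators=100, n_jobs=-1).fit(aug_train, y_train)
            train_probas.append(forest.predict_proba(aug_train))
            test_probas.append(forest.predict_proba(aug_test))
        aug_train = np.hstack([X_train] + train_probas)
        aug_test = np.hstack([X_test] + test_probas)
    # Final prediction: average the last level's class distributions
    # (argmax gives indices into forest.classes_).
    return np.mean(test_probas, axis=0).argmax(axis=1)
```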
FrozenVoid about 8 years ago
Related: Deep neural decision forests (ConvNets + Random Forests):

http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Kontschieder_Deep_Neural_Decision_ICCV_2015_paper.pdf
http://matthewalunbrown.com/papers/1603.01250v1.pdf
https://topos-theory.github.io/deep-neural-decision-forests/
ungzd about 8 years ago
Was about to joke about Deep Support Vector Machines, but found out they exist too:

https://www.esat.kuleuven.be/sista/ROKS2013/files/presentations/DSVM_ROKS_2013_WIERING.pdf
http://deeplearning.net/wp-content/uploads/2013/03/dlsvm.pdf
paulsutter about 8 years ago
No Free Lunch theorem refresher:

"if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems"

https://en.m.wikipedia.org/wiki/No_free_lunch_theorem
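For reference, the formal statement behind that paraphrase (the Wolpert–Macready formulation, stated from memory rather than taken from the linked page): for any two algorithms $a_1$ and $a_2$,

$$\sum_{f} P(d_m^y \mid f, m, a_1) \;=\; \sum_{f} P(d_m^y \mid f, m, a_2),$$

where the sum runs over all objective functions $f$ and $d_m^y$ is the sequence of cost values observed after $m$ evaluations. Averaged over all problems, every algorithm performs the same.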
DanielleMolloy about 8 years ago
So if this works well why is there no comparison on ImageNet?
KirinDave about 8 years ago
While optimizations to cost by ditching GPUs as a requirement are important (and presumably these systems benefit from GPU optimization as well; it seems unclear from my skim of the paper), cheaper training is NOT just about saving your wallet.

A real emerging area of opportunity is having systems train new systems. This has numerous applications, including assisting DSEs in the construction of new systems or allowing expert systems to learn more over time and even integrate new techniques into a currently deployed system.

I'm not an expert here, but I'd like to be, so I'm definitely going to ask my expert friends more about this.
Dim25 about 8 years ago
Please note the CPUs they used are pretty advanced: 2x Intel E5 2670 v3 CPUs (24 cores), approx. $1.5k per unit (http://ark.intel.com/products/81709/Intel-Xeon-Processor-E5-2670-v3-30M-Cache-2_30-GHz).

Looking forward to trying the code (especially on CIFAR or ImageNet); Zhi-Hua Zhou, one of the authors, said they are going to publish it soon.
throwaway312780 about 8 years ago
XGBoost also appears to have a GPU implementation.
edixon about 8 years ago
None of these experiments actually do anything to show feature learning; if that is the claim, I would like to see a transfer learning experiment. I would be surprised if this works well, since they can't jointly optimize their layers (so you can't just use ImageNet to induce a good representation). It's not quite clear why we should think that trees will turn out to be inherently cheaper than a DNN with similar accuracy, unless perhaps the model structure encodes a prior which matches the distribution of the problem?
argonaut about 8 years ago
The method's performance on MNIST is relatively mediocre. You might think 98.96% is amazing, but it's about relative performance. It is a relatively easy exercise nowadays to get above 99% with neural nets. Even I can get that kind of performance with hand-written Python neural nets, on the CPU, with no convolutions.

For the rest of the (non-image) datasets, it's already common knowledge that boosting methods are competitive with neural nets.
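For context on the "no convolutions, CPU only" claim, a small fully connected baseline on MNIST, assuming scikit-learn; hyperparameters are illustrative and no particular accuracy is promised here:

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 70,000 28x28 digits, flattened to 784 features; scale pixels to [0, 1].
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0
)

# A plain two-hidden-layer MLP, trained on the CPU.
mlp = MLPClassifier(hidden_layer_sizes=(512, 256), max_iter=30)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```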
uptownfunk about 8 years ago
Would like to see three things coming out of this:

1. R code implementation (could probably write this myself, but it would make things easier)
2. How to get feature importance? Otherwise difficult to implement in a business context. (One common route is sketched below.)
3. Better benchmarks
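On point 2, a sketch of one common route to feature importance for tree ensembles (impurity-based importances in scikit-learn); how these would be aggregated across a full gcForest cascade is left open in the thread:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Mean decrease in impurity, one value per input feature.
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```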
DrNuke about 8 years ago
Progress in this field is astonishing and it really propagates to the masses in the form of easy-to-use black boxes with a pinch of undergraduate-level maths. Just wow!
bamboozled about 8 years ago
HN just won't be the same