Benchmarking Random Forest Classification

25 points by jsbloom1 almost 12 years ago

3 comments

tlarkworthy almost 12 years ago
It's random forests... each tree is trained on a *subset* of the data. You can split the massive dataset into chunks and train on each chunk independently. That sidesteps the "big data" hangup.

If you look at the scikit-learn implementation, each tree emits a normalised probability vector for each prediction, and those vectors are simply multiplied together to get the aggregate prediction, so it's not very difficult to do yourself.

Regardless, you are still applying a batch learning technique. You want an incremental learner for big data.
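
The aggregation step described in this comment can be sketched in a few lines. This is a minimal illustration, not scikit-learn's code: the function names and the example probabilities are made up, and each inner list stands in for the class-probability vector that one independently trained forest (or tree) would emit for a single sample. The comment describes multiplying the vectors; scikit-learn's documentation describes the forest's `predict_proba` as the mean of the trees' predicted class probabilities, so both combiners are shown for comparison.

```python
from math import prod


def average_votes(prob_vectors):
    """Mean of the per-model class-probability vectors (soft voting)."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[k] for p in prob_vectors) / n_models for k in range(n_classes)]


def product_votes(prob_vectors):
    """Element-wise product of the vectors, renormalised to sum to 1."""
    n_classes = len(prob_vectors[0])
    raw = [prod(p[k] for p in prob_vectors) for k in range(n_classes)]
    total = sum(raw)
    if total == 0:
        raise ValueError("every class received zero probability from some model")
    return [r / total for r in raw]


# Hypothetical outputs of three models, each trained on a different chunk,
# for one sample and three classes.
chunk_probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.0, 0.3],
]

mean_p = average_votes(chunk_probs)
prod_p = product_votes(chunk_probs)

# Index of the class with the highest averaged probability.
predicted_class = max(range(len(mean_p)), key=mean_p.__getitem__)
```

One practical difference: with the product rule, a single model assigning zero probability to a class vetoes that class outright, whereas averaging only lowers its score.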
glouppe almost 12 years ago
Any chance you could run your benchmarks on this branch of scikit-learn? https://github.com/glouppe/scikit-learn/tree/trees-v2 It will be shipped soon :)

We have been working hard to reduce computing times and memory footprint (though there is still a lot of room for improvement on that side).

(Unfortunately, I cannot run your benchmarks myself, because the compiled version of WiseRF requires a newer version of glibc than the one on my cluster, and crashes.)
bravura almost 12 years ago
Question: Why do I have to implement hyperparameter selection?

For me, the promise of in-the-cloud machine learning is that I can call a 'train' method and specify one single hyperparameter: the training budget (i.e. $). Perhaps also the maximum time before I am returned a trained model.

That's it. Can you do that?
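
To make the request concrete, here is a rough sketch of what a budget-only interface might look like. Everything in it is hypothetical: `train_with_budget`, the flat `cost_per_trial`, the search space, and the toy scoring function are illustrative assumptions, not part of WiseRF, scikit-learn, or any hosted service. It uses plain random search and stops once the next trial would exceed the budget.

```python
import random


def train_with_budget(evaluate, search_space, budget, cost_per_trial, seed=0):
    """Random search that stops once the next trial would exceed the budget."""
    rng = random.Random(seed)
    best_params, best_score, spent = None, float("-inf"), 0.0
    while spent + cost_per_trial <= budget:
        # Sample one candidate setting for every hyperparameter.
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(params)
        spent += cost_per_trial
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, spent


def toy_evaluate(params):
    """Stand-in for 'train a model and return its validation score'."""
    return (
        1.0
        - abs(params["n_trees"] - 100) / 1000
        - abs(params["max_depth"] - 8) / 100
    )


# Hypothetical search space and a budget that covers ten trials.
space = {"n_trees": [10, 50, 100, 200], "max_depth": [4, 8, 16]}
best_params, best_score, spent = train_with_budget(
    toy_evaluate, space, budget=5.0, cost_per_trial=0.5
)
```

A wall-clock limit could be handled the same way by comparing elapsed time against a deadline inside the loop instead of accumulating cost.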