
Benchmarking Random Forest Classification

25 points by jsbloom1 almost 12 years ago

3 comments

tlarkworthy almost 12 years ago
It's random forests ... each tree is trained on a subset of the data. You can split the massive dataset into chunks and train independently. That sidesteps the "big data" hangup.

If you look at the implementation in scikit-learn, each tree emits a normalised probability vector for each prediction; those vectors are simply multiplied together to get the aggregate prediction, so it's not very difficult to do yourself.

Although regardless, you are applying a batch learning technique anyway. You want an incremental learner for big data.
Comment #6050164 not loaded
Comment #6050163 not loaded
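A minimal sketch of the chunk-and-combine idea tlarkworthy describes, assuming scikit-learn's RandomForestClassifier. The dataset, chunk count, and multiplicative combination of per-forest probability vectors follow the comment; this is illustrative, not how scikit-learn aggregates internally (it averages per-tree probabilities).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the "massive dataset".
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Split the data into chunks and train an independent forest on each chunk.
chunks = np.array_split(np.arange(len(X)), 4)
forests = []
for idx in chunks:
    rf = RandomForestClassifier(n_estimators=25, random_state=0)
    rf.fit(X[idx], y[idx])
    forests.append(rf)

# Aggregate by multiplying the normalised probability vectors from each
# forest, then renormalising (a tiny epsilon guards against all-zero rows).
proba = np.ones((len(X), 2))
for rf in forests:
    proba *= rf.predict_proba(X) + 1e-12
proba /= proba.sum(axis=1, keepdims=True)
pred = proba.argmax(axis=1)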
glouppe almost 12 years ago
Any chance for you to run your benchmarks on this branch of Scikit-Learn? https://github.com/glouppe/scikit-learn/tree/trees-v2 This will be shipped soon :)

We have been working hard to reduce computing times and memory footprint (though there is still a lot of room for improvement on that side).

(Unfortunately, I cannot run your benchmarks myself, because the compiled version of WiseRF requires a newer version of glibc than the one on my cluster, and crashes.)
bravura almost 12 years ago
Question: Why do I have to implement hyperparameter selection?

For me, the promise of in-the-cloud machine learning is that I can call a 'train' method and specify one single hyperparameter: the training budget (i.e. $). Perhaps also the maximum time before I am returned a trained model.

That's it. Can you do that?
Comment #6050284 not loaded
Comment #6050159 not loaded
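A rough sketch of the single-knob interface bravura asks about, not an existing API: sample random-forest configurations at random and keep the best cross-validated one until a wall-clock budget is spent. The helper name train, the parameter ranges, and the budget semantics are all hypothetical.

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train(X, y, budget_seconds=60.0, seed=0):
    """Return the best model found before the (approximate) time budget runs out."""
    rng = np.random.default_rng(seed)
    best_score, best_model = -np.inf, None
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        # Sample a random hyperparameter configuration.
        params = {
            "n_estimators": int(rng.integers(50, 300)),
            "max_depth": int(rng.integers(2, 30)),
            "max_features": ["sqrt", "log2", None][int(rng.integers(3))],
        }
        model = RandomForestClassifier(**params, random_state=0, n_jobs=-1)
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_model = score, model.fit(X, y)
    return best_model, best_score

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
model, score = train(X, y, budget_seconds=30.0)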