Machine Learning Showdown: Apache Mahout vs. Weka

54 points by doppenhe, over 10 years ago

14 comments

jackhammer, over 10 years ago
Most data scientists these days use scikit-learn or R. Weka is really out of fashion. Mahout and MLlib are difficult to use and perform worse. Often it's better to just down-sample or rent an EC2 instance with a lot of memory.
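For anyone who wants to try the down-sampling route, here is a minimal sketch using reservoir sampling, which picks a uniform random subset from a stream without loading it all into memory. It is only one of several ways to down-sample, not something the commenter specified; the function name `reservoir_sample`, the fixed `seed`, and the toy `range` input are all assumptions made for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    The stream is read once and only k items are held in memory.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(row)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Toy input standing in for a large dataset (hypothetical).
subset = reservoir_sample(range(100_000), k=1000)
print(len(subset))
```

The resulting subset can then be handed to whatever single-machine tool you prefer, such as scikit-learn or R.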
discardorama, over 10 years ago
This is almost apples and oranges. Mahout's power lies in its ability to handle huge amounts of data in a parallel fashion. Weka (which is rarely used these days anyway) is for smaller problems and experimentation.

Neither of these (Mahout and Weka) is mainstream anymore. For large-scale classification, people are using packages like VW [1]. And for small-scale experimentation, scikit-learn or R.

[1] http://hunch.net/~vw
bayonetz, over 10 years ago
RapidMiner is the jam for prototyping ML processes. It's so powerfully useful I've always been surprised they've kept it free for so long (they have a paid version but it's not necessary). In addition to its own algorithms, it has a plugin that wraps Weka, so you get all of those if you want them too. I'm in no way connected with them, just a big fan of it over every other ML library or tool I've seen. If I could buy stock I would...
doppenhe, over 10 years ago
For our HN friends, a direct invite: https://algorithmia.com/signup?invite=HN24hr
akbar501, over 10 years ago
For ML, Spark MLlib is a solid choice.

For large-scale, distributed stats I'd go with SparkR.

https://spark.apache.org/
folli, over 10 years ago
I'm not very experienced in machine learning, I've just dabbled a bit, so maybe someone could explain this to me:

Looking at the graph of number of trees vs. accuracy, I would have expected the line to asymptotically approach a maximum accuracy given more and more trees. However, for Weka it looks quite wavy, and for Mahout it even looks as if there's an optimum and more trees are worse.

Or is it just noise and I'm reading too much into it?
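The asymptotic intuition can be checked with a toy model. The sketch below computes the exact accuracy of a majority vote over independent voters on a two-class task. This is an idealization and not how either library works: trees in a real random forest are correlated and MNIST has ten classes. The function name `majority_vote_accuracy` and the per-voter accuracy of 0.6 are assumptions chosen for illustration.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is right.

    Each voter is correct with probability p on a two-class task.
    """
    needed = n_voters // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Odd ensemble sizes avoid ties; 0.6 is a hypothetical per-voter accuracy.
for n in (1, 11, 51, 101, 301):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

In this simplified model the accuracy only rises with more voters and then flattens out, so it does not predict an optimum followed by a decline. That would be consistent with the dips in the graph being noise or an implementation effect, though the toy model cannot settle that on its own.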
dthal, over 10 years ago
There is something bothering me about this... Weka's accuracy seems quite high in comparison to the results on Yann LeCun's MNIST page [1]. It's hard for me to believe that "the answer" to the MNIST problem is "use WEKA's RF".

[1] http://yann.lecun.com/exdb/mnist/
yid, over 10 years ago
Ugh... comparing random algorithms without showing error bounds on the accuracies.
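To make the point about error bounds concrete, here is a minimal sketch of a Wilson score interval for a measured accuracy. The counts are hypothetical: 10,000 matches the size of the standard MNIST test set, but the thread does not say how many test images the comparison actually used, and 9,700 correct is an invented figure. The function name `wilson_interval` is also just for illustration.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical result: 9,700 of 10,000 test digits classified correctly.
low, high = wilson_interval(correct=9700, total=10_000)
print(f"accuracy 0.9700, approx. 95% interval [{low:.4f}, {high:.4f}]")
```

If the differences between neighboring points on an accuracy curve are smaller than the width of such an interval, they are hard to distinguish from sampling noise.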
sgwizdak, over 10 years ago
I'm surprised that Spark's MLlib wasn't included in this comparison.
tsewlliw, over 10 years ago
Just playing around with it: do typical strategies for using these tools account for "bad" data? I drew a '-' and got '4' as the guess, which feels very wrong.
spountzy, over 10 years ago
WEKA is really slow, at least when trying out the 'example'... And why choose these two? There are a lot more. But anyway, thanks for the comparison.
therobot24, over 10 years ago
WEKA seems to take forever to classify a digit in their demo. Also, I wonder why there are various drops in performance when using 200 and 300 trees.
mch82, over 10 years ago
Can anyone summarize the general workflow for using these analysis tools? Just looking for a high level intro and maybe a link to more detail.
coffeemugmugmug, over 10 years ago
I don't know anybody seriously using either of these. Mahout has bad implementations and Weka is showing its age.