科技回声

14 comments

jackhammer, over 10 years ago

Most data scientists these days use scikit-learn or R. Weka is really out of fashion. Mahout and MLlib are difficult to use and perform worse. Often it's better to just down-sample or rent an EC2 instance with a lot of memory.
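As a rough illustration of the down-sampling route, here is a minimal sketch of reservoir sampling, which keeps a uniform random sample of fixed size from a stream too large to hold in memory. The function name `reservoir_sample` and the seed are assumptions made for this example; they do not come from the article or from any of the libraries mentioned.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return a uniform random sample of up to k items from an iterable.

    Works in a single pass, so the full dataset never has to fit in memory.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Hypothetical stand-in for a large dataset: one million row ids.
    subset = reservoir_sample(range(1_000_000), k=1000)
    print(len(subset))
```

The sample can then be handed to an in-memory tool such as scikit-learn or R.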

discardorama, over 10 years ago

This is almost apples and oranges. Mahout's power lies in its ability to handle huge amounts of data in parallel. Weka (which is rarely used these days anyway) is for smaller problems and experimentation.

Neither of these (Mahout and Weka) is mainstream anymore. For large-scale classification, people are using packages like VW [1]. And for small-scale experimentation, scikit-learn or R.

[1] http://hunch.net/~vw

bayonetz, over 10 years ago

RapidMiner is the jam for prototyping ML processes. It's so powerfully useful that I've always been surprised they've kept it free for so long (they have a paid version, but it's not necessary). In addition to its own algorithms, it has a plugin that wraps Weka, so you get all of those too if you want them. I'm in no way connected with them, just a big fan of it over every other ML library or tool I've seen. If I could buy stock I would...

doppenhe, over 10 years ago

For our HN friends, a direct invite: https://algorithmia.com/signup?invite=HN24hr

akbar501, over 10 years ago

For ML, Spark MLlib is a solid choice. For large-scale, distributed stats I'd go with SparkR.

https://spark.apache.org/

folli, over 10 years ago

I'm not very experienced in machine learning, I've just dabbled a bit, so maybe someone could explain this to me:

Looking at the graph of number of trees vs. accuracy, I would have expected the line to asymptotically approach a maximum accuracy as more and more trees are added; however, for Weka it looks quite wavy, and for Mahout it even looks as if there's an optimum beyond which more trees are worse.

Or is it just noise and I'm reading too much into it?
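One way to see why a plateau is the expected shape is an idealized majority-vote calculation. The sketch below assumes a binary task, an odd number of trees, and trees that err independently with the same per-tree accuracy `p`; all three are simplifying assumptions (real random-forest trees are correlated and MNIST has ten classes), and the function name and the value 0.6 are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees, p):
    """Probability that a strict majority of n_trees independent voters,
    each correct with probability p, picks the right answer.

    Idealized model: binary task, odd n_trees, independent errors.
    """
    need = n_trees // 2 + 1  # votes required for a strict majority
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-tree accuracy of 0.6.
    for n in (1, 3, 11, 51, 101):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the curve only rises toward a ceiling, so wiggles or dips in a measured curve would have to come from something else, such as randomness in training or a finite test set.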

dthal, over 10 years ago

There is something bothering me about this...

Weka's accuracy seems quite high in comparison to the results on Yann LeCun's MNIST page [1]. It's hard for me to believe that "the answer" to the MNIST problem is "use Weka's RF".

[1] http://yann.lecun.com/exdb/mnist/

yid, over 10 years ago

Ugh... comparing random algorithms without showing error bounds on the accuracies.
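For a sense of what such error bounds could look like, here is a minimal sketch of a Wilson score interval for an accuracy measured on a finite test set. The accuracy of 0.97 is a made-up placeholder, not a number from the article, and the test-set size of 10,000 assumes the standard MNIST test split was used, which the article may or may not have done.

```python
from math import sqrt


def wilson_interval(accuracy, n_test, z=1.96):
    """Wilson score interval for a proportion (here: classification accuracy).

    z=1.96 corresponds to roughly 95% confidence.
    """
    denom = 1 + z**2 / n_test
    center = (accuracy + z**2 / (2 * n_test)) / denom
    half_width = (
        z * sqrt(accuracy * (1 - accuracy) / n_test + z**2 / (4 * n_test**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical measured accuracy on an assumed 10,000-image test set.
    low, high = wilson_interval(0.97, 10_000)
    print(f"accuracy 0.97, 95% interval: [{low:.4f}, {high:.4f}]")
```

This only captures sampling noise from the test set; variation between training runs with different random seeds would need repeated runs to estimate.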

sgwizdak, over 10 years ago

I'm surprised that Spark's MLlib wasn't included in this comparison.

tsewlliw, over 10 years ago

Just playing around with it: do typical strategies for using these tools include "bad" data? I drew a '-' and got '4' as the guess, which feels very wrong.

spountzy, over 10 years ago

Weka is really slow, at least when trying out the 'example'... And why choose these two? There are a lot more. But anyway, thanks for the comparison.

therobot24, over 10 years ago

Weka seems to take forever to classify a digit in their demo. Also, I wonder why there are various drops in performance when using 200 and 300 trees.

mch82, over 10 years ago

Can anyone summarize the general workflow for using these analysis tools? Just looking for a high-level intro and maybe a link to more detail.

coffeemugmugmug, over 10 years ago

I don't know anybody seriously using either of these. Mahout has bad implementations and Weka is showing its age.

Machine Learning Showdown: Apache Mahout vs. Weka