Stack Exchange Machine Learning Contest

143 points by moserware almost 13 years ago

9 comments

If you're interested in ML/data mining but haven't had a chance to apply your ideas to any hard or interesting problems, I strongly recommend a Kaggle contest. It's one thing to plug some data into a random forest and go "oh cool, I guess that did okay" and entirely another to see how your results compare with other competitors'.

One of the biggest challenges I've found in implementing ML projects is that I don't have a great sense of when I've really gotten the most information out of the data. I'm not particularly competitive, but the contest format is great for this. When you see that a solution you'd normally be happy with ranks in the lower half of the rankings, you're really pushed to improve it.

This leads you to learn your tools and algorithms better. For a couple of contests I took seriously, I ended up learning tons about R, spent most of my nights reading academic papers on various newer techniques, and also read through a few books. On top of all that, you really should spend time reading up on how past winners have won, which gives a bunch of practical insight into approaching different ML problems.

In the contest I tried hardest in, I actually placed terribly after the final results were calculated, but looking over what went wrong, I was amazed to see how far my understanding of ML had progressed. I'd say a month of seriously competing is easily worth a semester-long grad class.

rm999 almost 13 years ago

Looks like a cool contest; I may check it out. What bothers me about modeling contests (I've taken part in several; it's my field) is that they often reward putting 90% of your effort into extracting relatively small performance gains. For one thing, it's not a realistic operating environment: there are usually many other factors more important than pure performance, like upkeep, cost, and speed. This is why the Netflix contest's winning models couldn't go into production. The other issue I have is that people with other commitments (like a job) don't really stand a chance; it's usually very time-consuming to go from fifth place to first.

msellout almost 13 years ago

Does any other profession have a Kaggle? Imagine a more general contest: build my company a tool that increases our market value by X%; we'll give the winners $Y and a job interview. The expected value of participating is $Y/n, where n is the number of participants.

It's like the opposite of a professional organization. I suppose the libertarians approve. It drives down the cost of labor and therefore might make the market more efficient. Yet I'm suspicious.

I'd like to propose a counter-organization. Analysts can band together and offer a contest. We collaborate to create a tool that gives your company an X% increase in value. Companies bid for the rights to that tool. I'd expect that the value to the laborer would be greater than $Y/n. I guess that just described a consulting company.

Perhaps the situation is not so unique. Art also provides much value in the act of production, and many organizations hold art contests similar in design to Kaggle competitions. Open-source software often doesn't even have a competition sponsor.

It'd be ludicrous to imagine holding a contest to offer the best legal advice or diagnosis. I'm not saying that I agree with the restrictions that the American Medical Association has placed over the ability to attend medical school, but the free market is harsh enough competition.

Kaggle does promote the value of the field as a whole. I worry that it commoditizes rather than professionalizes.
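
The $Y/n figure in the comment above can be sketched in a few lines of Python. This is only an illustration under the comment's implicit assumption that every participant is equally likely to win a single prize; the function name `expected_payout` and the figures passed to it are hypothetical and do not come from any real contest.

```python
def expected_payout(prize: float, participants: int) -> float:
    """Expected prize money per entrant.

    Assumes one prize and that each participant has the same
    chance of winning it (the simplification behind $Y/n).
    """
    if participants <= 0:
        raise ValueError("participants must be positive")
    return prize / participants


# Hypothetical numbers, chosen only to show the arithmetic.
print(expected_payout(20_000, 500))  # 40.0
```

In practice the chances are far from equal, so this is better read as an average over the field than as any one entrant's outlook.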

cletus almost 13 years ago

It's nice to see this kind of contest, but the topic just sets me off on a much-needed rant.

The moderator situation on Stack Overflow is getting out of control. I see a Q&A site as having three main groups:

1. People who ask questions;
2. People who answer questions; and
3. People who edit/moderate questions.

Even 2+ years ago there was a lot of lip service paid to the value of (3). I disagreed then, and my view has only been reaffirmed by subsequent events. To be clear: it's not that I think these functions have no value, it's that they are, at best, secondary to content creation.

The problem is that these roles, without diligent oversight, attract the wrong kinds of people (e.g. [1], [2], and a scandal a few years ago about an admin blacklist that I can't seem to find right now).

Take this question from Stack Overflow: "Database development mistakes made by application developers" [3], a question I spent some time answering and whose answer people seemed to appreciate (based on comments and 1000+ upvotes). It is closed as "not constructive". This is hardly a unique phenomenon. We've all seen many interesting questions posted here that are now closed or locked, and who knows how many have been deleted.

The kind of person you end up with is overly pedantic and a real stickler for an arbitrary set of rules.

Editors/moderators are the bureaucrats of the Internet.

As Oscar Wilde said, “The bureaucracy is expanding to meet the needs of the expanding bureaucracy.” [4] These sorts of people just invent work for themselves in the absence of anything to do.

Joel needs to make some changes to Stack Overflow. It's rapidly going the way of the old Usenet days, when anything interesting gets shot down and anything else gets closed and the OP lambasted for not having found the 17 previous duplicates. Not good.

The biggest problem I see is an extreme interpretation of what is "subjective". "What language should I learn?" is an obviously subjective question. In the absence of any concrete criteria, it's hard to give a useful answer.

But consider a question like "What are the pros and cons of Sinatra vs Rails?" This sort of question (IMHO) absolutely has value, as someone experienced with both could enumerate the relative merits of each in a pretty objective fashion without making an absolute determination. This is something that absolutely could have value to anyone evaluating Ruby Web frameworks.

So, back to this post: what are the odds of any particular question being closed? It seems to be positively correlated with how much time has passed (since SO's inception) and how interesting the question is.

[1]: http://www.nbcnews.com/technology/technolog/wikipedia-admins-face-gauntlet-scrutiny-889502
[2]: http://www.searchenginepeople.com/blog/most-notorious-wikipedia-scandals.html
[3]: http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-application-developers
[4]: http://www.goodreads.com/quotes/130452-the-bureaucracy-is-expanding-to-meet-the-needs-of-the

ricardobeat almost 13 years ago

<pre><code>probabilityOfClosing = (question) ->
  text = question.text.toLowerCase()
  return (text.length / (text.indexOf('jquery') + 2)) / 100</code></pre>

impendia almost 13 years ago

Amusing side note: I clicked on their job ad; apparently they score 10/12 on the "Joel Test", which according to the link indicates "serious problems".

finnw almost 13 years ago

Sounds easier than winning the Loebner Prize, and yet there is more cash on offer.

I just hope the winning entry will prompt the developers to remove that stupid filter [1] that prevents you from referring to the Halting Problem in question titles.

[1]: http://meta.stackoverflow.com/questions/107989/using-the-word-problem-in-titles

dotborg2 almost 13 years ago

All those people who helped generate the data for the machine learning algos may now feel fooled, like suckers.

drudru11 almost 13 years ago

Why is the prize so small when the economic benefit could be much larger?