Amateurs beat specialists in data-prediction competitions

56 points by ivoflipse, over 12 years ago

7 comments

micro_cam, over 12 years ago
I am a non-biologist applying random forests and other methods to large genetic studies, and I think it is unfair to discount the importance of specialists.

I'm not really surprised that domain knowledge doesn't predict success in these contests, because the data is already featurized and sanitized to remove any features that could be used to "cheat."

Random forest works really well and will quickly pick up on any features in the data that it can use to build a model that is accurate on that data but isn't useful in the real world. This includes unique identifiers and features that are correlated for the wrong reason.

For example, if you are an e-commerce site with a tiered shipping cost scheme and you turn it loose on your raw database, it will think shipping cost predicts purchase price. It will also think order id is an excellent predictor of purchase price, since every unique order id has a cost.

The role of the domain expert is to determine what features can be fed into these black box methods, to recognize when the model is overfitting, and to re-featurize the data. I.e., they might replace the "shipping cost" feature with a boolean "discounted shipping promotion" feature and see if it still affects total purchase.
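
The shipping-cost and order-id example above can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the tier thresholds in `shipping_cost` are made up, and a simple group-mean lookup stands in for a tree-based model (it is not a random forest). All names here are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def shipping_cost(price):
    """Hypothetical tiered scheme: shipping is computed FROM the order total."""
    if price < 25:
        return 5.0
    if price < 100:
        return 10.0
    return 0.0  # assumed free shipping on large orders

def make_orders(n, start_id, rng):
    """Generate synthetic orders with unique, sequential order ids."""
    orders = []
    for i in range(n):
        price = round(rng.uniform(5, 200), 2)
        orders.append({
            "order_id": start_id + i,
            "shipping_cost": shipping_cost(price),
            "price": price,
        })
    return orders

def fit_group_means(rows, feature):
    """Predict the mean training price for each value of one feature.

    Unseen feature values fall back to the overall training mean.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r["price"])
    table = {k: mean(v) for k, v in groups.items()}
    fallback = mean(r["price"] for r in rows)
    return lambda r: table.get(r[feature], fallback)

def mae(model, rows):
    """Mean absolute error of a model on a set of orders."""
    return mean(abs(model(r) - r["price"]) for r in rows)

rng = random.Random(0)
train = make_orders(500, 0, rng)
test = make_orders(500, 500, rng)  # new orders, so new order ids

train_mean = mean(r["price"] for r in train)
baseline = lambda r: train_mean          # always predicts the training mean
by_id = fit_group_means(train, "order_id")
by_ship = fit_group_means(train, "shipping_cost")

train_mae_id = mae(by_id, train)         # memorized: one price per id
test_mae_id = mae(by_id, test)           # no better than the baseline
test_mae_ship = mae(by_ship, test)       # looks good, but the feature is leaked
test_mae_base = mae(baseline, test)

print(f"order_id      train MAE: {train_mae_id:.2f}  test MAE: {test_mae_id:.2f}")
print(f"shipping_cost test MAE:  {test_mae_ship:.2f}")
print(f"baseline      test MAE:  {test_mae_base:.2f}")
```

The order-id model is perfect on the data it has seen and falls back to the baseline on new orders. The shipping-cost model beats the baseline even on held-out orders, yet it is useless in practice because shipping cost is only known once the purchase price is known.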
jph00, over 12 years ago
I'm the Kaggle President & Chief Scientist who is interviewed in this article. Feel free to ask me any questions that you have about the role of domain experts in data-driven decision making (or any other relevant topic!)

(It didn't occur to me to submit the article to HN - silly me! Thanks ivoflipse for doing so.)
DougBTX, over 12 years ago
Title is misleading; it should say, "Data-prediction specialists beat specialists from other fields at data-prediction".
zwass, over 12 years ago
This immediately makes me think of De Moivre's equation, which I just learned about from another front-paging HN article, "The Most Dangerous Equation" (http://news.ycombinator.com/item?id=4893258)
damian2000, over 12 years ago
How different or similar is this to the wisdom of crowds (http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds)... or those online decision markets which have proved quite successful (http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Prediction_markets)? Seems related in some ways.
im3w1l, over 12 years ago
Is this a failure for Bayesianism in practice? The people with the best priors still lost.
JanneVee, over 12 years ago
It is actually scary, considering that the analysts on Wall Street would be considered specialists in economic forecasting and prediction.