Amateurs beat specialists in data-prediction competitions

56 points by ivoflipse over 12 years ago

7 comments

micro_cam over 12 years ago
I am a non-biologist applying random forests and other methods to large genetic studies, and I think it is unfair to discount the importance of specialists.

I'm not really surprised that domain knowledge doesn't predict success in these contests, because the data is already featurized and sanitized to remove any features that could be used to "cheat."

Random forest works really well and will quickly pick up on any features in the data that let it build a model that is accurate on that data but isn't useful in the real world. This includes unique identifiers and features that are correlated for the wrong reason.

For example, if you are an e-commerce site with a tiered shipping cost scheme and you turn it loose on your raw database, it will think shipping cost predicts purchase price. It will also think order id is an excellent predictor of purchase price, since every unique order id has a cost.

The role of the domain expert is to determine what features can be fed into these black-box methods, to recognize when the model is overfitting, and to re-featurize the data. For example, they might replace the "shipping cost" feature with a boolean "discounted shipping promotion" feature and see if it still affects total purchase.
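
The shipping-cost and order-id example above can be sketched in a few lines of Python. This is only an illustration on synthetic data: the tier thresholds, the size of the promotion effect, and all function names are assumptions made up for the sketch, and the per-value mean predictor is a simple stand-in for a single fully grown tree on one feature, not a random forest.

```python
import random
from collections import defaultdict


def shipping_cost(price):
    """Hypothetical tiered shipping scheme: the fee is set by the order total."""
    if price < 25:
        return 8.0
    if price < 100:
        return 5.0
    return 0.0


def make_orders(n, seed, first_id=0):
    """Generate synthetic orders; every number here is invented for illustration."""
    rng = random.Random(seed)
    orders = []
    for i in range(n):
        promo = rng.random() < 0.5
        # Assumption: a shipping promotion adds a modest amount to the order total.
        price = rng.uniform(5, 150) + (20 if promo else 0)
        orders.append({
            "order_id": first_id + i,          # unique per order
            "shipping": shipping_cost(price),  # leaks the target: derived from price
            "promo": promo,                    # known before the purchase is made
            "price": price,
        })
    return orders


def fit_group_means(rows, feature):
    """Learn the mean price for each value of one feature, plus an overall fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[feature]] += row["price"]
        counts[row[feature]] += 1
    overall = sum(r["price"] for r in rows) / len(rows)
    return {k: sums[k] / counts[k] for k in sums}, overall


def r_squared(model, rows, feature):
    """Fraction of price variance the model explains on the given rows."""
    means, fallback = model
    avg = sum(r["price"] for r in rows) / len(rows)
    ss_res = sum((r["price"] - means.get(r[feature], fallback)) ** 2 for r in rows)
    ss_tot = sum((r["price"] - avg) ** 2 for r in rows)
    return 1 - ss_res / ss_tot


train = make_orders(5000, seed=1)
test = make_orders(5000, seed=2, first_id=5000)  # order ids never seen in training

scores = {}
for feature in ("order_id", "shipping", "promo"):
    model = fit_group_means(train, feature)
    scores[feature] = (r_squared(model, train, feature), r_squared(model, test, feature))
    print(f"{feature:9s} train R^2 = {scores[feature][0]:.2f}, "
          f"test R^2 = {scores[feature][1]:.2f}")
```

In this toy setup, order id memorizes the training rows and explains essentially nothing on new orders, shipping cost scores well only because it was computed from the price itself, and the promotion flag explains much less but is the only one of the three that is known before the purchase happens.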
jph00 over 12 years ago
I'm the Kaggle President & Chief Scientist who is interviewed in this article. Feel free to ask me any questions you have about the role of domain experts in data-driven decision making (or any other relevant topic!)

(It didn't occur to me to submit the article to HN - silly me! Thanks, ivoflipse, for doing so.)
DougBTX over 12 years ago
The title is misleading; it should say, "Data-prediction specialists beat specialists from other fields at data prediction."
zwass over 12 years ago
This immediately makes me think of De Moivre's equation, which I just learned about from another front-page HN article, "The Most Dangerous Equation" (http://news.ycombinator.com/item?id=4893258).
damian2000 over 12 years ago
How different or similar is this to the wisdom of crowds (http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds), or to those online decision markets which have proved quite successful (http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Prediction_markets)? It seems related in some ways.
im3w1l over 12 years ago
Is this a failure of Bayesianism in practice? The people with the best priors still lost.
JanneVee over 12 years ago
It is actually scary, considering that the analysts on Wall Street would be considered specialists in economic forecasting and prediction.