I am a non-biologist applying random forests and other methods to large genetic studies, and I think it is unfair to discount the importance of specialists.<p>I'm not really surprised that domain knowledge doesn't predict success in these contests, because the data is already featurized and sanitized to remove any features that could be used to "cheat."<p>Random forest works really well, and it will quickly pick up on any feature in the data that lets it build a model that is accurate but not useful in the real world. This includes unique identifiers and features that are correlated with the target for the wrong reason.<p>For example, if you are an e-commerce site with a tiered shipping cost scheme and you turn it loose on your raw database, it will think shipping cost predicts purchase price. It will also think order id is an excellent predictor of purchase price, since every unique order id has a cost.<p>The role of the domain expert is to determine which features can be fed into these black-box methods, to recognize when the model is overfitting, and to re-featurize the data. For instance, they might replace the "shipping cost" feature with a boolean "discounted shipping promotion" feature and see if it still affects total purchase.
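To make the e-commerce example above concrete, here is a minimal sketch on made-up data. It deliberately uses a per-value average lookup as a stand-in for a random forest, so it only illustrates the leakage, not the forest itself. The tier thresholds, the promotion effect, and every name in it (`shipping_cost`, `make_orders`, `fit_group_means`) are assumptions invented for illustration.

```python
import random
from statistics import mean

def shipping_cost(price):
    # Hypothetical tiered scheme: the tier is computed FROM the order total,
    # so this feature leaks the target.
    if price < 25:
        return 5.0
    if price < 100:
        return 10.0
    return 0.0

def make_orders(n, seed, start_id=0):
    # Synthetic orders; the numbers are invented, not real data.
    rng = random.Random(seed)
    orders = []
    for i in range(n):
        promo = rng.random() < 0.3
        price = rng.uniform(5, 150) + (20 if promo else 0)
        orders.append({
            "order_id": start_id + i,
            "promo": promo,
            "shipping": shipping_cost(price),
            "price": price,
        })
    return orders

def fit_group_means(rows, feature):
    # Stand-in "model": average price per distinct feature value,
    # with the overall average as fallback for unseen values.
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r["price"])
    table = {value: mean(prices) for value, prices in groups.items()}
    return table, mean(r["price"] for r in rows)

def mean_abs_error(model, rows, feature):
    table, fallback = model
    return mean(abs(table.get(r[feature], fallback) - r["price"]) for r in rows)

train = make_orders(2000, seed=1)
test = make_orders(1000, seed=2, start_id=2000)  # new orders, new ids

results = {}
for feature in ("order_id", "shipping", "promo"):
    model = fit_group_means(train, feature)
    results[feature] = (
        mean_abs_error(model, train, feature),
        mean_abs_error(model, test, feature),
    )
    print(feature, "train error:", round(results[feature][0], 2),
          "held-out error:", round(results[feature][1], 2))
```

In this sketch, order id memorizes the training rows and tells you nothing about new orders, while shipping cost scores better than the promotion flag only because it was computed from the price in the first place. Spotting that second case takes someone who knows how the shipping tiers are assigned.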
I'm the Kaggle President & Chief Scientist who is interviewed in this article. Feel free to ask me any questions you have about the role of domain experts in data-driven decision making (or any other relevant topic!)<p>(It didn't occur to me to submit the article to HN - silly me! Thanks ivoflipse for doing so.)
This immediately makes me think of De Moivre's equation, which I just learned about from another front-paging HN article, "The Most Dangerous Equation" (<a href="http://news.ycombinator.com/item?id=4893258" rel="nofollow">http://news.ycombinator.com/item?id=4893258</a>)
How different or similar is this to the wisdom of crowds (<a href="http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds" rel="nofollow">http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds</a>)... or those online decision markets which have proved quite successful (<a href="http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Prediction_markets" rel="nofollow">http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds#Prediction...</a>)? Seems related in some ways.