This stuff is great. The prizes for the best model are minimal compared to what it usually costs to build a great model through a consulting contract or by hiring a high-quality data miner.<p>Similar to Mechanical Turk, we've managed to create a completely different value structure for some amazing work by smart folks... mostly by making it a competition. Great exposure for winners, sure, but these prizes are pretty minimal.<p><a href="http://www.kaggle.com/c/GiveMeSomeCredit" rel="nofollow">http://www.kaggle.com/c/GiveMeSomeCredit</a>, for example, has a total prize pool of US$5K (only US$3K for first place) for a credit-scoring model (in this case, predicting the likelihood of default or financial distress). Folks I talk to who do this type of work professionally tend to charge far more than that to create these models.<p>For the company sharing the data, of course, it's a big win: they get a cheap, potentially fantastic new model, and the creator gets some good exposure and some cash. But if these competitions take off, they could really change the economics of how this work gets done.
How does intellectual property work in competitions like these? Are entrants allowed to use proprietary methods? Do they give up IP by participating? Do the competition hosts gain any rights to the IP or its use?<p>It's not immediately clear from skimming the legal terms of service, and I couldn't find a FAQ.
Would Kaggle be able to handle patient data? Would Kaggle sign data use agreements with hospitals that are interested in a shared task? There is a growing number of medical data mining competitions, e.g.:<p><a href="https://www.i2b2.org/NLP/Coreference/PreviousChallenges.php" rel="nofollow">https://www.i2b2.org/NLP/Coreference/PreviousChallenges.php</a><p>But the systems for delivering data mining challenges in medicine are scattered, mostly because no one has been able to create a secure, centralized web service.