Main cheater identified as Pavel Pleskov.<p>According to this presentation:<p><a href="https://www.slideshare.net/DataFestTbilisi/how-to-win-a-machine-learning-competition-pavel-pleskov" rel="nofollow">https://www.slideshare.net/DataFestTbilisi/how-to-win-a-mach...</a><p>He worked at H2O.ai (but I believe he has since been fired).
Prior to that (again according to the above):<p><pre><code> - Master of Science from Moscow State University
 - New Economic School (Moscow)
- Financial Consultant
- Quantitative Researcher
- HFT Fund partner
</code></pre>
Overall it seems to be an impressive track record; this is the type of background that's often mentioned on HN as what top firms would hire from...<p>It's completely unclear why he needed to cheat. Are there other sophisticated cheaters out there in these types of competitions?<p>Maybe there need to be prizes for 'checking' other people's work?
I don't see much wrong here, or how this would be cheating. They produced a winning entry; it should be on the organizer to ensure that their test data set isn't trivially findable. It would be like testing a digit recognizer on the MNIST data set and being surprised when someone just hashes it. The real solution isn't to force open-sourcing; it is to get better metrics. Maybe add a random component, like a GAN that generates potential test data, and see if anything classifies that correctly. In the real world, when the metric becomes the target it ceases to be a good metric. So test what you want to test, not just some existing data set.<p>Edit: I didn't see that the test data was given. See the first reply to this comment.
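<p>For the curious, here's what "just hashing" the test set looks like in practice: a minimal Python sketch (the class name LookupCheat is mine, and random arrays stand in for real MNIST images) that memorises leaked test examples and "predicts" by table lookup.<p><pre><code> # Minimal sketch, assuming the test images and their labels have leaked
 # (e.g. the hidden test set is just a public dataset like MNIST).
 import hashlib
 import numpy as np

 def image_hash(img):
     # Fingerprint the raw pixel bytes; any stable hash of the image works.
     return hashlib.sha256(np.ascontiguousarray(img).tobytes()).hexdigest()

 class LookupCheat:
     def __init__(self, leaked_images, leaked_labels):
         # Build a hash -> label table from the leaked test data.
         self.table = {image_hash(x): y for x, y in zip(leaked_images, leaked_labels)}

     def predict(self, images, fallback=0):
         # "Classify" by lookup; fall back to a dummy label for unseen images.
         return [self.table.get(image_hash(x), fallback) for x in images]

 # Toy demo with random stand-ins for MNIST digits.
 rng = np.random.default_rng(0)
 test_images = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
 test_labels = rng.integers(0, 10, size=10)

 cheat = LookupCheat(test_images, test_labels)
 preds = cheat.predict(test_images)
 print(all(p == y for p, y in zip(preds, test_labels)))  # True: a perfect "score"
</code></pre>
If a perfect score is achievable by lookup, the metric is measuring data leakage, not model quality.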