Couple points:

a) I think one of the biggest challenges in a Kaggle competition is getting away from overfitting to the leaderboard. It's super common... I won a Kaggle competition last year while sitting at something like 65th place on the public leaderboard at the end: the other teams were overfitting like crazy. As such, one should be super careful when taking 'well-performing' models to build an ensemble.

b) The point about ensembling uncorrelated models is hella important. If you make an ensemble consisting of 20 near-identical predictions from one algorithm, and 10 near-identical predictions from another algorithm, you're in effect taking a vote between the two algorithms and giving the first one a 2/3 weighting.

It might be interesting to think about explicitly de-correlating the model outputs, and finding a nice 'voting' method for combining the results... (And actually, this comes down to Z_2 arithmetic, so we could probably use a Fourier transform for it... think I feel a blog post coming on.)
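A quick toy sketch of point (b), with entirely made-up accuracies and noise rates: 20 near-copies of algorithm A plus 10 near-copies of algorithm B, combined by a naive majority vote. The vote ends up agreeing with A almost everywhere the two algorithms disagree, which is exactly the hidden 2/3 weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
truth = rng.integers(0, 2, n_samples)

# Hypothetical base models: A is right 70% of the time, B 80% (made-up numbers).
pred_a = np.where(rng.random(n_samples) < 0.70, truth, 1 - truth)
pred_b = np.where(rng.random(n_samples) < 0.80, truth, 1 - truth)

def noisy_copies(pred, n_copies, flip_rate=0.01):
    """Near-identical copies of one model: each copy flips ~1% of labels."""
    copies = np.tile(pred, (n_copies, 1))
    flips = rng.random(copies.shape) < flip_rate
    return np.where(flips, 1 - copies, copies)

# 20 copies of A stacked with 10 copies of B -> 30-member "ensemble".
ensemble = np.vstack([noisy_copies(pred_a, 20), noisy_copies(pred_b, 10)])

# Naive majority vote over all 30 members: wherever A and B disagree,
# A's 20 copies outvote B's 10, so the ensemble just echoes A.
naive_vote = (ensemble.mean(axis=0) > 0.5).astype(int)

print("A accuracy:              ", (pred_a == truth).mean())
print("B accuracy:              ", (pred_b == truth).mean())
print("naive 30-member vote:    ", (naive_vote == truth).mean())
print("naive vote agrees with A:", (naive_vote == pred_a).mean())
```

Collapsing each algorithm's copies to a single prediction first (or weighting members by how decorrelated they are) would give B its fair say; the snippet just makes the failure mode concrete.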