
Kaggle Ensembling Guide

51 points | by jphilip147 | almost 10 years ago

2 comments

sdenton4 | almost 10 years ago
Couple points:

a) I think one of the biggest challenges in a Kaggle competition is getting away from overfitting to the leaderboard. It's super common... I won a Kaggle competition last year, and was something like 65th place on the public leaderboard at the end: the other teams were overfitting like crazy. As such, one should be super careful when taking 'well-performing' models to build an ensemble.

b) The point about ensembling uncorrelated models is hella important. If you make an ensemble consisting of 20 near-identical predictions from one algorithm, and 10 near-identical predictions from another algorithm, you're in effect taking a vote between the two algorithms and giving the first one a 2/3 weighting.

It might be interesting to think about explicitly de-correlating the model outputs, and finding a nice 'voting' method for combining the results... (And actually, this comes down to Z_2 arithmetic, so we could probably use a Fourier transform for it... think I feel a blog post coming on.)
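A toy sketch of the implicit-weighting effect in point (b): the algorithm names and labels below are made up for illustration, and the "fix" (one vote per algorithm) is just one possible way to undo the skew, not something the comment prescribes.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label that receives the most votes."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical ensemble: 20 near-identical predictions from algorithm A
# ("cat") and 10 near-identical predictions from algorithm B ("dog").
raw_votes = ["cat"] * 20 + ["dog"] * 10
print(majority_vote(raw_votes))  # "cat" -- A effectively outvotes B 2:1

# One way to remove the implicit 2/3 weighting: collapse each algorithm's
# near-duplicate outputs to a single vote before the final tally.
per_algorithm = {"algo_a": ["cat"] * 20, "algo_b": ["dog"] * 10}
collapsed = [majority_vote(preds) for preds in per_algorithm.values()]
# Now each algorithm contributes exactly one vote (here, a 1-1 tie),
# so neither side dominates merely by submitting more copies.
```

The raw vote is decided by how many copies each algorithm submits, not by how many distinct algorithms agree, which is exactly why correlated ensemble members are misleading.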
Comment #9720463 not loaded
Comment #9723799 not loaded
solve | almost 10 years ago
Surprisingly good, both as a broad overview and in the specifics.