Questionable Practices in Machine Learning

6 points | by beckhamc | 8 months ago

1 comment

beckhamc | 8 months ago
I'm not sure if this is directly mentioned in the paper, but I didn't see any mention specifically of the conflation between a validation set *and* a test set. When people actually make a distinction between the two (which seemingly isn't all that common nowadays), you're meant to perform model selection on the validation set, i.e. find the best hyperparameters that minimise `loss(model, valid_set)`. Once you've found your most performant model by that criterion, you evaluate it on the test set once, and that's your unbiased measure of generalisation error.

Since the ML community (and reviewers) are obsessed with "SOTA", "novelty", and bold numbers, a table of results composed purely of test set numbers is not easily controllable (if you're trying to be ethical) from the point of view of actually "passing" peer review. Conversely, what is easily controllable is a table full of validation set numbers: just perform extremely aggressive model selection until your model gets higher numbers than everything else. An even simpler solution: why not ditch the distinction between the validation and test set to begin with? (I'm joking, btw.) Now you see the problem.
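For concreteness, here is a minimal sketch of the protocol the comment describes: select hyperparameters on a held-out validation set, then touch the test set exactly once. This assumes scikit-learn and uses a toy dataset and logistic regression purely for illustration, not anything from the paper.

```python
# Sketch: keep model selection (validation) separate from final evaluation (test).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Three disjoint splits: train for fitting, valid for selection, test held out.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model selection: choose the hyperparameter that maximises validation accuracy.
best_C, best_val_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
    val_acc = accuracy_score(y_valid, model.predict(X_valid))
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# Evaluate the selected configuration on the test set exactly once;
# this is the number to report as generalisation performance.
final_model = LogisticRegression(C=best_C, max_iter=5000).fit(X_train, y_train)
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"best C={best_C}, valid acc={best_val_acc:.3f}, test acc={test_acc:.3f}")
```

The failure mode the comment points at is reporting the validation numbers (or tuning directly against the test set), which lets repeated selection leak into the reported score.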