
Questionable Practices in Machine Learning

6 points by beckhamc 8 months ago

1 comment

beckhamc 8 months ago
I'm not sure if this is directly mentioned in the paper, but I didn't see any mention specifically about the conflation between a validation set *and* test set. When people actually make a distinction between the two (which is seemingly not all that common nowadays), you're meant to perform model selection on the validation set, i.e. find the best HPs such that you minimise `loss(model, valid_set)`. Once you've found your most performant model according to that, you then evaluate it on the test set once, and that's your unbiased measure of generalisation error. Since the ML community (and reviewers) are obsessed with "SOTA", "novelty", and bold numbers, a table of results purely composed of test set numbers is not easily controllable (when you're trying to be ethical) from the point of view of actually "passing" the peer review process. Conversely, what's easily controllable is a table full of validation set numbers: just perform extremely aggressive model selection until your model gets higher numbers than everything else. Even simpler solution: why not just ditch the distinction between the valid and test set to begin with? (I'm joking, btw.) Now you see the problem.
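
Editor's note: a minimal sketch of the protocol the comment describes — hyperparameters chosen on a held-out validation set, the test set touched exactly once at the end. The dataset, model, and grid of `C` values below are illustrative assumptions, not taken from the paper or the comment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Split once into train / validation / test (60 / 20 / 20).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model selection: pick the hyperparameter that performs best on the validation set,
# i.e. the step the comment calls minimising loss(model, valid_set).
best_score, best_model = -np.inf, None
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_valid, y_valid)  # validation accuracy, used only for selection
    if score > best_score:
        best_score, best_model = score, model

# The test set is evaluated once, on the single selected model: this is the
# (approximately) unbiased estimate of generalisation performance.
print("selected validation accuracy:", best_score)
print("test accuracy (reported once):", best_model.score(X_test, y_test))
```

The point of the sketch is that only the last line is reportable; reusing the test score to steer further tuning collapses it back into a validation set.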