TechEcho

Thinking about high-quality human data

103 points by tim_sw over 1 year ago

2 comments

visarga over 1 year ago
> The NCV (Noisy Cross-Validation) method (Chen et al. 2019) divides the dataset into half at random, and then identifies data samples as “clean” if its label matches the predicted label provided by the model that is only trained on the other half of the dataset.

I was doing this trick in 2018, but didn't write anything up. If you repeat this process a few times, it provides a more fine-grained signal of example difficulty, so you can validate only the hard part by hand, or just skip it.
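
The split-and-compare idea quoted above, plus the "repeat it a few times" variation, can be sketched roughly as follows. This is only an illustration under stated assumptions: the nearest-centroid classifier is a stand-in for whatever model you would really train (it is not the model from Chen et al. or from the comment), and the names `ncv_round`, `agreement_rate`, the `rounds` parameter, and the 0.5 cutoff are all hypothetical.

```python
import random
from collections import defaultdict

def fit_centroids(points, labels):
    """Toy stand-in model (an assumption): one mean vector per label."""
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    return {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in groups.items()}

def predict(centroids, p):
    """Return the label whose centroid is closest to p (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])))

def ncv_round(points, labels, rng):
    """Split in half at random; judge each sample with the model trained on the other half."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    halves = (idx[:half], idx[half:])
    agrees = [False] * len(points)
    for train, held_out in (halves, halves[::-1]):
        model = fit_centroids([points[i] for i in train],
                              [labels[i] for i in train])
        for i in held_out:
            # "Clean" in this round if the given label matches the prediction.
            agrees[i] = predict(model, points[i]) == labels[i]
    return agrees

def agreement_rate(points, labels, rounds=10, seed=0):
    """Repeat the random split; return the fraction of rounds each label was confirmed."""
    rng = random.Random(seed)
    counts = [0] * len(points)
    for _ in range(rounds):
        for i, ok in enumerate(ncv_round(points, labels, rng)):
            counts[i] += ok
    return [c / rounds for c in counts]

# Toy data: two tight, well-separated clusters; sample 3 is deliberately mislabeled.
base = [(x * 0.1, y * 0.1) for x in range(5) for y in range(4)]
points = base + [(10 + x, 10 + y) for x, y in base]
labels = [0] * len(base) + [1] * len(base)
labels[3] = 1

rates = agreement_rate(points, labels)
# Low agreement marks samples worth checking by hand (0.5 is an arbitrary cutoff).
suspects = [i for i, r in enumerate(rates) if r < 0.5]
```

A single round gives only a clean/not-clean flag per sample. Averaging over several random splits turns that into a per-sample score, which is the finer-grained difficulty signal the comment describes.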
nerpderp82 over 1 year ago
For those who don't know, Lilian Weng wrote one of the best "prompt engineering" how-tos on the planet. It is beautifully succinct.

https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/