> The NCV (Noisy Cross-Validation) method (Chen et al. 2019) divides the dataset in half at random, then identifies a data sample as "clean" if its label matches the label predicted by a model trained only on the other half of the dataset.

I was doing this trick in 2018, didn't write anything up. If you repeat this process a few times, it gives a more fine-grained signal of example difficulty, so you can hand-validate only the hard part, or just skip it.
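A minimal sketch of that repeated cross-training idea, assuming a scikit-learn-style classifier and numeric features; the function name, `n_rounds`, and the choice of `LogisticRegression` are illustrative, not from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def noisy_cross_validation_scores(X, y, n_rounds=5, seed=0):
    """For each sample, estimate how often a model trained only on the
    *other* random half of the data reproduces its label.
    Higher score = likely clean/easy; lower score = hard or mislabeled."""
    rng = np.random.default_rng(seed)
    agree = np.zeros(len(y))
    for _ in range(n_rounds):
        idx = rng.permutation(len(y))
        half = len(y) // 2
        a, b = idx[:half], idx[half:]
        # Train on one half, check label agreement on the held-out half, both ways.
        for train, hold in ((a, b), (b, a)):
            model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
            agree[hold] += (model.predict(X[hold]) == y[hold])
    return agree / n_rounds  # fraction of rounds the label was reproduced

# Low-scoring samples are the "hard part" worth checking by hand;
# high-scoring samples can be accepted as clean without review.
```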
For those who don't know, Lilian Weng wrote one of the best "prompt engineering" how-tos on the planet. It is beautifully compact.

https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/