103 points by tim_sw over 1 year ago

2 comments

visarga over 1 year ago

> The NCV (Noisy Cross-Validation) method (Chen et al. 2019) divides the dataset into half at random, and then identifies data samples as “clean” if its label matches the predicted label provided by the model that is only trained on the other half of the dataset.

I was doing this trick in 2018 but didn't write anything up. If you repeat this process a few times, it gives a more fine-grained per-example difficulty signal, so you can validate only the hard part by hand, or just skip it.
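For anyone who wants to try the repeated-split idea, here is a minimal sketch in plain Python. It assumes a toy nearest-centroid classifier as a stand-in for whatever model you would actually train. The function names (`fit_centroids`, `predict`, `ncv_agreement`), the `rounds` and `seed` parameters, and the toy data are all made up for illustration. None of it comes from Chen et al. 2019.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Toy stand-in model: compute one mean vector per class label."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def ncv_agreement(X, y, rounds=5, seed=0):
    """For each sample, return the fraction of rounds in which a model
    trained on the *other* random half predicted the sample's given label.
    1.0 looks clean; lower values are harder or possibly mislabeled."""
    rng = random.Random(seed)
    n = len(X)
    agree = [0] * n
    for _ in range(rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        half_a, half_b = idx[: n // 2], idx[n // 2:]
        # Train on one half, score the other, then swap roles.
        for train, test in ((half_a, half_b), (half_b, half_a)):
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            for i in test:
                agree[i] += predict(model, X[i]) == y[i]
    return [a / rounds for a in agree]


# Toy data: two well-separated clusters, with one label flipped on purpose.
X = [[i * 0.1, i * 0.1] for i in range(10)]
X += [[10 + i * 0.1, 10 + i * 0.1] for i in range(10)]
y = [0] * 10 + [1] * 10
y[3] = 1  # deliberately wrong label

scores = ncv_agreement(X, y)
suspects = [i for i, s in enumerate(scores) if s < 1.0]
print(suspects)
```

With a single round this reduces to the clean/not-clean split from the quote. Averaging over several rounds is what turns it into a graded score, so you can sort by `scores` and hand-check only the lowest ones.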

nerpderp82 over 1 year ago

For those that don't know, Lilian Weng wrote one of the best "prompt engineering" howtos on the planet. It is beautiful in its succinct compactness.

https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/


Thinking about high-quality human data
