Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

3 points by jasondavies over 1 year ago

1 comment

gbickford over 1 year ago
This paper is well written. The results are pretty wild. They observed an amazing reduction in the training resources required to reach benchmarks comparable to models trained on conventional data:

> We observe that even at the first checkpoint (10B tokens) of WRAP training, the average perplexity of the LLM on the Pile is lower than that achieved by pre-training on C4 for 15 checkpoints. This suggests a 15x pre-training speed-up.
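For context, the recipe in the title amounts to rewriting raw web documents with an instruction-tuned LLM and pre-training on the rephrased text (usually mixed with the original). Below is a minimal sketch of that idea; the model name, prompt wording, and paraphrase style are illustrative assumptions, not the paper's exact setup.

# Sketch: WRAP-style corpus rephrasing before pre-training.
# Assumptions: an off-the-shelf instruction-tuned model and a generic
# paraphrase prompt; the paper's actual models and prompts may differ.
from transformers import pipeline

rephraser = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed rephrasing model
)

PROMPT = (
    "Paraphrase the following passage in clear, high-quality prose, "
    "keeping all factual content:\n\n{doc}\n\nParaphrase:"
)

def rephrase(doc: str) -> str:
    """Return a synthetic rephrasing of one web document."""
    out = rephraser(
        PROMPT.format(doc=doc),
        max_new_tokens=300,
        do_sample=False,
    )[0]["generated_text"]
    # The pipeline returns prompt + continuation; keep only the continuation.
    return out.split("Paraphrase:", 1)[-1].strip()

# The rephrased documents then replace (or augment) raw C4 text in an
# otherwise standard pre-training loop.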