Understanding Emergent Abilities of Language Models from the Loss Perspective
6 points
by maccaw
about 1 year ago
1 comment
cosmojg
about 1 year ago
Does this mean that "overtraining" a midsize LLM for many more epochs on a small, representative subset of the dataset used by a larger, more performant LLM might be sufficient for matching the performance of the larger model?