31 points by amrrs 4 months ago

2 comments

mfi 4 months ago

Deepseek R1 paper that the blogpost is written around: https://arxiv.org/pdf/2501.12948

okdood64 4 months ago

Can someone dumb this down for me, a generalist engineer with only a surface-level knowledge of how training LLMs works: what were people doing before, and what is GRPO doing differently?


How Deepseek R1 Was Trained
