The DeepSeek R1 paper that the blog post is written around: <a href="https://arxiv.org/pdf/2501.12948" rel="nofollow">https://arxiv.org/pdf/2501.12948</a>
Can someone dumb this down for me, a generalist engineer with only a surface-level knowledge of how training LLMs works: what were people doing before, and what is GRPO doing differently?