If anyone's interested, I made Colab notebooks with free GPUs for both GRPO (the algorithm DeepSeek used to train its reasoning models from scratch) and general finetuning, which is what the Berkeley team employed!<p>GRPO notebook for Llama 3.1 8B: <a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb" rel="nofollow">https://colab.research.google.com/github/unslothai/notebooks...</a><p>General finetuning notebook: <a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb" rel="nofollow">https://colab.research.google.com/github/unslothai/notebooks...</a><p>The Berkeley team's 17K dataset: <a href="https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k" rel="nofollow">https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k</a><p>Hugging Face also released a 220K dataset: <a href="https://huggingface.co/datasets/open-r1/OpenR1-Math-220k" rel="nofollow">https://huggingface.co/datasets/open-r1/OpenR1-Math-220k</a>