If anyone's interested, I made Colab notebooks with free GPUs for both GRPO (the algorithm DeepSeek used to train its reasoning models from scratch) and general finetuning, which is what the Berkeley team employed!<p>GRPO notebook for Llama 3.1 8B: <a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb" rel="nofollow">https://colab.research.google.com/github/unslothai/notebooks...</a><p>General finetuning notebook: <a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb" rel="nofollow">https://colab.research.google.com/github/unslothai/notebooks...</a><p>The Berkeley team's 17K dataset: <a href="https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k" rel="nofollow">https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k</a><p>Hugging Face also released a 220K dataset: <a href="https://huggingface.co/datasets/open-r1/OpenR1-Math-220k" rel="nofollow">https://huggingface.co/datasets/open-r1/OpenR1-Math-220k</a>