Train Your Own O1 Preview Model Within $450

429 points · by 9woc · 3 months ago

16 comments

danielhanchen · 3 months ago

If anyone's interested, I made Colab notebooks with free GPUs for both GRPO (the algo DeepSeek used) to train a reasoning model from scratch, and also general finetuning, which the Berkeley team employed!

GRPO notebook for Llama 3.1 8B: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb

General finetuning notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb

The Berkeley team's 17K dataset: https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k

Hugging Face also released a 220K dataset: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k

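For readers who want a feel for what a GRPO run looks like in code before opening those notebooks, here is a minimal sketch using the GRPOTrainer from Hugging Face's TRL library. It is not the Unsloth notebook's code: the toy length-based reward, dataset choice, and hyperparameters are illustrative assumptions; real runs reward answer correctness and reasoning format.

```python
# Minimal GRPO sketch with TRL; illustrative only, the Unsloth notebooks above differ in detail.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset with a "prompt" column works; GSM8K-style math questions are a common choice.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def reward_len(completions, **kwargs):
    # Toy reward: mildly favor longer (hopefully more deliberate) completions.
    # A real reward would check the final answer and the reasoning format instead.
    return [min(len(c) / 1000.0, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="llama31-8b-grpo-sketch",
    num_generations=8,          # completions sampled per prompt for the group-relative baseline
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```
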
mkagenius · 3 months ago

Weird that they had to resort to clickbait by using "O1 preview" in their name.

I expected some sort of way to actually get o1 preview retrained (and downloadable).

Also, calling it O1 preview on the basis of just 7 benchmarks is not correct. What if someone comes up with use cases where O1 preview does better than this?

Apart from that, it's good that things are becoming cheaper.

fl4tul4 · 3 months ago

I do love competition.

In the last weeks we are seeing a torrent of advances, just because someone opened their architectures.

Imagine where we could go if the training datasets were also publicly available and unbounded by any copyright laws. (I'm not talking about doing anything illegal.)

I can only dream, I guess.

scosman · 3 months ago

Inference-time compute is still very underutilized in actual AI deployments. Lots of folks are working on foundation models, which require reasoning about broad problem domains. Not enough people are using the same techniques for task-specific performance improvements. You can easily distill the reasoning from larger models like R1 for your task. Often better, you can mix in custom thinking instructions for specific sub-problems, so a fine-tuned model learns a mix of task-specific reasoning and custom logic. It's not hard and easily beats prompt iteration. When you find bugs, you can fix them.

I made a GitHub project for distilling thinking models (and custom CoT inference-time fine-tuning): https://docs.getkiln.ai/docs/guide-train-a-reasoning-model

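As a rough illustration of the distillation workflow described above (not the Kiln project's actual pipeline), the sketch below asks a hosted reasoning model for step-by-step traces through an OpenAI-compatible API, keeps only traces whose final line matches a reference answer, and writes the survivors out as a fine-tuning dataset. The endpoint URL, model name, and example task are placeholders.

```python
# Hedged sketch: harvest reasoning traces from a larger model for task-specific distillation.
# The base_url, model name, and task data are placeholders, not any specific project's API.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example-inference-host/v1", api_key="YOUR_KEY")

tasks = [
    {"prompt": "A train travels 120 km in 1.5 hours. What is its average speed in km/h?",
     "reference": "80"},
]

records = []
for task in tasks:
    resp = client.chat.completions.create(
        model="deepseek-r1",  # whatever reasoning model the endpoint serves
        messages=[
            # Custom thinking instructions for specific sub-problems can be mixed in here.
            {"role": "system", "content": "Think step by step, then give only the final answer on the last line."},
            {"role": "user", "content": task["prompt"]},
        ],
    )
    trace = resp.choices[0].message.content
    # Keep the trace only if its final line contains the reference answer.
    if trace and task["reference"] in trace.splitlines()[-1]:
        records.append({"messages": [
            {"role": "user", "content": task["prompt"]},
            {"role": "assistant", "content": trace},
        ]})

with open("distill_sft.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```
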
rdli · 3 months ago

The blog post was a little unclear, so my summary is:

- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to fine-tune Qwen2.5-32B-Instruct (a non-reasoning model)
- The result is that Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks

There are a few dismissive comments here, but I actually think this is pretty interesting, as it shows how you can fine-tune a foundation model to do better at reasoning.

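A minimal sketch of the fine-tuning step in that summary, using TRL's SFTTrainer over the Sky-T1 trace dataset linked earlier in the thread. The hyperparameters are illustrative guesses rather than the Sky-T1 recipe (which used DeepSpeed ZeRO-3 across 8 H100s), and the ShareGPT-style column layout ("system" plus "conversations" with from/value turns) is an assumption about the dataset.

```python
# Illustrative SFT pass over reasoning traces; not the actual Sky-T1 training configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The Sky-T1 17K trace dataset referenced upthread (ShareGPT-style columns assumed).
dataset = load_dataset("NovaSky-AI/Sky-T1_data_17k", split="train")

def to_messages(example):
    # Convert ShareGPT-style records into the chat "messages" format TRL expects.
    role_map = {"human": "user", "user": "user", "gpt": "assistant", "assistant": "assistant"}
    messages = [{"role": "system", "content": example["system"]}] if example.get("system") else []
    for turn in example["conversations"]:
        messages.append({"role": role_map.get(turn["from"], "user"), "content": turn["value"]})
    return {"messages": messages}

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

config = SFTConfig(
    output_dir="sky-t1-sft-sketch",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the non-reasoning base model being fine-tuned
    args=config,
    train_dataset=dataset,
)
trainer.train()
```
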
magicalhippo · 3 months ago

So this is a fine-tune and not from scratch, which makes the proposition much more reasonable.

That said, for someone who's not in the game but has been curious about the details of fine-tuning, it's great to get both the dataset and the code.

Tiberium · 3 months ago

Better URL: https://novasky-ai.github.io/posts/sky-t1/

moconnor · 3 months ago

They trained on QwQ traces, and in their evaluation they are… mostly slightly worse than QwQ.

Hardly a huge win.

genpfault · 3 months ago

> The model training finishes in 19 hours on 8 H100 with DeepSpeed Zero-3 offload (~$450 according to Lambda Cloud pricing).

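As a back-of-the-envelope check on that figure (assuming an on-demand rate of roughly $3 per H100 GPU-hour, in the ballpark of Lambda's pricing at the time):

19 hours × 8 GPUs × ~$3/GPU-hour ≈ $456, consistent with the quoted ~$450.
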
tw1984 · 3 months ago

Just several weeks ago, OpenAI was still using reasoning as part of its tech moat to partially justify its hugely inflated valuation. Mere weeks after the release of DeepSeek and Kimi, and their papers on how to do it, average Joes can now train it at home for less than the purchase cost of a single mid-range gaming GPU.

_joel · 3 months ago

It's not from scratch, though, right? Am I missing something here as to why it's at the top of the posts?

JoshTko · 3 months ago

Has anyone tested whether a consensus of the top 4-5 mini models together would outperform the best frontier model?

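One hedged way to try that idea is plain majority voting across several small models served behind an OpenAI-compatible endpoint; the sketch below is illustrative, with placeholder endpoint and model names, and real evaluations need far more careful answer normalization.

```python
# Toy majority-vote ensemble across several small models (placeholder endpoint and model names).
from collections import Counter
from openai import OpenAI

client = OpenAI(base_url="https://example-inference-host/v1", api_key="YOUR_KEY")
mini_models = ["mini-model-a", "mini-model-b", "mini-model-c", "mini-model-d"]

def consensus_answer(question: str) -> str:
    votes = []
    for model in mini_models:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question + "\nAnswer with a single number."}],
        )
        # Crude normalization; real benchmark harnesses parse answers much more carefully.
        votes.append(resp.choices[0].message.content.strip())
    answer, _count = Counter(votes).most_common(1)[0]
    return answer

print(consensus_answer("What is 17 * 24?"))
```
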
qqmm · 3 months ago

Is it because DeepSeek decided to open their model? I noticed they have a similar timeline.

m3kw9 · 3 months ago

Looks like they need to put quotes around that "$450".

brador · 3 months ago

I just want to make music with AI and it is very difficult. The Meta model on Hugging Face gives an error when used through the website, and no one will ever fix it.

rlforllms · 3 months ago

Wait, so Qwen trained QwQ 32B from Qwen 32B, and then they distill QwQ back into Qwen 32B? What's the point?

This is a massive marketing scam. Borderline academic dishonesty.