科技回声

Sky-T1: Train your own O1 preview model within $450

44 points · by fofoz · 4 months ago

4 comments

elashri · 4 months ago
> We initially trained a 32B model using 3–4K math problems from the Numina dataset (provided by STILL-2), achieving a significant improvement in AIME24 accuracy from 16.7% to 43.3%. However, when we incorporated coding data generated from the APPs dataset into the training process, AIME24 accuracy dropped to 36.7%. We hypothesize that this decline is due to the distinct reasoning approaches required for math and coding tasks.

This is interesting, even for large models that were trained on much more data. I wonder if o1 is trained in a different way than GPT-4o. Do they rely only on synthetic data (plus some hand-crafted datasets)? But then how would o1 know as many facts as GPT-4o, which would indicate those facts were in the training data?

Can someone with more understanding and knowledge weigh in on this?
thot_experiment · 4 months ago
They fine-tuned QwQ to perform well on a benchmark. For the past two years there has been a constant stream of "X fine-tune beats Y closed model on Z benchmark". This isn't interesting and has never been interesting; see Goodhart's Law. If you're actually using local models day to day, you will quickly find that fine-tunes are almost universally a waste of time. Even for something like smutty roleplay, gaslighting a model can often lead to more interesting and consistent results, because fine-tunes are basically always overfit to their training data.
Comment #42753072 not loaded
zamadatix · 4 months ago
Fine-tuning an existing model. Training such a model from scratch so cheaply would've been nuts.
ipsum2 · 4 months ago
For math only.