
Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

86 points | by leodriesch | 4 days ago

8 comments

a2128 | 4 days ago
To be clear, this is not a model trained on zero data. It is a pretrained model (Qwen 2.5, trained on 18 trillion tokens) finetuned using self-generated data grounded by a Python interpreter.
Comment #43953222 not loaded
Comment #43954285 not loaded
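The grounding described in the comment above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: a self-proposed (program, input, claimed-output) triple only earns reward if actually executing the program reproduces the claim, so the interpreter, not the model, is the source of truth.

```python
def run_program(src: str, fn_name: str, arg):
    """Execute self-generated source in a scratch namespace and call fn_name.

    In a real training loop this would run in a sandbox with time and
    memory limits; exec() here is purely illustrative.
    """
    ns = {}
    exec(src, ns)
    return ns[fn_name](arg)


def grounded_reward(src: str, fn_name: str, arg, claimed_output) -> float:
    """Reward 1.0 only if the interpreter confirms the claimed output."""
    try:
        return 1.0 if run_program(src, fn_name, arg) == claimed_output else 0.0
    except Exception:
        # Programs that crash or don't define fn_name earn no reward.
        return 0.0


# Example: a self-proposed task the interpreter can check.
candidate = "def f(x):\n    return x * x + 1\n"
print(grounded_reward(candidate, "f", 3, 10))  # correct claim -> 1.0
print(grounded_reward(candidate, "f", 3, 11))  # incorrect claim -> 0.0
```

The key design point is that the reward signal requires no human labels: correctness is decided entirely by running the code.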
macrolime | 3 days ago
Pretty sure OpenAI and/or DeepMind have already been doing something very similar for a while, just without publishing it.
Comment #43953087 not loaded
gitroom | 3 days ago
sometimes i feel like the whole self-play thing is kinda the obvious path now but still nuts seeing it actually work better than huge data dumps. you ever wonder how much of progress is just crazy good pipelines versus actual breakthroughs?
Waterluvian | 3 days ago
Related to this: has anyone seen a model respond with "oh wait, I was wrong…" when you follow up with "can you explain why this answer is right?"

I still find that my uses of GPT and others struggle with a sort of tunnel vision.
squillion | 3 days ago
Warning: abuse of this technique may cause the model to go blind.
Comment #43953780 not loaded
nullc | 3 days ago
It'd be nice to see some of these run on languages the pretrained model is a little less good at than Python and JS.
QuadmasterXLII | 3 days ago
For everyone who says “modern incentives forbid publishing negative results,” let this stand as a counterexample!
评论 #43953360 未加载
mentalgear | 4 days ago
"Despite using zero human-curated data, AZR achieves state-of-the-art results on diverse coding and math reasoning benchmarks, even outperforming models trained on large in-domain datasets. This demonstrates the potential for sophisticated reasoning skills to emerge purely through self-play without domain-specific supervision."
Comment #43954571 not loaded