Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

88 points by leodriesch 7 days ago

8 comments

a2128 7 days ago
To be clear, this is not a model trained on zero data, this is a pretrained model (Qwen 2.5 trained on 18 trillion tokens) finetuned using self-generated data grounded by a Python interpreter
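
As a rough illustration of what "self-generated data grounded by a Python interpreter" could look like, here is a minimal sketch. It is not the paper's code: the function names, the task format (a program defining `f` plus an input), and the binary reward are all assumptions made for this example. The idea shown is that the model proposes a task, the interpreter computes the true output, and the model's predicted output is rewarded only if it matches real execution.

```python
def run_program(source: str, arg):
    """Execute `source`, which must define f(x), and return f(arg)."""
    namespace = {}
    exec(source, namespace)  # no sandboxing here: illustration only
    return namespace["f"](arg)


def grounded_reward(source: str, arg, predicted_output) -> float:
    """Return 1.0 if the predicted output matches real execution, else 0.0."""
    try:
        actual = run_program(source, arg)
    except Exception:
        return 0.0  # an invalid self-proposed task earns no reward
    return 1.0 if actual == predicted_output else 0.0


# Hypothetical self-generated task: a program plus an input.
task_source = "def f(x):\n    return sorted(x)[-1] - sorted(x)[0]\n"
task_input = [4, 9, 1]

reward_right = grounded_reward(task_source, task_input, 8)  # correct prediction
reward_wrong = grounded_reward(task_source, task_input, 5)  # incorrect prediction
print(reward_right, reward_wrong)
```

The interpreter plays the role that human labels would otherwise play: no curated answer key is needed, but the base model proposing and solving the tasks is still pretrained.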
macrolime 7 days ago
Pretty sure OpenAI and/or DeepMind have been doing something very similar for a while already, just without publishing it.
Waterluvian 7 days ago
Related to this: has anyone seen a model respond with “oh wait I was wrong…” when you follow up with a “can you explain why this answer is right?”

I still find that my uses of GPT and others still struggle with a sort of tunnel vision.
squillion 7 days ago
Warning: abuse of this technique may cause the model to go blind.
nullc 7 days ago
Be nice to see some of these run on languages the pretrained model is a little less good at than Python and JS.
QuadmasterXLII 7 days ago
For everyone who says “modern incentives forbid publishing negative results,” let this stand as a counterexample!
gitroom 7 days ago
sometimes i feel like the whole self-play thing is kinda the obvious path now but still nuts seeing it actually work better than huge data dumps. you ever wonder how much of progress is just crazy good pipelines versus actual breakthroughs?
mentalgear 7 days ago
"Despite using zero human-curated data, AZR achieves state-of-the-art results on diverse coding and math reasoning benchmarks, even outperforming models trained on large in-domain datasets. This demonstrates the potential for sophisticated reasoning skills to emerge purely through self-play without domain-specific supervision."