"Despite using zero human-curated data, AZR achieves state-of-the-art results on diverse coding and math reasoning benchmarks, even outperforming models trained on large in-domain datasets. This demonstrates the potential for sophisticated reasoning skills to emerge purely through self-play without domain-specific supervision."