Wow, it's so empty here, and it was posted 3 days ago... I wonder why?

I have a question, if someone can answer. On GitHub they state 8x80GB GPUs for the 14B models, but I found no information on how long this fine-tuning takes. Given the toolchain, it probably takes a significant amount of time.

Another question: wouldn't it be fun to hijack the training loop with some tasks set by humans? Would it improve results, or do the opposite? I also wonder whether at some point all tasks will degrade into "uh-oh moment" tasks, which would be the most complex and perplexing ones but have no actual productive yield.

Paper: https://arxiv.org/abs/2505.03335
Code: https://github.com/LeapLabTHU/Absolute-Zero-Reasoner
Thread: https://xcancel.com/AndrewZ45732491/status/1919920459748909288

Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

2 comments
