> do you think this kind of RL is enough to generalize beyond math and code? as in generalizing into domains which aren't easily verifiable

> 1 Exactly the right question to be asking atm imo.
2 Not obvious.
3 Probably yes.

But both math and code are easy to verify; they are rigorous. Many other tasks are not. I doubt that what works for math and code can be generalized to other things.