That this works at all is pretty interesting; that it seems to work very well on math is quite interesting.<p>That said, this paper is part of the current move toward blurring the line between training and inference -- part of their method involves doing reinforcement learning on questions the model doesn't yet know the answer to but can decompose into simpler sub-questions, running GRPO on those with a numerical 'checker' as the reward. The reinforced model can then answer more questions.<p>I like this. I think humans do this a lot: mulling on something, turning it over in their heads, analogizing, and so on. Adding test-time training is a way to do far more thinking than simply adding tokens to the context under fixed inference.<p>Just as DeepSeek and o1/o3 show that we can increase capacity with inference-time token generation and assessment, it looks like we can increase capacity with inference-time automated fine-tuning as well.<p>I'd hope that as these techniques solidify we'll have a new way to talk and think about them -- they are all part of the same fundamental process at some level.<p>Either way, super cool.
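<p>For anyone curious, the GRPO-with-checker idea can be sketched roughly like this (a minimal sketch, not the paper's implementation; `numerical_checker` and `grpo_advantages` are hypothetical names, and the actual policy-gradient update and KL penalty that full GRPO uses are omitted):

```python
# Hypothetical sketch: sample a group of candidate answers per sub-question,
# score each with a numerical checker, and convert rewards into
# group-normalized advantages -- the critic-free baseline at the core of GRPO.

def numerical_checker(candidate: float, reference: float, tol: float = 1e-6) -> float:
    """Binary reward: 1.0 if the candidate matches the reference answer."""
    return 1.0 if abs(candidate - reference) <= tol else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Each reward minus the group mean, scaled by the group std dev.
    Candidates that beat their siblings get positive advantage."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0.0:  # all candidates equally good/bad: no learning signal
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# A group of 4 sampled answers to a sub-question whose true answer is 42:
group = [41.0, 42.0, 42.0, 17.0]
rewards = [numerical_checker(c, 42.0) for c in group]  # [0.0, 1.0, 1.0, 0.0]
advantages = grpo_advantages(rewards)                   # [-1.0, 1.0, 1.0, -1.0]
```

The nice property is that no learned value model is needed: the group itself is the baseline, which is presumably what makes this cheap enough to run at test time.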