This makes intuitive sense to me. It looks like o1 is a significant improvement in performance, but maybe not in the underlying architecture. Adding chains of thought and prover/verifier games to what OpenAI already had in GPT-4o seems roughly sufficient to produce the results we observe.