I wrote this after spending December thinking about why o3's expensive memory usage matters. The key insight feels trivial (CS101: universal computation needs stable state + memory), but watching xAI (and now Stargate) build gigawatt-scale facilities, while o1 explicitly discards its own reasoning, made me realize we've forgotten something fundamental... <a href="https://arxiv.org/abs/2412.17794" rel="nofollow">https://arxiv.org/abs/2412.17794</a>