Ah, I remember reading this paper! Essentially, they created synthetic data by solving search problems with A* and recording the execution traces. Trained on this data, transformers unsurprisingly learned to solve these problems. They then improved the synthetic data by solving each problem many times with A* and keeping only the shortest trace, and transformers trained on that filtered data learned to be competitive with this improved search procedure too! (Rough sketch below.)

Pretty remarkable stuff. Given the obsession over chatbots, folks often miss how revolutionary transformer sequence modeling is for... well, any sequence application that *isn't* a chatbot. Looking solely at the prospect of speeding up scientific simulations by ~10x, it's a watershed moment for humanity. When you include the vast space of *other* applications, we're in for one wild ride, y'all.
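
Here's a minimal sketch of the bootstrapping idea as I understand it, on a toy grid-world pathfinding task. Everything here (the names astar and shortest_trace, the randomized tie-breaking, the grid encoding) is my own illustration of the general technique, not the paper's actual setup:

    import heapq, random

    def astar(grid, start, goal, rng):
        # A* over a 4-connected grid (0 = free, 1 = wall); the random
        # tie-break makes repeated runs expand nodes in different orders.
        def h(p):
            return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
        frontier = [(h(start), rng.random(), 0, start, [start])]
        best_g = {}
        trace = []  # sequence of expansions: the "execution trace" to train on
        while frontier:
            f, _, g, node, path = heapq.heappop(frontier)
            if node in best_g and best_g[node] <= g:
                continue
            best_g[node] = g
            trace.append(node)
            if node == goal:
                return trace, path
            r, c = node
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                        and grid[nxt[0]][nxt[1]] == 0):
                    heapq.heappush(frontier,
                                   (g + 1 + h(nxt), rng.random(), g + 1,
                                    nxt, path + [nxt]))
        return trace, None  # goal unreachable

    def shortest_trace(grid, start, goal, n_runs=8, seed=0):
        # The bootstrapping step: solve the same problem many times and
        # keep only the run with the shortest execution trace.
        rng = random.Random(seed)
        best = None
        for _ in range(n_runs):
            trace, path = astar(grid, start, goal, rng)
            if path is not None and (best is None or len(trace) < len(best[0])):
                best = (trace, path)
        return best  # (trace, plan) pair to serialize into training tokens

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    trace, plan = shortest_trace(grid, (0, 0), (2, 0))
    print(len(trace), plan)

The trick is that A* with randomized tie-breaking is nondeterministic, so re-solving the same problem yields traces of different lengths; filtering to the shortest gives training data better than the average A* run, which is (roughly) why the trained model can beat its own teacher.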