> We include one example in Figure 26, where clear state-tracking behavior is demonstrated.

Figure 26 appears to start with "we need to predict the output", followed by code, an input, and an output. The model then shows a chain of thought that is wrong from its second sentence onward, including faulty reasoning about how `if` statements work, yet it concludes with the "correct" output regardless. It looks like the expected output was included in the prompt, so it's unclear what this example is even demonstrating.

Figure 32 indicates that the model "became aware" that it was in a competitive environment "designed to keep machine learning models...guessing". There's no way this isn't a result of that kind of information being included in the prompt.

Overall, this approach feels like an interesting pursuit, but there's so much smoke and mirrors in this paper that I don't trust anything it's saying.