I've been playing with Qwen's QwQ-32b, and watching this thing's chain of thought is really interesting. In particular, it's pretty good at catching its own mistakes, and at the same time it gives off a "feeling" of someone very unsure of themselves, verifying their answer again and again. That seems to be the main reason it can correctly solve puzzles that some much larger models fail at. You can still see it occasionally hallucinate things in the CoT, but they are usually quickly caught and discarded.

The only downsides of this approach are that it requires a lot of tokens before the model can ascertain the correctness of its answer, and that it sometimes just gives up and concludes the puzzle is unsolvable (although that second issue can be mitigated by adding something like "There is definitely a solution, keep trying until you solve it" to the prompt).
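
For concreteness, here's a minimal sketch of what that prompt nudge might look like, assuming QwQ-32b is served behind an OpenAI-compatible endpoint (e.g. via vLLM or Ollama); the base URL, model name, and puzzle text are placeholders, not anything specific to my setup:

    # Sketch only: append the "keep trying" nudge to the puzzle prompt.
    # Assumes a local OpenAI-compatible server; URL/model/puzzle are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    puzzle = "..."  # your puzzle here

    response = client.chat.completions.create(
        model="QwQ-32B",
        messages=[
            {
                "role": "user",
                "content": puzzle
                + "\n\nThere is definitely a solution, keep trying until you solve it.",
            },
        ],
    )
    print(response.choices[0].message.content)

In my experience the extra sentence mostly just stops the early "this is unsolvable" bail-outs; it doesn't change the verify-again-and-again behavior itself.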