One of the authors, Charles, here. I wrote this up because 1) I've been following work on sampling/inference-time compute scaling, and the results in the "Large Language Monkeys" paper (<a href="https://arxiv.org/abs/2407.21787" rel="nofollow">https://arxiv.org/abs/2407.21787</a>) looked promising, and 2) our reproduction was effortless and immediate, which has not been my experience with most research.<p>Happy to answer questions!