I think this is a terrible analysis with a weak conclusion.

There's zero mention of how long it took the LLM to write the code versus the human. You have a 300-second runtime limit, but what was your coding time limit? The machine spat out code in, what, a few seconds? And how long did your solutions take to write?

Advent of Code problems take me longer to just *read* than it takes an LLM to have a proposed solution ready for evaluation.

> they didn’t perform nearly as well as I’d expect

Is this a joke, though? A machine takes a problem description written in the floridly hyperventilated style of Advent problems and, without any opportunity for automated reanalysis, understands the exact problem domain, understands exactly what's being asked, correctly models the solution, and spits out a correct single-shot solution on 20 of them in no time flat, often with substantially better running time than your own solutions. And that's disappointing?

> a lot of the submissions had timeout errors, which means that their solutions might work if asked more explicitly for efficient solutions. However the models should know very well what AoC solutions entail

You made up an arbitrary runtime limit, kept that limit a secret, and then were surprised when the solutions didn't adhere to it?

> Finally, some of the submissions raised some Exceptions, which would likely be fixed with a human reviewing this code and asking for changes.

How many of your solutions got the correct answer on the first try, without going back and fixing something?