(Co-)author here. It was really interesting putting this together. We had some idea of what LLMs/GPT-4 would and would not do well with, but we were still surprised by a number of things. In particular, we knew it would really struggle with the acrostic, but the degree to which it just completely lost the plot was pretty surprising! In a lot of cases it was also surprisingly difficult to convince it that Queen Elizabeth II had died (it takes the news better sometimes than others).