i don't think weaknesses like these say much about the general capabilities or potential of LLMs. try this yourself: generate a novel pangram without any iterative process or revisions. whatever pops into your head first, you must commit to paper. it's very hard. it's a lot easier if you go in alphabetical order, since you don't need to keep track of which letters you've already used. interestingly, gpt-4o also performs better at this task when you ask it to go in alphabetical order.

LLMs have known weaknesses, many having to do with an inability to "think" without using tokens. so for tasks like these, you can dramatically improve their performance by getting them to think out loud. this was the crux of the "let's think step by step" paper.

i ran this prompt three times:

### prompt start ###

your goal is to create a novel pangram in the spanish language that sounds natural. it should be grammatically correct and coherent. it shouldn't be just technically coherent and grammatically correct, it should sound like a normal sentence.

first, print out the spanish alphabet. these are your $remainingletters. $sentence = "". then enter a loop where you do the following:

loop 1:
- select a random letter from $remainingletters choose based on which one allows you to add the most natural sounding word to the sentence, do NOT go in alphabetical order
- eliminate the letter you chose from the remaining letters
- ensure the word you chose actually begins with that letter! very important
- add the word you chose for that letter to $sentence
- are there letters in $remaining? if so, go back to start of loop 1. otherwise move on to loop 2.

loop 2:
- go through the spanish alphabet in order and ensure your $sentence contains a word starting with that letter
- once every letter is accounted for, translate the sentence to english
- does it sound like a natural sentence?
- if not, go back to the start of loop 2
- if so, print $sentence as well as its translation in english

think out loud, keep track of your work as you go

i'm not asking you to generate code, i'm just explaining how you should accomplish the task

### prompt end ###

each of the three runs produced a valid pangram that was coherent and grammatically correct. i can't be bothered to run it enough times to get an accurate success rate, but i'm fairly sure there exists a prompt with a success rate of 100%. there's plenty of room to work with (gpt-4o's context window is 128,000 tokens), and if you ask the model to iterate, it will arrive at something correct long before that budget runs out.

sidenote: when i asked gpt-4o to generate 5 novel pangrams in english, it got them all right. so language definitely matters when it comes to getting things like this right in one shot.
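the letter-coverage half of loop 2 is mechanical, so you can check a model's output yourself instead of trusting its self-report. here's a minimal python sketch, assuming the modern 27-letter spanish alphabet and the prompt's own rule that every letter has to start a word (accented vowels count as their plain vowel, ñ counts as its own letter). `SPANISH_ALPHABET`, `initial_letter` and `missing_initials` are hypothetical names, and the sketch only checks coverage, not whether the sentence sounds natural.

```python
import re
import unicodedata

# Hypothetical helper names. Assumes the modern 27-letter Spanish alphabet
# (RAE, 2010), in which "ch" and "ll" are digraphs rather than separate letters.
SPANISH_ALPHABET = list("abcdefghijklmnñopqrstuvwxyz")


def initial_letter(word: str) -> str:
    """First letter of `word`, lowercased; accented vowels map to their base.

    "ñ" is kept as a letter in its own right rather than folded into "n".
    """
    first = unicodedata.normalize("NFC", word)[0].lower()
    if first == "ñ":
        return first
    # NFD splits e.g. "á" into "a" + a combining accent; keep the base char.
    return unicodedata.normalize("NFD", first)[0]


def missing_initials(sentence: str) -> list[str]:
    """Loop 2's coverage check: letters that no word in `sentence` starts with.

    Follows the prompt's stricter rule (each letter must begin a word),
    not the usual pangram rule (each letter merely appears somewhere).
    """
    sentence = unicodedata.normalize("NFC", sentence)
    # [^\W\d_] matches letters only, so punctuation and digits split words.
    words = re.findall(r"[^\W\d_]+", sentence)
    initials = {initial_letter(w) for w in words}
    return [letter for letter in SPANISH_ALPHABET if letter not in initials]
```

an empty list from `missing_initials(sentence)` means every letter is covered; anything else lists the letters the model still owes you. the "does it sound natural" half of loop 2 is still up to you (or the model).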