I feel like there should be an LLM architecture which includes "scratch space" - tokens the model can write to and read from which do not constitute part of its output. The trouble with current architectures is that they can only do a finite amount of computation per output token - they get one forward pass and then have to output something. Chain-of-thought reasoning allows the model to devote more computation to finding the answer, storing intermediate results in its output tokens. But this is silly - most of the intermediate tokens are not providing useful information towards solving the problem; they're just wasted computation:

>There are 16 balls in total.
>Half of the balls are golf balls.
>That means that there are 8 golf balls.
>Half of the golf balls are blue.
>That means that there are 4 blue golf balls.

For the number of forward passes being done to generate this text, only a few tokens are actually helpful - most are grammatical filler. Further, the model is losing information by being forced to project its state down to a single output token. Even more, the most probable one-step output may not even be the most informative or helpful!

It'd be much nicer if the model could write arbitrary, continuous-valued tokens to a private scratch space and then attend to those tokens as though they were words in the prompt while generating the actual output, potentially performing several forward passes per output token when necessary.

In short, if chain-of-thought prompting is such a good idea, we should bake it into the model. Obviously all of this is FAR easier said than done.
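To make the idea concrete, here is a toy sketch of what such a decoding loop might look like. This is purely illustrative, not an existing architecture: TinyDecoder and every hyperparameter in it are made up.

    # Toy sketch of a "scratch space" decoding loop (illustrative only).
    import torch
    import torch.nn as nn

    class TinyDecoder(nn.Module):
        """Toy causal decoder standing in for a real transformer LLM."""
        def __init__(self, vocab=100, d=32):
            super().__init__()
            self.embed = nn.Embedding(vocab, d)
            layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
            self.core = nn.TransformerEncoder(layer, num_layers=2)
            self.to_vocab = nn.Linear(d, vocab)

        def forward(self, x):  # x: (1, seq, d) embeddings and/or scratch vectors
            n = x.size(1)
            causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
            return self.core(x, mask=causal)  # (1, seq, d) hidden states

    def generate_with_scratch(model, prompt_ids, n_scratch=3, n_out=5):
        """Per output token: run n_scratch extra forward passes whose final hidden
        states are appended to a private scratch buffer, then emit one real token
        that attended to both the prompt and the scratch vectors."""
        embs = model.embed(prompt_ids)              # visible context (1, prompt_len, d)
        out_tokens = []
        for _ in range(n_out):
            scratch = []                            # continuous, never emitted
            for _ in range(n_scratch):
                ctx = torch.cat([embs] + scratch, dim=1)
                scratch.append(model(ctx)[:, -1:, :])   # write one scratch vector
            ctx = torch.cat([embs] + scratch, dim=1)
            logits = model.to_vocab(model(ctx)[:, -1, :])
            tok = logits.argmax(dim=-1, keepdim=True)   # greedy pick of a real token
            out_tokens.append(tok.item())
            embs = torch.cat([embs, model.embed(tok)], dim=1)
        return out_tokens

    print(generate_with_scratch(TinyDecoder(), torch.tensor([[1, 2, 3]])))

The point is just that the scratch vectors stay continuous and private: later passes attend to them, but they are never decoded into output tokens.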
So here's a trick which worked for the Clue question.

Step 1: Hi, I'm going to ask you some questions soon. But instead of answering the questions, I want you to write out instructions for yourself to help you reason through the question and come up with the best answer.

Step 2: [provide Clue question]

Step 3: Now follow the instructions you have just written to answer the question.

.... The answer to the question is: (a) Yes; Colonel Mustard was in the observatory with the candlestick

Edit: mixed results for the apple question with this technique.
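If you want to script the trick rather than paste it by hand, the three steps are just three user turns in one conversation. A rough sketch, where `ask` is a hypothetical helper standing in for whatever chat API you're calling, and the Clue question itself is left as a placeholder:

    # Sketch of the three-step "write your own instructions" trick.
    # `ask` is a made-up stand-in for a real chat-completion API call.
    def ask(history):
        """Send the conversation so far to the model and return its reply (stub)."""
        return "<model reply goes here>"

    history = []

    # Step 1: ask the model to write instructions for itself instead of answering.
    history.append({"role": "user", "content":
        "Hi, I'm going to ask you some questions soon. But instead of answering "
        "the questions, I want you to write out instructions for yourself to help "
        "you reason through the question and come up with the best answer."})
    history.append({"role": "assistant", "content": ask(history)})

    # Step 2: provide the actual question (placeholder kept from the comment).
    history.append({"role": "user", "content": "[provide Clue question]"})
    history.append({"role": "assistant", "content": ask(history)})

    # Step 3: tell it to follow its own instructions.
    history.append({"role": "user", "content":
        "Now follow the instructions you have just written to answer the question."})
    print(ask(history))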
I feel like within 6 months the models will have adapted to not need these "clever" tricks. Presumably, if for many cases the trick is to say "Let's think step by step", that's something the model can learn to do on its own without the prompt.

The really interesting thing will be feeding alternative data into these models, whether it's a structured corpus, siloed enterprise data, or personal data.
It seems that ChatGPT is incapable of the "ohhhhhh!" eureka moment we experience.

I give it simple riddles that it doesn't solve. I then point out the obvious answer and it just doubles down, like that really stubborn friend I had in high school. It never does the "ohhhh! Aha! Yes, that's the answer."
Note that this was originally published in September 2022, before text-davinci-003 was released in November 2022; the newer model lets you do whatever you want without as much effort.
Slightly off-topic, but a great way of modifying ChatGPT prompts is to let it answer as a different age: https://fabianzeindl.com/posts/chatgpt-simulating-agegroups
I was surprised to see the omission of a prompt technique called program-aided prompting.

Paper: https://arxiv.org/abs/2211.10435
GitHub: https://github.com/reasoning-machines/pal

tl;dr -- LLMs are bad at basic arithmetic and logic (as their opening examples with math word problems show), but they do much better if instead of asking them for the answer, you ask for code to compute the answer. Then evaluate or run the code to get the answer.
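For a rough sense of what that looks like in practice, here's a sketch. `generate` is a hypothetical stand-in for an actual LLM call, and the "returned" code is hard-coded purely for illustration:

    # Sketch of program-aided prompting: ask the LLM for code, not for the answer.
    def generate(prompt: str) -> str:
        """Stand-in for an LLM API call; pretend the model returned this code."""
        return ("golf_balls = 16 // 2\n"
                "blue_golf_balls = golf_balls // 2\n"
                "answer = blue_golf_balls")

    question = ("There are 16 balls in total. Half of the balls are golf balls, "
                "and half of the golf balls are blue. How many blue golf balls are there?")

    prompt = (f"Q: {question}\n"
              "Write Python code that computes the answer and stores it in a "
              "variable called `answer`. Return only the code.\n")

    code = generate(prompt)
    namespace = {}
    exec(code, namespace)          # run the model's code instead of trusting its arithmetic
    print(namespace["answer"])     # -> 4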
I was hoping this would link me to a deeper discussion on hallucination.

I'm intrigued that it's hallucinating sequences that appear never to have been written before (at least not on Google), and not just recalling some crappy training data.

Anecdotally (and expectedly) it happens a lot on ChatGPT with specialized scientific questions (random radiology and medical stuff). I am assuming some of this is due to the training corpus, although Galactica suffered from the same thing, and the GPT-3 corpora would have included a lot of scientific webpages.

Anyone have any resources that investigate why this happens?
For few-shot prompting, can you ask the model to generate the initial shots itself?

A: Translate $SENTENCE from English into German.

B: Generate 3 example translations from English into German and then translate $SENTENCE from English to German.
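A trivial sketch of the difference, keeping the $SENTENCE placeholder as a filled-in example sentence; `complete` is a hypothetical stand-in for an LLM call and is left unwired here:

    # Sketch of "self-generated few-shot": variant B asks the model to produce
    # its own example translations before doing the real one.
    def complete(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM API here")

    def prompt_a(sentence: str) -> str:
        return f"Translate '{sentence}' from English into German."

    def prompt_b(sentence: str) -> str:
        return ("Generate 3 example translations from English into German "
                f"and then translate '{sentence}' from English to German.")

    sentence = "The weather is nice today."
    for build in (prompt_a, prompt_b):
        print(build(sentence))
        # print(complete(build(sentence)))  # uncomment once `complete` is wired up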
I am just wondering, if they trained the model exclusively with real-world data, where are the nonsense answers? People don't always answer seriously. Think reddit threads. Handpicking would probably not be feasible, so how did they do it? Or is there a snarky reddit response somewhere deep inside the model for every question?
It is interesting to me that the approach required to work with this tool is almost identical to using every other tool.

It boils down to: "try breaking this problem into smaller problems to increase the solution space".
"If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication.
...
Similarly, if you give GPT-3 a task that's too complex"

Precisely the tone and wording of the aggressive marketing campaign around this product. Confirms where the spam originated from on reddit and everywhere else. Wondering how many bots and fake redditors they paid to promote this?