I feel like there should be an LLM architecture which includes "scratch space" - tokens the model can write to and read from which do not constitute part of its output. The trouble with current architectures is that they can only do a finite amount of computation per output token - they get one forward pass and then have to output something. Chain-of-thought reasoning allows the model to devote more computation to finding the answer, storing intermediate results in its output tokens. But this is silly - most of the intermediate tokens are not providing useful information towards solving the problem; they're just wasted computation:

>There are 16 balls in total.
>Half of the balls are golf balls.
>That means that there are 8 golf balls.
>Half of the golf balls are blue.
>That means that there are 4 blue golf balls.

For the number of forward passes being done to generate this text, only a few tokens are actually helpful - most are grammatical filler. Further, the model is losing information by being forced to project its state down to a single output token. Even more, the most probable one-step output may not even be the most informative or helpful!

It'd be much nicer if the model could write arbitrary, continuous-valued tokens to a private scratch space and then attend to those tokens as though they were words in the prompt while generating the actual output, potentially performing several forward passes per output token when necessary.

In short, if chain-of-thought prompting is such a good idea, we should bake it into the model. Obviously all of this is FAR easier said than done.
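To make the idea concrete, here is a toy sketch of what such a decoding loop might look like. This is purely illustrative, not an existing architecture: TinyDecoder and every hyperparameter in it are made up.

    # Toy sketch of a "scratch space" decoding loop (illustrative only).
    import torch
    import torch.nn as nn

    class TinyDecoder(nn.Module):
        """Toy causal decoder standing in for a real transformer LLM."""
        def __init__(self, vocab=100, d=32):
            super().__init__()
            self.embed = nn.Embedding(vocab, d)
            layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
            self.core = nn.TransformerEncoder(layer, num_layers=2)
            self.to_vocab = nn.Linear(d, vocab)

        def forward(self, x):  # x: (1, seq, d) embeddings and/or scratch vectors
            n = x.size(1)
            causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
            return self.core(x, mask=causal)  # (1, seq, d) hidden states

    def generate_with_scratch(model, prompt_ids, n_scratch=3, n_out=5):
        """Per output token: run n_scratch extra forward passes whose final hidden
        states are appended to a private scratch buffer, then emit one real token
        that attended to both the prompt and the scratch vectors."""
        embs = model.embed(prompt_ids)              # visible context (1, prompt_len, d)
        out_tokens = []
        for _ in range(n_out):
            scratch = []                            # continuous, never emitted
            for _ in range(n_scratch):
                ctx = torch.cat([embs] + scratch, dim=1)
                scratch.append(model(ctx)[:, -1:, :])   # write one scratch vector
            ctx = torch.cat([embs] + scratch, dim=1)
            logits = model.to_vocab(model(ctx)[:, -1, :])
            tok = logits.argmax(dim=-1, keepdim=True)   # greedy pick of a real token
            out_tokens.append(tok.item())
            embs = torch.cat([embs, model.embed(tok)], dim=1)
        return out_tokens

    print(generate_with_scratch(TinyDecoder(), torch.tensor([[1, 2, 3]])))

The point is just that the scratch vectors stay continuous and private: later passes attend to them, but they are never decoded into output tokens.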
So here's a trick which worked for the Clue question.

Step 1: Hi, I'm going to ask you some questions soon. But instead of answering the questions, I want you to write out instructions for yourself to help you reason through the question and come up with the best answer.

Step 2: [provide Clue question]

Step 3: Now follow the instructions you have just written to answer the question.

.... The answer to the question is: (a) Yes; Colonel Mustard was in the observatory with the candlestick

Edit: mixed results for the apple question with this technique.
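If you want to script the trick rather than paste it by hand, the three steps are just three user turns in one conversation. A rough sketch, where `ask` is a hypothetical helper standing in for whatever chat API you're calling, and the Clue question itself is left as a placeholder:

    # Sketch of the three-step "write your own instructions" trick.
    # `ask` is a made-up stand-in for a real chat-completion API call.
    def ask(history):
        """Send the conversation so far to the model and return its reply (stub)."""
        return "<model reply goes here>"

    history = []

    # Step 1: ask the model to write instructions for itself instead of answering.
    history.append({"role": "user", "content":
        "Hi, I'm going to ask you some questions soon. But instead of answering "
        "the questions, I want you to write out instructions for yourself to help "
        "you reason through the question and come up with the best answer."})
    history.append({"role": "assistant", "content": ask(history)})

    # Step 2: provide the actual question (placeholder kept from the comment).
    history.append({"role": "user", "content": "[provide Clue question]"})
    history.append({"role": "assistant", "content": ask(history)})

    # Step 3: tell it to follow its own instructions.
    history.append({"role": "user", "content":
        "Now follow the instructions you have just written to answer the question."})
    print(ask(history))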
I feel like within 6 months the models will have adapted to not need these "clever" tricks. Presumably, if for many cases the trick is to say "Let's think step by step", that's something the model can learn to do on its own without the prompt.

The really interesting thing will be feeding alternative data into these models, whether it's a structured corpus, siloed enterprise data, or personal data.
It seems that ChatGPT is incapable of the "ohhhhhh!" eureka moment we experience.

I give it simple riddles that it doesn't solve. I then point out the obvious answer and it just doubles down, like that really stubborn friend I had in high school. It never does the "ohhhh! Aha! Yes, that's the answer."
Note that this was originally published in September 2022, before text-davinci-003 was released in November 2022; the newer model lets you do whatever you want without as much effort.
Slightly off-topic, but a great way of modifying ChatGPT prompts is to let it answer as a different age: https://fabianzeindl.com/posts/chatgpt-simulating-agegroups
I was surprised to see the omission of a prompt technique called program-aided prompting.

Paper: https://arxiv.org/abs/2211.10435
GitHub: https://github.com/reasoning-machines/pal

tl;dr -- LLMs are bad at basic arithmetic and logic (as their opening examples with math word problems show), but they do much better if instead of asking them for the answer, you ask for code to compute the answer. Then evaluate or run the code to get the answer.
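For a rough sense of what that looks like in practice, here's a sketch. `generate` is a hypothetical stand-in for an actual LLM call, and the "returned" code is hard-coded purely for illustration:

    # Sketch of program-aided prompting: ask the LLM for code, not for the answer.
    def generate(prompt: str) -> str:
        """Stand-in for an LLM API call; pretend the model returned this code."""
        return ("golf_balls = 16 // 2\n"
                "blue_golf_balls = golf_balls // 2\n"
                "answer = blue_golf_balls")

    question = ("There are 16 balls in total. Half of the balls are golf balls, "
                "and half of the golf balls are blue. How many blue golf balls are there?")

    prompt = (f"Q: {question}\n"
              "Write Python code that computes the answer and stores it in a "
              "variable called `answer`. Return only the code.\n")

    code = generate(prompt)
    namespace = {}
    exec(code, namespace)          # run the model's code instead of trusting its arithmetic
    print(namespace["answer"])     # -> 4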
I was hoping this would link me to a deeper discussion on hallucination.

I'm intrigued that it's hallucinating sequences that appear never to have been written before (at least not on Google), and not just recalling some crappy training data.

Anecdotally (and expectedly) it happens a lot on ChatGPT with specialized scientific questions (random radiology and medical stuff). I am assuming some of this is due to the training corpus, although Galactica suffered from the same thing, and the GPT-3 corpora would have included a lot of scientific webpages.

Anyone have any resources that investigate why this happens?
For few-shot prompting, can you ask the model to generate the initial shots itself?

A: Translate $SENTENCE from English into German.

B: Generate 3 example translations from English into German and then translate $SENTENCE from English to German.
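A trivial sketch of the difference, keeping the $SENTENCE placeholder as a filled-in example sentence; `complete` is a hypothetical stand-in for an LLM call and is left unwired here:

    # Sketch of "self-generated few-shot": variant B asks the model to produce
    # its own example translations before doing the real one.
    def complete(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM API here")

    def prompt_a(sentence: str) -> str:
        return f"Translate '{sentence}' from English into German."

    def prompt_b(sentence: str) -> str:
        return ("Generate 3 example translations from English into German "
                f"and then translate '{sentence}' from English to German.")

    sentence = "The weather is nice today."
    for build in (prompt_a, prompt_b):
        print(build(sentence))
        # print(complete(build(sentence)))  # uncomment once `complete` is wired up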
I am just wondering, if they trained the model exclusively with real-world data, where are the nonsense answers? People don't always answer seriously. Think reddit threads. Handpicking would probably not be feasible, so how did they do it? Or is there a snarky reddit response somewhere deep inside the model for every question?
It is interesting to me that the approach required to work with this tool is almost identical to using every other tool.

It boils down to: "try breaking this problem into smaller problems to increase the solution space".
"If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication.
...
Similarly, if you give GPT-3 a task that's too complex"

Precisely the tone and wording of the aggressive marketing campaign around this product. Confirms where the spam originated from on reddit and everywhere else. Wondering how many bots and fake redditors they paid to promote this?