I somewhat understand how it generates text, but I don't understand how it can understand the query in order to generate that text. I googled and landed on [1], which doesn't answer my question. Why is there no information on this part anywhere?

[1] https://ai.stackexchange.com/questions/38294/how-does-chatgpt-respond-to-novel-prompts-and-commands
LLMs simply process the input and generate output based on patterns seen during training. Here's the process in brief:

Tokenization: The input text gets broken down into smaller chunks, or tokens. A token can range from a single character to a whole word.

Embedding: Tokens get translated into numerical vectors, which is the form the model can actually process.

Processing: These vectors are then processed in the context of one another. This is done with a type of neural network called a Transformer [0], which handles context particularly well.

Context understanding: The model uses patterns learned from its training to predict the next word in the sequence. It's not human-like understanding; rather, it estimates the statistical probability of a word following the preceding ones.

Generation: The model produces a response by repeatedly predicting the next word until a full response is formed or it reaches a certain limit.

[0]: https://huggingface.co/learn/nlp-course/chapter1/4
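To make those steps concrete, here is a minimal sketch in Python using the Hugging Face transformers library, with GPT-2 standing in as a small example model; the model choice, prompt, and generation settings are my own illustrative assumptions, not a description of how ChatGPT specifically is set up:

    # pip install torch transformers
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # A small causal language model, used purely for illustration.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"

    # Tokenization: the text becomes a sequence of token ids.
    inputs = tokenizer(prompt, return_tensors="pt")
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

    # Processing: one forward pass through the Transformer yields a score
    # for every token in the vocabulary as the candidate next token.
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")

    # Generation: repeat that prediction step, appending one token at a time.
    output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The "understanding" in the middle is just the forward pass: the prompt's tokens shape the probability distribution that the next token is drawn from.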
It “understands” the prompt by passing the data through the neural network and activating individual neurons to a greater or lesser extent.

In the case of BERT models (which I know better), there is an activation for each token, and that activation captures the meaning of the token in context. You can average these over all the tokens in a document and get a vector similar to the document vectors used in information retrieval. Traditionally you would count how many times each word appears in a document and build a vector indexed by words, but the BERT vector can (1) find synonyms, since words with similar meanings typically have vectors close to one another, and (2) differentiate different senses of a word, because the neuron activations are affected by the other words around it.

Activation of the neural network is the way it represents the input text, and I think “representation” is what is going on when it “understands”, insofar as it does.
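A rough sketch of that averaging idea (my own example, assuming mean pooling over bert-base-uncased with the transformers library; real retrieval systems usually fine-tune or use dedicated sentence-embedding models):

    # pip install torch transformers
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed(text: str) -> torch.Tensor:
        # One contextual activation vector per token, averaged into a document vector.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            token_vectors = model(**inputs).last_hidden_state[0]  # (num_tokens, 768)
        return token_vectors.mean(dim=0)

    a = embed("The bank approved my loan application.")
    b = embed("The lender agreed to finance the purchase.")
    c = embed("We had a picnic on the river bank.")

    cos = torch.nn.functional.cosine_similarity
    print(cos(a, b, dim=0))  # paraphrase using different words
    print(cos(a, c, dim=0))  # same word "bank", different sense

Because each token's vector depends on its neighbours, the two uses of "bank" end up represented differently, which is the sense-disambiguation property described above; a plain word-count vector can't make that distinction.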
It doesn't.

LLMs are mysterious black boxes that cannot transparently explain themselves or their decisions; they just regurgitate and reword the output they were trained on, and don't even know whether their generated text is correct.

Grady Booch has deconstructed this question perfectly in a recent Twitter thread. [0]

[0] https://twitter.com/Grady_Booch/status/1673797840605433856