The conclusion of the paper is:

"In this paper we present a new model to explain the behavior of Large Language Models. Our frame of reference is an abstract probability matrix, which contains the multinomial probabilities for next-token prediction in each row, where each row represents a specific prompt. We then demonstrate that LLM text generation is consistent with a compact representation of this abstract matrix through a combination of embeddings and Bayesian learning. Our model explains the emergence of in-context learning with the scale of LLMs, as well as other phenomena such as Chain-of-Thought reasoning and the problems with large context windows. Finally, we outline implications of our model and some directions for future exploration."

Where does the "Cannot Recursively Improve" claim come from?
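
As an aside, the "abstract probability matrix" framing in that conclusion can be made concrete with a small sketch. This is only an illustration under the paper's setup, not code from the paper; the vocabulary, prompts, and probabilities below are made up:

    import numpy as np

    # Toy "abstract probability matrix": one row per prompt, one column per
    # vocabulary token; each row is a multinomial over the next token.
    vocab = ["the", "cat", "sat", "mat", "."]
    row_of = {"the cat": 0, "the cat sat on the": 1}  # prompt -> row index

    P = np.array([
        [0.05, 0.10, 0.70, 0.10, 0.05],  # P(next token | "the cat")
        [0.10, 0.05, 0.05, 0.70, 0.10],  # P(next token | "the cat sat on the")
    ])  # each row sums to 1

    def next_token(prompt: str, rng: np.random.Generator) -> str:
        """Sample the next token from the prompt's multinomial row."""
        return rng.choice(vocab, p=P[row_of[prompt]])

    rng = np.random.default_rng(0)
    print(next_token("the cat", rng))  # most likely "sat"

The full matrix is intractable (one row per possible prompt), which is why, on the paper's account, an LLM must learn a compact representation of it via embeddings and Bayesian learning rather than store it explicitly.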