Like Cerebras-GPT, Databricks' Dolly performs reasonably well on many instruction-based tasks while being significantly smaller than GPT-3, challenging the notion that bigger is always better!

From my personal experience, the quality of the model depends a lot on the fine-tuning data rather than just the sheer size. If you choose your retraining data carefully, you can fine-tune a smaller model to perform better than the state-of-the-art GPT-X. The future of LLMs might look more open-source than we imagined 3 months back!

Would love to hear everyone's opinions on how they see the future of LLMs evolving. Will it be a few players (OpenAI) cracking AGI and conquering the whole world, or a lot of smaller open-source models that ML engineers fine-tune for their use cases?

P.S. I am kinda betting on the latter and building UpTrain (https://github.com/uptrain-ai/uptrain), an open-source project which helps you collect that high-quality fine-tuning dataset.
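For anyone curious what "fine-tune a smaller model on curated data" looks like in practice, here is a minimal sketch using the Hugging Face transformers and datasets libraries. The model name and the JSONL file of instruction/response pairs are placeholders, not recommendations, and real runs would add things like evaluation splits and LoRA/quantization to fit on modest hardware.

    # Minimal sketch: instruction fine-tuning a small open causal LM on a
    # curated dataset. Model name and data path below are placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "EleutherAI/pythia-1.4b"      # any small causal LM works here
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Curated pairs, e.g. a JSONL file with {"instruction": ..., "response": ...}
    # per line (hypothetical path).
    raw = load_dataset("json", data_files="curated_instructions.jsonl")["train"]

    def to_text(example):
        # Fold each pair into a single prompt/response string for causal LM training.
        return {"text": f"### Instruction:\n{example['instruction']}\n"
                        f"### Response:\n{example['response']}"}

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    with_text = raw.map(to_text)
    tokenized = with_text.map(tokenize, batched=True,
                              remove_columns=with_text.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="dolly-style-ft",
                               num_train_epochs=3,
                               per_device_train_batch_size=4,
                               learning_rate=2e-5),
        train_dataset=tokenized,
        # mlm=False makes the collator build next-token-prediction labels.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The interesting lever is entirely in what goes into curated_instructions.jsonl, which is the part the training loop above can't help you with.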
You've asked a very deep question. What is the information density of language? Of general knowledge? How many bits, in theory, are required to describe the vocabulary and grammar of a language like English in a way that lets software handle a variety of natural language tasks? How many bits are required, in theory, to contain a database of general knowledge?

ChatGPT has been compared to a "blurry JPEG of the web" by Ted Chiang [1], and I think that is a very appropriate analogy. There's a deep relationship between deep learning and compression. In a sense, a generative model like ChatGPT is a lossy compression algorithm that re-synthesizes outputs approximating its inputs. (Unsurprisingly, deep-learning-based methods blow traditional compression algorithms out of the water on compression ratio.)

I suspect the question is essentially the same as "How small could the complete text of Wikipedia be compressed?" or "How few bits does it take to lossily compress a recording of a human voice such that the individual is still recognizable?"

It's an unsolved philosophical problem, and I don't know of any attempts to determine the lower bound. Intuitively, something like hundreds of kilobytes seems inadequate, while hundreds of gigabytes is clearly adequate. So it's somewhere in between.

[1] https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
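To make the compression analogy concrete, here is a toy sketch of the "language model as compressor" idea: a model's total negative log-likelihood on a text is, via arithmetic coding, the number of bits an ideal coder would need for it, and you can compare that against a classic compressor like zlib. GPT-2 and the snippet below are just illustrative choices; a serious comparison would use a large held-out corpus.

    # Toy sketch: compare zlib's bits/char against the ideal code length
    # implied by a language model's cross-entropy (arithmetic coding bound).
    import math
    import zlib

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    text = ("Large language models trained on web-scale corpora act as lossy "
            "compressors of their training data: they store regularities of "
            "language and general knowledge in their weights and reconstruct "
            "plausible text on demand, much like a blurry JPEG of the web.")

    # Classic compressor baseline.
    zlib_bits = 8 * len(zlib.compress(text.encode("utf-8"), level=9))

    # Language-model code length: sum over tokens of -log2 p(token | prefix).
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is the mean cross-entropy (in nats) per predicted token
        loss = model(ids, labels=ids).loss.item()
    lm_bits = loss * (ids.shape[1] - 1) / math.log(2)

    print(f"zlib:  {zlib_bits / len(text):.2f} bits/char")
    print(f"GPT-2 (ideal arithmetic coding): {lm_bits / len(text):.2f} bits/char")

The catch, and it's exactly the lower-bound question above, is that a fair comparison has to amortize the gigabytes of model weights over everything you compress with them, whereas zlib's "model" is a few kilobytes of code.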