
Ask HN: Do we need 100B+ parameters in a large language model?

4 points | by sourabh03agr | about 2 years ago
Cerebras-GPT and Databricks's Dolly perform reasonably well on many instruction-based tasks while being significantly smaller than GPT-3, challenging the notion that bigger is always better!

From my personal experience, the quality of the model depends a lot on the fine-tuning data rather than just the sheer size. If you choose your retraining data carefully, you can fine-tune a smaller model to perform better than the state-of-the-art GPT-X. The future of LLMs might look more open-source than we imagined 3 months back!

Would love to hear everyone's opinions on how they see the future of LLMs evolving. Will it be a few players (OpenAI) cracking AGI and conquering the whole world, or a lot of smaller open-source models that ML engineers fine-tune for their own use cases?

P.S. I'm kinda betting on the latter and building UpTrain (https://github.com/uptrain-ai/uptrain), an open-source project which helps you collect that high-quality fine-tuning dataset.
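
To make the curated-data point concrete, here is a minimal fine-tuning sketch using the Hugging Face Trainer with a small stand-in model (GPT-2). The dataset path and hyperparameters are illustrative placeholders, not a recommendation:

    # Minimal sketch: fine-tune a small causal LM on a hand-curated instruction dataset.
    # Model name, file path, and hyperparameters are placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"  # stand-in for any small open model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical JSONL file of curated examples, each with a "text" field.
    data = load_dataset("json", data_files="curated_instructions.jsonl")["train"]

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = data.map(tokenize, batched=True, remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="out",
            per_device_train_batch_size=4,
            num_train_epochs=3,
            learning_rate=5e-5,
        ),
        train_dataset=tokenized,
        # mlm=False makes the collator build causal-LM labels from the inputs.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The interesting work is in what goes into that JSONL file, which is exactly the part the post is about.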

1 comment

retrac | about 2 years ago
You've asked a very deep question. What is the information density of language? Of general knowledge? How many bits, in theory, are required to describe the vocabulary and grammar of a language like English in such a way that software can handle a variety of natural language tasks? How many bits are required, in theory, to contain a database of general knowledge?

ChatGPT has been compared to a "blurry JPEG of the web" by Ted Chiang [1], and I think that is a very appropriate analogy. There's a relationship between deep learning and compression. In a sense, a generative model like ChatGPT is a lossy compression algorithm that re-synthesizes outputs approximately like its inputs. (Unsurprisingly, deep learning based methods blow traditional compression algorithms out of the water on compression ratio.)

I suspect the question is essentially the same as "How small could the complete text of Wikipedia be compressed?" or "How few bits does it take to lossily compress a human voice recording such that the individual is still recognizable?"

It's an unsolved philosophical problem. I don't know of any attempts to determine the lower bound. Intuitively, something like hundreds of kilobytes does seem inadequate. And hundreds of gigabytes is adequate. So it's somewhere in between.

[1] https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
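
As a crude way to poke at that bound from above, you can measure how many bits per character a conventional compressor needs on plain English text; learned models do considerably better. A minimal sketch (the sample file path is hypothetical):

    # Crude upper bound: bits per character of English text under a generic compressor.
    # "enwiki_sample.txt" is a hypothetical plain-text Wikipedia excerpt.
    import lzma

    raw = open("enwiki_sample.txt", "rb").read()
    packed = lzma.compress(raw, preset=9)

    bits_per_char = 8 * len(packed) / len(raw)
    print(f"original: {len(raw)} bytes, compressed: {len(packed)} bytes")
    # Shannon's classic estimate put the entropy of English around 1 bit/character;
    # general-purpose compressors typically land well above that.
    print(f"~{bits_per_char:.2f} bits/character")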