Getting Lossless Compression Adopted for Rigorous LLM Benchmarking

1 point by jabowery over 1 year ago

1 comment

jabowery over 1 year ago
The increasing recognition that "Language Modeling Is Compression" (https://arxiv.org/pdf/2309.10668.pdf) has not yet been accompanied by recognition that lossless compression is the most principled unsupervised loss function for world models in general, and for foundation language models in particular.

Take, for instance, the unprincipled definition of "parameter count", not only in the LLM scaling-law literature but also in the zoo of what statisticians call "information criteria" for model selection: https://en.wikipedia.org/wiki/Model_selection#Criteria

The reductio ad absurdum of "parameter count" is arithmetic coding, where an entire dataset can be encoded as a single "parameter" of arbitrary precision (see the first snippet at the end of this comment).

By contrast, the algorithmic bit of information (whether part of an executable instruction or a program literal) is an unambiguous quantity, up to the choice of instruction set. If you want to quibble about that choice, take it up with John Tromp (https://tromp.github.io/cl/cl.html), because what I'm about to propose obviates it, along with a lot of other "arguments".

Since any executable archive of any kind of data can serve as a model of the world that generated that data, any executable archive of a text corpus can serve as a language model with a rigorous "parameter count": the size of the archive in bits. A procedure that runs LLM benchmarks against such an executable archive, treated as a language model, therefore contributes a uniquely rigorous data point to the literature on LLM scaling laws.

So what I'm proposing is that authors of lossless compression algorithms consider adding a command-line option that, at the end of decompression, saves the state of the decompression process to a file that can be read back in and executed as a language model -- with the full understanding that these language models will perform very poorly on the vast majority of LLM benchmarks. The point is not to produce high-quality language models. The point is to increase rigor in the research community by providing initial data points that exemplify the approach (a minimal sketch follows below).
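To make the reductio concrete, here is a toy snippet (an illustration of mine, not from the linked paper); base-256 positional encoding stands in for arithmetic coding, which would instead pick a number in [0, 1) whose digits carry the same information:

    # The whole dataset round-trips through one arbitrary-precision
    # integer, so naive counting calls the model "one parameter".
    data = b"the entire training corpus"
    theta = int.from_bytes(data, "big")  # a single "parameter"
    recovered = theta.to_bytes((theta.bit_length() + 7) // 8, "big")
    assert recovered == data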
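And a minimal Python sketch of the proposed command-line option itself. Everything here is hypothetical -- no existing compressor exposes this exact API. An adaptive order-0 byte model stands in for the compressor's probability model, the bit-level arithmetic coder is elided in favor of accumulating the ideal code length -log2 p(byte), and save_state/load_state play the role of the proposed state-dump option. After compression (the decompressor's model passes through the identical states), the reloaded state answers next-byte probability queries; that is, it is a very weak language model whose parameter count is simply the size of the saved state plus the code that runs it.

    import json, math

    class Order0Model:
        """Adaptive byte model shared by compressor and decompressor.
        The ideal code length of a symbol is -log2(p), so compressed
        size and language-model log-loss are the same number."""
        def __init__(self):
            self.counts = [1] * 256          # Laplace-smoothed byte counts
            self.total = 256

        def prob(self, byte):
            return self.counts[byte] / self.total

        def update(self, byte):
            self.counts[byte] += 1
            self.total += 1

    def ideal_compressed_bits(data):
        """Accumulate -log2 p(b) per byte; a real arithmetic coder
        would emit within about 2 bits of this total."""
        model = Order0Model()
        bits = 0.0
        for b in data:
            bits += -math.log2(model.prob(b))
            model.update(b)
        return bits, model

    def save_state(model, path):                 # the proposed CLI option
        with open(path, "w") as f:               # would write something
            json.dump(model.counts, f)           # like this at exit

    def load_state(path):                        # read back in and run
        model = Order0Model()                    # as a language model
        with open(path) as f:
            model.counts = json.load(f)
        model.total = sum(model.counts)
        return model

    if __name__ == "__main__":
        corpus = open(__file__, "rb").read()     # any text corpus
        bits, model = ideal_compressed_bits(corpus)
        print(f"{bits / 8:.0f} bytes ideal vs {len(corpus)} raw")
        save_state(model, "state.json")
        lm = load_state("state.json")            # now query it as an LM
        best = max(range(256), key=lm.prob)      # most probable next byte
        print("most probable next byte:", bytes([best]))

An order-0 model ignores context entirely, which is exactly why such a baseline would score near the floor on LLM benchmarks while still anchoring the parameter-count axis of a scaling-law plot.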