A Parameter-Free Classification Method with Compressors

83 points, by danielam, nearly 2 years ago

4 comments

cs702, nearly 2 years ago
This is clever -- and useful in many settings that preclude the use of a deep neural network for classification.

Intuitively, the key idea is this: if you have two documents, say x1 and x2, and a target document y, and x1's statistical regularities are more similar to y's than x2's are, then len(compress(x1+y)) - len(compress(y)) < len(compress(x2+y)) - len(compress(y)), where "+" means concatenation and "compress" is a compression program like gzip.

len(compress(x1+y)) - len(compress(y)) is, quite literally, the number of additional bytes we need to compress the statistical regularities in x1 given the statistical regularities in y. The more similar the statistical regularities of x1 and y, the fewer bytes we need to compress them together.

The authors run kNN with a distance function called normalized compression distance (NCD), based on the above idea. Remarkably, this simple, intuitive method outperforms BERT on a variety of zero-shot classification tasks!
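A minimal sketch of the gzip-plus-kNN idea described above, in Python with the standard gzip module. The ncd function uses the usual normalized compression distance formula the comment refers to; the toy documents, labels, and the knn_classify helper are illustrative assumptions, not the paper's actual code:

```python
import gzip

def clen(s: str) -> int:
    # Length in bytes of the gzip-compressed text: a proxy for its information content.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized Compression Distance: smaller means x and y share more regularities.
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # Majority vote over the k training documents nearest to the query under NCD.
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Hypothetical toy data, just to exercise the functions.
train = [
    ("the team won the match in overtime", "sports"),
    ("striker scores twice as the club tops the league", "sports"),
    ("new chip doubles performance per watt", "tech"),
    ("startup releases an open-source compiler toolchain", "tech"),
]
print(knn_classify("midfielder signs a new contract with the club", train))
```

Nothing is trained anywhere in this sketch: the compressor itself supplies the similarity measure, which is what "parameter-free" means in the paper's title.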
homarp, nearly 2 years ago
also at https://news.ycombinator.com/item?id=36705472
awinter-py, nearly 2 years ago
david huffman puts down his eternal origami to look into the camera and wink
skybrian, nearly 2 years ago
Here's the arXiv link from December: https://arxiv.org/abs/2212.09410