TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.
Decoding the ACL Paper: Gzip and kNN Rival BERT in Text Classification

34 points · by abhi9u · almost 2 years ago

3 comments

liliumregale · almost 2 years ago
The paper has recently been called into question for overestimating its performance relative to BERT: https://news.ycombinator.com/item?id=36758433. It might be good for the blog's author to take this into account in their explainer; the author's perspective sounds a bit too positive (and borderline salesmanlike).
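For readers who haven't seen the paper: the method under discussion is a parameter-free classifier that measures similarity via gzip's compressed sizes (normalized compression distance) and votes among the k nearest training texts. A minimal sketch of the idea, not the authors' exact code, with invented toy texts and labels:

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance approximated with gzip sizes."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(query, train, k=3):
    """Label `query` by majority vote among its k nearest training texts.

    train: list of (text, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Invented toy data, for illustration only.
train = [
    ("stocks rallied as tech shares climbed", "business"),
    ("bond yields fell after the fed meeting", "business"),
    ("the home team won in extra time", "sports"),
    ("striker scores twice in derby win", "sports"),
]
print(knn_classify("shares climbed as stocks rallied again", train, k=1))
```

The intuition: if two texts share a lot of substance, gzip compresses their concatenation well, so the distance is small; no training or embeddings are involved.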
numeri · almost 2 years ago
In addition to the evaluation issues, it looks like several of their test sets have significant overlap with the training sets [1]. Especially for a compression-based technique, having exact duplicates is going to help a lot.

[1] https://github.com/bazingagin/npc_gzip/issues/13
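The reason duplicates are so damaging for this particular method: when a test text also appears in the training set, gzip compresses the concatenation of the two copies almost for free, so the compression distance to the duplicate is near zero and the nearest-neighbor lookup trivially returns the right label. A quick illustration with invented strings (the distance definition mirrors the paper's, but the texts are made up):

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance approximated with gzip sizes."""
    c = lambda s: len(gzip.compress(s.encode()))
    return (c(x + " " + y) - min(c(x), c(y))) / max(c(x), c(y))

doc = "breaking: markets close higher on a late tech rally"
paraphrase = "markets finished slightly higher amid a late rally in tech"
unrelated = "a sourdough recipe that uses whole wheat flour"

print(round(ncd(doc, doc), 3))         # exact duplicate: near zero
print(round(ncd(doc, paraphrase), 3))  # related text: in between
print(round(ncd(doc, unrelated), 3))   # unrelated text: largest of the three
```

So a benchmark with train/test duplicates measures retrieval of exact copies, not generalization.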
stri8ed · almost 2 years ago
In such a scheme, wouldn't synonyms of the same word be no closer to each other than any other random string?
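Essentially yes: gzip matches byte sequences, not meanings, so a synonym that shares no characters with the original word contributes as little to compression as gibberish of the same length. Any closeness comes only from the surrounding shared context. A small check of that intuition (invented sentences; the distance function is the usual gzip-based NCD):

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance approximated with gzip sizes."""
    c = lambda s: len(gzip.compress(s.encode()))
    return (c(x + " " + y) - min(c(x), c(y))) / max(c(x), c(y))

base      = "the movie we watched last night was fantastic"
synonym   = "the movie we watched last night was excellent"
gibberish = "the movie we watched last night was qvzxjplrw"

d_syn = ncd(base, synonym)
d_gib = ncd(base, gibberish)
print(d_syn, d_gib)  # roughly comparable: the synonym earns no semantic credit
```

The shared sentence frame makes both distances moderate, but swapping in a true synonym buys essentially nothing over swapping in random letters.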