
liliumregale, almost 2 years ago

The paper has recently been called into question for overestimating its performance relative to BERT: https://news.ycombinator.com/item?id=36758433. It might be good for the blog's author to take this into account in their explainer. The author's perspective sounds a bit too positive (and borderline salesmanlike).


numeri, almost 2 years ago

In addition to the evaluation issues, it looks like several of their test sets have significant overlap with the training sets [1]. Especially for a compression-based technique, having exact duplicates is going to help a lot.

[1] https://github.com/bazingagin/npc_gzip/issues/13
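
To illustrate why exact duplicates would help a compression-based classifier, here is a minimal sketch of a normalized compression distance (NCD) built on gzip. It assumes the common NCD formulation and a space as the joiner between the two texts; the helper names and example sentences are made up for illustration and are not taken from the paper's code.

```python
import gzip


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings (assumed formulation)."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)  # joining with a space is an assumption
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical documents, invented for this sketch.
train_doc = "The central bank raised interest rates by a quarter point on Tuesday."
duplicate = train_doc  # a test item that also appears in the training set
unrelated = "Midfielder scores twice as the home side wins the cup final in extra time."

dup_distance = ncd(duplicate, train_doc)
other_distance = ncd(unrelated, train_doc)
print(f"duplicate: {dup_distance:.3f}  unrelated: {other_distance:.3f}")
```

A duplicated test item compresses almost for free when appended to its training copy, so its distance is far smaller than that of an unrelated text, and a nearest-neighbor lookup would pick the leaked training example first.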

stri8ed, almost 2 years ago

In such a scheme, wouldn't synonyms of the same word be no closer to each other than any other random string?


Decoding the ACL Paper: Gzip and KNN Rival Bert in Text Classification

3 comments
