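In addition to the evaluation issues, it looks like several of their test sets overlap significantly with the corresponding training sets [1]. For a compression-based nearest-neighbor technique in particular, exact duplicates are going to help a lot: a test example that also appears in the training data compresses almost for free against its copy, so its nearest neighbor is the copy itself.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;bazingagin&#x2F;npc_gzip&#x2F;issues&#x2F;13">https:&#x2F;&#x2F;github.com&#x2F;bazingagin&#x2F;npc_gzip&#x2F;issues&#x2F;13</a>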
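<p>To make the duplicate effect concrete, here is a rough sketch (not the repo's code). It assumes the usual normalized compression distance with gzip and a space-joined concatenation; the helper names `clen`, `ncd`, and `overlap_fraction` are made up for illustration, and the example strings are invented.

```python
import gzip


def clen(s: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (assumed form, space-joined concat)."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def overlap_fraction(train: list[str], test: list[str]) -> float:
    """Fraction of test examples that appear verbatim in the training set."""
    seen = set(train)
    return sum(t in seen for t in test) / len(test)


if __name__ == "__main__":
    # Invented example texts, only to show the direction of the effect.
    a = "The central bank raised interest rates again on Thursday, citing persistent inflation."
    b = "Researchers found a new species of deep-sea fish near hydrothermal vents in the Pacific."
    print("ncd(a, a) =", ncd(a, a))  # exact duplicate: second copy is nearly free
    print("ncd(a, b) =", ncd(a, b))  # unrelated text: little shared structure
    print("overlap   =", overlap_fraction([a, b], [a, "something unseen"]))
```

<p>Running `overlap_fraction` over each dataset's train/test split would be a quick way to check how much of the reported accuracy could be coming from verbatim matches, though it only catches exact duplicates and not near-duplicates.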