One trick you could try: in find_longest_match, if you already have a match, check whether the byte at match_maxlen matches before doing the linear compare of all bytes up to it.<p>If that one byte does not match, the candidate match has no chance of being longer than the current best (in this simple case).
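To make the idea concrete, here is a rough sketch in Python. It is not the article's code: the signature, the `candidates` list of earlier positions, and the name `best_len` (standing in for match_maxlen) are all assumptions made for illustration.

```python
def find_longest_match(data: bytes, pos: int, candidates, max_len: int = 258):
    """Return (best_pos, best_len) for the longest match of data[pos:]
    among the given earlier positions. Hypothetical signature."""
    best_pos, best_len = -1, 0
    # Never compare past the end of the input or past max_len.
    limit = min(max_len, len(data) - pos)
    for cand in candidates:
        if best_len > 0:
            if best_len >= limit:
                break  # already at the maximum possible length
            # Quick reject: a longer match must also match at offset
            # best_len, so test that single byte first.
            if data[cand + best_len] != data[pos + best_len]:
                continue
        # Full linear compare, only for candidates that passed the check.
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_pos, best_len = cand, n
    return best_pos, best_len
```

With `data = b"abcx_abcdy_abcdz"` and `pos = 11`, trying candidates `[5, 0]` finds the 4-byte match at 5 first, and candidate 0 is then skipped after a single byte comparison instead of a 3-byte scan.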
A nice trick. It could be used for generalized string search as well as compression. And if you indexed bigrams instead of single characters, it could be even faster.<p>I especially like the clear, easy-to-understand, well-written presentation along with links to prior art. Wouldn't it be nice if most academic papers were written like this?