Instead of doing a diff, I'm curious whether Normalized Compression Distance (NCD)[1] would yield a better result. It's a very simple algorithm:<p>To compare two images, i1 and i2:<p><pre><code> l1 = length(gzip(i1))
l2 = length(gzip(i2))
 l12 = length(gzip(concatenate(i1, i2)))
ncd = (l12 - min(l1, l2))/max(l1, l2)
</code></pre>
Here is a nice article where I found out about this long ago.<p><a href="https://yieldthought.com/post/95722882055/machine-learning-teaches-me-how-to-write-better-ai" rel="nofollow">https://yieldthought.com/post/95722882055/machine-learning-t...</a><p>From the article:<p>"Basically it states that the degree of similarity between two objects can be approximated by the degree to which you can better compress them by concatenating them into one object rather than compressing them individually."<p>[1] <a href="https://en.wikipedia.org/wiki/Normalized_compression_distance" rel="nofollow">https://en.wikipedia.org/wiki/Normalized_compression_distanc...</a>
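<p>A minimal, self-contained Python sketch of the formula above (the function name ncd and the toy inputs are my own, not from the article; note also that gzip's 32 KB window limits how much shared structure it can exploit on large inputs):<p><pre><code>import gzip

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: near 0 for near-identical
    # inputs, approaching 1 for unrelated ones.
    lx = len(gzip.compress(x))
    ly = len(gzip.compress(y))
    lxy = len(gzip.compress(x + y))
    return (lxy - min(lx, ly)) / max(lx, ly)

# Toy usage: similar inputs compress well when concatenated.
s1 = b"the quick brown fox jumps over the lazy dog" * 20
s2 = b"the quick brown fox jumps over the lazy cat" * 20
s3 = bytes(range(256)) * 4
print(ncd(s1, s2))  # small: shared structure compresses away
print(ncd(s1, s3))  # larger: little shared structure
</code></pre>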
If you're also getting a 500:<p><a href="https://web.archive.org/web/20250106075631/https://nickfa.ro/wiki/OCRing_Music_from_YouTube_with_Common_Lisp" rel="nofollow">https://web.archive.org/web/20250106075631/https://nickfa.ro...</a>
To OCR music scores, see e.g., <a href="https://digitalcollection.zhaw.ch/items/276365b9-0a20-4286-af62-060d70a04402" rel="nofollow">https://digitalcollection.zhaw.ch/items/276365b9-0a20-4286-a...</a>