There's also nilsimsa hashing (there's a Python implementation at <a href="http://code.google.com/p/py-nilsimsa/" rel="nofollow">http://code.google.com/p/py-nilsimsa/</a>). Unfortunately, nilsimsa hashes of similar inputs can differ even in their most significant bits:<p><pre><code> 773e2df0a02a319ec34a0b71d54029111da90838cbc20ecd3d2d4e18c25a3025
47182cf0802a11dec24a3b75d5042d310ca90838c9d20ecc3d610e98560a3645
</code></pre>
...so although nilsimsa is somewhat nice for calculating the difference between two documents, it's a pain in the butt for finding similar documents in a database.<p>The solution described in the writeup is neat, but I really wish there were an LSH that generated hashes whose bits run from most to least significant, so that similar documents would share a common prefix.<p>Great writeup though!
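<p>To make the problem concrete, here's a rough Python sketch that compares the two digests above bit by bit. The helper names are mine (not part of py-nilsimsa), it uses only the standard library, and it simply treats each hex digest as one big integer:

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex digests."""
    if len(hex_a) != len(hex_b):
        raise ValueError("digests must have the same length")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def common_prefix_len(hex_a: str, hex_b: str) -> int:
    """Number of leading hex digits the two digests share."""
    n = 0
    for ca, cb in zip(hex_a, hex_b):
        if ca != cb:
            break
        n += 1
    return n


# The two digests quoted above.
A = "773e2df0a02a319ec34a0b71d54029111da90838cbc20ecd3d2d4e18c25a3025"
B = "47182cf0802a11dec24a3b75d5042d310ca90838c9d20ecc3d610e98560a3645"

total_bits = len(A) * 4
print(hamming_distance(A, B), "of", total_bits, "bits differ")
print("shared leading hex digits:", common_prefix_len(A, B))
```

The pairwise bit count is what makes the digests useful for comparing two documents, but the digests already disagree in their very first hex digit, so a sorted index or prefix lookup in a database won't place them next to each other.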