Great write-up! We did something very similar when trying to find duplicate product images for a consumer review site we were working on. Our implementation desaturated the image, broke it into a fixed number of tiles, and generated a histogram of each tile's pixel values. Then we applied a threshold to the histogram values, so that each histogram value in a tile became one bit. Combine the bits from all the tiles, and we had a hash to store in the DB. Our hashes were a bit larger than yours, so rather than just looking for exact hash matches, we flagged any images within a certain Hamming distance of each other. It took quite a bit of tuning to get good results, but it seemed to work pretty well. Do you see many false positives with such a small processed image size (the 9x8 one, I mean)?
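
A minimal sketch of the approach described above, in Python. The function names, the 4x4 tile grid, the 4 histogram bins, and the 25% threshold are all assumptions for illustration; the comment does not give the actual values, and the real implementation may have differed. The input is assumed to be an already-desaturated image given as a 2D list of 0-255 integers.

```python
def tile_histogram_hash(gray, tiles=4, bins=4, threshold=0.25):
    """Hash a grayscale image (2D list of ints in 0..255) into an integer.

    All parameter defaults are hypothetical: `tiles` is the grid size per
    side, `bins` the histogram size per tile, and `threshold` the fraction
    of a tile's pixels a bin must exceed to contribute a 1 bit.
    """
    height, width = len(gray), len(gray[0])
    bits = 0
    for ty in range(tiles):
        for tx in range(tiles):
            # Pixel bounds of this tile.
            y0, y1 = ty * height // tiles, (ty + 1) * height // tiles
            x0, x1 = tx * width // tiles, (tx + 1) * width // tiles

            # Histogram of the tile's brightness values.
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[gray[y][x] * bins // 256] += 1

            # Threshold each histogram value into one bit of the hash.
            pixel_count = (y1 - y0) * (x1 - x0)
            for value in hist:
                bit = 1 if value > threshold * pixel_count else 0
                bits = (bits << 1) | bit
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(hash_a, hash_b, max_distance=6):
    """Flag two images as likely duplicates (max_distance is a made-up cutoff)."""
    return hamming_distance(hash_a, hash_b) <= max_distance
```

With these defaults the hash is 4 x 4 x 4 = 64 bits. The cutoff passed to `is_probable_duplicate` is the kind of value that would need the tuning mentioned above: too low misses re-encoded copies, too high produces false positives.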