10 comments

spolu over 10 years ago

I realize now that I could have given the basic principles behind each of the two libraries compared here:

libPuzzle: Splits the image into blocks and computes the hash from the relationships between the brightness of adjacent blocks.

pHash: Computes the 8x8 DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform) representation of an image (the lowest frequencies of the image). It then sets each hash bit by comparing each of these 8×8 values to the mean DCT value (very resilient to non-structural changes in the image).

I updated the post with this information.
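To make the pHash description above concrete, here is a minimal pure-Python sketch, not libpHash's actual code. It assumes the image has already been converted to a square grayscale matrix (for example 32x32), uses an unnormalized DCT-II, and thresholds against the mean of all 64 low-frequency coefficients as described above. The function names `dct_2d_low` and `dct_mean_hash` are made up for illustration.

```python
import math

def dct_2d_low(pixels, keep=8):
    """Top-left keep x keep block of an unnormalized 2D DCT-II.

    pixels is a square list-of-lists of grayscale values (assumption:
    the caller has already resized and desaturated the image).
    """
    n = len(pixels)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)), only for the low frequencies
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # 1D transform along each row
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    # 1D transform along each column of the row-transformed data
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]

def dct_mean_hash(pixels, keep=8):
    """Pack one bit per low-frequency coefficient: 1 if above the mean."""
    coeffs = [c for row in dct_2d_low(pixels, keep) for c in row]
    mean = sum(coeffs) / len(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits
```

Because every coefficient and the mean scale together, multiplying all pixel values by a positive constant leaves the hash unchanged in this sketch.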

jo_ over 10 years ago

I just finished writing about distance hashing functions from a slightly different angle. I visualized the distances between a bunch of images using two different techniques, one of which was pHash (discussed in the parent article). Mine isn't quite as in-depth performance-wise, but it makes for pretty pictures. Some of my work is here: http://www.josephcatrambone.com/?p=619

I'm going to upload the SHA distance tonight.

0x09 over 10 years ago

libpHash is actually quite slow for what it does. I spent a fair amount of time investigating image hashing algorithms a few years ago, and at that time I saw a 10-20x improvement over libpHash just by implementing the similar phash algorithm described in Neal Krawetz's blog.* Puzzle was both slower and dramatically less accurate on my body of test images. Perceptual hashing can be surprisingly lightweight -- by the end of the experiment I was really just benchmarking image loading libraries. If speed is a concern you are probably better off forgoing these libs and writing the 2-3 dozen lines of code (really!) it takes to roll your own, or better yet implementing a comparable, even more lightweight algorithm like dhash.

* http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
* The main difference is that libpHash applies a Gaussian blur over the image, which can be made redundant by using a decent resampling algorithm.
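For a sense of how small a hand-rolled hash like dhash can be, here is a rough pure-Python sketch. It assumes the usual formulation (shrink to a 9x8 grayscale grid, then compare each pixel with its right-hand neighbour) and takes an already-decoded grayscale matrix as input. The helper `shrink` is a simple box average standing in for a proper resampling filter, and both function names are illustrative only.

```python
def shrink(pixels, out_w, out_h):
    """Box-average a grayscale matrix down to out_w x out_h.

    Assumes the source is at least out_w x out_h; a real implementation
    would use a proper resampling filter from an imaging library.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    small = []
    for j in range(out_h):
        y0, y1 = j * src_h // out_h, (j + 1) * src_h // out_h
        row = []
        for i in range(out_w):
            x0, x1 = i * src_w // out_w, (i + 1) * src_w // out_w
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def dhash(pixels, size=8):
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = shrink(pixels, size + 1, size)  # 9 columns give 8 comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Convention chosen here: bit is 1 when the left cell is brighter
            bits = (bits << 1) | (1 if left > right else 0)
    return bits
```

Since only the sign of each neighbour difference is kept, adding a constant brightness offset to every pixel does not change the result.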

drewcummins over 10 years ago

I wrote this after spending way too much time on this problem earlier this year--nothing new, but it is a fair chronology of my approach: http://download.picturelife.com.s3.amazonaws.com/press-kits/ImageSimilarityWhirlwind.pdf

phoboslab over 10 years ago

We've been using phash for an image board for a while now and are quite happy with it. We only use it to detect reposts when someone uploads an image. It gives false positives fairly often, but that's totally okay for our use case. We specifically set it up to err on the safe side. Users are only presented with an "Are you sure your upload is not a duplicate?" message.

Currently we're just doing a `WHERE BIT_COUNT(images.phash ^ inputHash) < 12` in MySQL over 400k rows, which still works reasonably well (~200ms) given that it can't use an index for the XOR/BIT_COUNT operation. To my knowledge there's no way to speed up this query in MySQL, so if we continue to grow we will probably have to write a small daemon that can search hashes more efficiently.
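The XOR/BIT_COUNT test in that query is a Hamming distance check. As a rough sketch of the same comparison outside the database, assuming the hashes are held in memory as 64-bit integers keyed by image id (the names `hamming` and `find_similar` are hypothetical):

```python
def hamming(a, b):
    """Number of differing bits, equivalent to BIT_COUNT(a ^ b) in the query."""
    return bin(a ^ b).count("1")

def find_similar(hashes, query, max_distance=12):
    """Return image ids whose hash is within max_distance bits of query.

    hashes maps image id -> integer hash. Results are ordered nearest
    first. The comparison is strict (<), mirroring the SQL condition.
    """
    matches = []
    for image_id, h in hashes.items():
        d = hamming(h, query)
        if d < max_distance:
            matches.append((d, image_id))
    return [image_id for d, image_id in sorted(matches)]
```

This is still a linear scan over every stored hash, like the SQL version; it only illustrates the comparison, not a faster search structure.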

albertzeyer over 10 years ago

Some relevant, interesting Stack Overflow questions:

http://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity (my own :))
http://stackoverflow.com/questions/75891/algorithm-for-finding-similar-images
http://stackoverflow.com/questions/596262/image-fingerprint-to-compare-similarity-of-many-images

quarterwave over 10 years ago

Is an image hash a very different beast from the hashing functions used for passwords etc.? In the latter we want large sensitivity to small changes, while in an image hash we want a measure sensitive to similarities.

Naively I'd expect image hashing to be like cross-correlation (non-linear) while password hashing can be done with shifts and modulo-2 (linear).
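The "large sensitivity to small changes" half of that contrast is easy to observe with a cryptographic hash. A small sketch using SHA-256 from Python's standard library as a stand-in (it is not a password hashing scheme, and the helper names are made up for illustration):

```python
import hashlib

def bit_distance(a, b):
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def sha256_distance(msg1, msg2):
    """Hamming distance between the SHA-256 digests of two messages."""
    d1 = hashlib.sha256(msg1).digest()
    d2 = hashlib.sha256(msg2).digest()
    return bit_distance(d1, d2)

if __name__ == "__main__":
    # Inputs differing in one character give digests that differ in
    # roughly half of their 256 bits, so the distance says nothing
    # about how similar the inputs were.
    print(sha256_distance(b"password1", b"password2"))
```

A perceptual hash aims for the opposite behaviour: near-identical images should land within a few bits of each other.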

jimktrains2 over 10 years ago

Be sure to enable JavaScript or you won't be able to see the gist with the results. (Obviously including a plain table in the post would be too much work :-\)

luminati over 10 years ago

The pHash library is GPL-licensed. If you are building a closed-source commercial product, you need to purchase a license.

GFK_of_xmaspast over 10 years ago

Isn't SURF patent-encumbered?

A Comparison of Image Hashing Libraries
