A Comparison of Image Hashing Libraries

82 points by spolu over 10 years ago

10 comments

spolu over 10 years ago
I realize now that I could have given the basic principles behind each of the two libraries compared here:

libPuzzle: Splits the image into blocks and computes the hash from the relative brightness of adjacent blocks.

pHash: Computes the 8×8 DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform) representation of the image, i.e. its lowest frequencies. It then sets each hash bit by comparing the corresponding 8×8 value to the mean DCT value, which makes the hash very resilient to non-structural changes in the image.

I updated the post with this information.
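A minimal sketch of the DCT approach described in the comment above, not libpHash's actual code. It assumes the image has already been decoded, converted to grayscale and resized to a square grid of brightness values (32x32 is a common choice, but that size is an assumption here). The function names `dct_low_freq` and `dct_hash` are hypothetical, and the DCT is left unnormalised for brevity.

```python
import math

def dct_low_freq(pixels, keep=8):
    """Return the keep x keep lowest-frequency 2D DCT-II coefficients
    (unnormalised) of a square grid of brightness values."""
    n = len(pixels)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n)) for the first `keep` frequencies
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # 1D DCT along each row, keeping only the first `keep` frequencies
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    # 1D DCT along each column of the intermediate result
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]

def dct_hash(pixels):
    """64-bit hash: one bit per low-frequency coefficient, set when the
    coefficient is above the mean of all 64 coefficients."""
    coeffs = [c for row in dct_low_freq(pixels) for c in row]
    mean = sum(coeffs) / len(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits

# Usage: a synthetic 32x32 pattern standing in for a resized grayscale image
image = [[(x * y) % 256 for x in range(32)] for y in range(32)]
print(f"{dct_hash(image):016x}")
```

Some implementations compare against the median or leave the DC term out of the average; this sketch follows the mean-based description in the comment.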
jo_ over 10 years ago
I just finished writing about distance hashing functions from a slightly different angle. I visualized the distances between a bunch of images using two different techniques, one of which was pHash (discussed in the parent article). Mine isn't quite as in-depth performance-wise, but it makes for pretty pictures. Some of my work is here: http://www.josephcatrambone.com/?p=619

I'm going to upload the SHA distance tonight.
0x09 over 10 years ago
libpHash is actually quite slow for what it does. I spent a fair amount of time investigating image hashing algorithms a few years ago, and at that time I saw a 10-20x speedup over libpHash just by implementing the similar phash algorithm described in Neal K's blog.* Puzzle was both slower and dramatically less accurate on my body of test images. Perceptual hashing can be surprisingly lightweight -- by the end of the experiment I was really just benchmarking image loading libraries. If speed is a concern, you are probably better off forgoing these libs and writing the 2-3 dozen lines of code (really!) it takes to roll your own, or better yet implementing a comparable, even more lightweight algorithm like dhash.

* http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html

* The main difference is that libpHash applies a Gaussian blur over the image, which can be made redundant by using a decent resampling algorithm.
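To illustrate how little code a "roll your own" hash needs, here is a hedged sketch of dhash in its common formulation: shrink the grayscale image to 9x8 and record, for each row, whether each pixel is brighter than its right-hand neighbour. The resize and grayscale steps are assumed to be done by whatever image library is at hand, so the input here is already a grid of 8 rows of 9 brightness values. The function name `dhash` is illustrative.

```python
def dhash(gray_rows):
    """Difference hash of an image already converted to grayscale and
    resized to 9 columns x 8 rows. Returns a 64-bit integer."""
    bits = 0
    for row in gray_rows:
        # 9 pixels per row give 8 left/right comparisons, hence 8 bits per row
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

# Usage: synthetic 9x8 grid standing in for a resized image
grid = [[(3 * x + 5 * y) % 17 for x in range(9)] for y in range(8)]
print(f"{dhash(grid):016x}")
```

Because only the sign of each neighbour difference is kept, uniform brightness shifts leave the hash unchanged, which is part of why the algorithm can be this small.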
drewcummins over 10 years ago
I wrote this after spending way too much time on this problem earlier this year--nothing new, but it is a fair chronology of my approach.

http://download.picturelife.com.s3.amazonaws.com/press-kits/ImageSimilarityWhirlwind.pdf
phoboslab over 10 years ago
We've been using phash for an image board for a while now and are quite happy with it. We only use it to detect reposts when someone uploads an image. It gives false positives fairly often, but that's totally okay for our use case. We specifically set it up to err on the safe side. Users are only presented with an "Are you sure your upload is not a duplicate?" message.

Currently we're just doing a `WHERE BIT_COUNT(images.phash ^ inputHash) < 12` in MySQL over 400k rows, which still works reasonably well (~200ms) given that it can't use an index for the XOR/BIT_COUNT operation. To my knowledge there's no way to speed up this query in MySQL, so if we continue to grow we will probably have to write a small daemon that is able to search hashes more efficiently.
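The XOR/BIT_COUNT test in the query above is a Hamming distance check. A rough in-memory equivalent, assuming the hashes fit in 64-bit integers and are held in a dict keyed by image id, might look like this. The names `hamming` and `find_similar` are hypothetical, and this is still a linear scan rather than an indexed search.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def find_similar(hashes, query, threshold=12):
    """Return (distance, image_id) pairs with distance < threshold,
    closest first. `hashes` maps image_id -> integer hash."""
    matches = [(hamming(h, query), image_id) for image_id, h in hashes.items()]
    return sorted(pair for pair in matches if pair[0] < threshold)

# Usage with made-up hashes
stored = {1: 0b1111, 2: 0, 3: 2 ** 64 - 1}
print(find_similar(stored, 0))
```

A daemon that needs to scale further would typically replace the scan with a metric index such as a BK-tree or with multi-index hashing. Neither is shown here.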
albertzeyer over 10 years ago
Some relevant interesting StackOverflow questions:

http://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity (my own :))
http://stackoverflow.com/questions/75891/algorithm-for-finding-similar-images
http://stackoverflow.com/questions/596262/image-fingerprint-to-compare-similarity-of-many-images
quarterwave over 10 years ago
Is an image hash a very different beast from the hash functions used for passwords etc.? In the latter we want large sensitivity to small changes, while in an image hash we want a measure that is sensitive to similarities.

Naively I'd expect image hashing to be like cross-correlation (non-linear), while password hashing can be done with shifts and modulo-2 (linear).
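The "large sensitivity to small changes" side of that contrast can be seen directly with a cryptographic hash from Python's standard library. The sketch below counts how many of the 256 output bits of SHA-256 differ between two inputs. The helper name `sha256_bit_distance` and the sample byte strings are made up for illustration.

```python
import hashlib

def sha256_bit_distance(a: bytes, b: bytes) -> int:
    """Hamming distance between the SHA-256 digests of two byte strings."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Usage: two inputs that differ in a single character
print(sha256_bit_distance(b"image-data-0", b"image-data-1"))
```

For a cryptographic hash the count is expected to land near half of the 256 bits however small the input change is, so the distance says nothing about similarity. A perceptual hash is designed so that similar images produce a small distance instead.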
jimktrains2 over 10 years ago
Be sure to enable JavaScript or you won't be able to see the gist with the results. (Obviously including a plain table in the post would be too much work :-\)
luminati over 10 years ago
The pHash library is GPL licensed. If you are building a closed source commercial product, you need to purchase a license.
GFK_of_xmaspast over 10 years ago
Isn't SURF patent-encumbered?