
Perceptual Image Hashing

104 points, by specular, almost 11 years ago

11 comments

Jemaclus, almost 11 years ago
This made me think of something, so bear with me for a second. I took Andrew Ng's Machine Learning class on Coursera, and one of the things he talked about was how, given a certain data set and a logistic regression approach, the algorithm might decide to just always give it a value of 1, since 98% of the time the value is one.

The examples given here were all predicted to have 95%+ similarity, but it seems to me that, given this dataset, I could be reasonably convinced that it just assumes all images are reasonably similar. In other words, the author provides no counter-examples that demonstrate that the algorithm can detect *dissimilar* images, such as a photo of a car and a clipart of pizza. I would expect that pair to have very low similarity, and if the algorithm demonstrated the ability to discern between similar and dissimilar images, I would be more convinced of perceptual image hashing as a viable technique.
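
A hedged Python sketch of the kind of counter-example check this comment asks for: hash two unrelated images and confirm the Hamming distance is large. It uses the third-party ImageHash package as a stand-in for the article's own implementation, and the file names are hypothetical.

```python
# Counter-example check: unrelated images should produce very different hashes.
# Uses the "imagehash" package (not the article's code); file names are hypothetical.
import imagehash
from PIL import Image

car = imagehash.phash(Image.open("car_photo.jpg"))
pizza = imagehash.phash(Image.open("pizza_clipart.png"))

# Subtracting two ImageHash values gives their Hamming distance:
# near-duplicates land close to 0, unrelated images should be far apart.
print("Hamming distance:", car - pizza)
```
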
bcoates, almost 11 years ago
I've used this technique before for image searching; here are some tips:

The algorithm presented here can be summarized as "take the low-frequency DCT components of the luminance channel, quantized down to one bit". Hashing is a very misleading term for what you get from it: in particular, it doesn't have anything like the uniform coverage of the output space you would expect from a hash function, as almost all encountered images exist in a very small range of hashes. Also, the bits in the hash are not all created equal; assuming you encode them in a zig-zag pattern, truncation is equivalent to a low-pass filter, so if you want to make the hash different sizes while matching on the same image elements, you'll want to alter the quantization amount, not the output DCT size (this is directly equivalent to what JPEG does, which uses a fixed DCT size but lossily quantizes the information in it to varying degrees for different compression quality settings). Experiment with different quantization techniques; you don't have to treat every bit in the output the same.

This technique is only suitable for a small subset of the image files that you will encounter in the real world: it's mostly unsuitable for anything but photographs and photograph-like images (3D renders, paintings, etc.). Many classes of image have no useful distinguishing information in the low-frequency luminance space, such as screenshots, pictures of text, line drawings, many forms of graph and diagram, most "clip art" style drawings, and pictures that include layout elements in the image file (like most of the meme/Successories/demotivator format pictures you'll see on imageboards). Of course, an image with a flat or heavily compressed luminance channel will yield no information at all.

You don't have to do anything special to match flipped images, as you should be using (IIRC) DCT-II, which has the same output for a mirror image. It's insensitive to scaling and mild translation.

For debugging purposes, note that the transformation performed is lossy but easily reversible: you can undo all the steps but quantization to generate a 'stereotype' image of a particular hash value and its neighbors. There is no more information available to the matching algorithm than you can see with your own eyes in the round-trip conversion, so if the round-tripped image preserves important distinguishing detail the matching will work, and if it turns the image into a meaningless grey smear you'll get tons of false positive matches.
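
A minimal Python sketch of the procedure this comment summarizes: low-frequency DCT of the luminance channel, quantized down to one bit. The 32×32 resize, 8×8 low-frequency block, and median threshold are assumptions borrowed from common pHash write-ups, not necessarily the article's exact parameters.

```python
# Sketch of a DCT-based perceptual hash: low-frequency luminance DCT, 1 bit per coefficient.
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def phash_sketch(path, hash_size=8, highfreq_factor=4):
    size = hash_size * highfreq_factor
    # Convert to luminance and downscale; this discards color and fine detail.
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    # 2D DCT-II (apply the 1D transform along each axis).
    coeffs = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
    # Keep only the top-left (low-frequency) block.
    lowfreq = coeffs[:hash_size, :hash_size]
    # Quantize each coefficient to one bit against the median, excluding the DC
    # term so overall brightness does not dominate the threshold.
    med = np.median(lowfreq.flatten()[1:])
    return (lowfreq > med).flatten()

def hamming(h1, h2):
    # Number of differing bits between two hashes.
    return int(np.count_nonzero(h1 != h2))
```

As the comment notes, the interesting knob here is the quantization step (a single median threshold in this sketch), not the DCT block size.
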
667, almost 11 years ago
Is it different from existing implementations? Am I missing something?

http://phash.org/

http://hzqtc.github.io/2013/04/image-duplication-detection.html

http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html
elzr, almost 11 years ago
Hash functions impose practical mappings between unlike domains. When volatile (~1-to-1), identifying (crypto); when correlated (~n-to-n), perceiving (AI).

(Tweet summary of the article. I'm becoming increasingly fascinated by hash functions. I find all-important this tension between abstraction/correlation/perception and groundedness/volatility/identification.)

https://twitter.com/elzr/status/497893639000190976
bravura, almost 11 years ago
Also look at Semantic Hashing, which uses deep learning to compress an input (e.g. a document) into a low-dimensional bit vector with fast lookup:

http://www.cs.toronto.edu/~rsalakhu/papers/semantic_final.pdf

The idea is that you could hash one billion objects and do a very fast lookup of similar objects.

There is later academic work that refines this approach. For example, here is work applying it to image search: http://www.cs.utexas.edu/~grauman/temp/GraumanFergus_Hashing_chapterdraft.pdf

You can consider this a variant of Locality-Sensitive Hashing, where the hash function is specifically induced through a machine learning technique.
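
A rough sketch (not the paper's code) of the fast-lookup trick semantic hashing relies on: treat the learned binary code as a memory address and probe every address within a small Hamming radius of the query. The 16-bit code width and radius of 1 below are illustrative assumptions.

```python
# Hamming-ball lookup over short binary codes, the retrieval idea behind semantic hashing.
from itertools import combinations

def neighbors_within(code, n_bits, radius):
    # Yield `code` plus every code differing from it in at most `radius` bits.
    yield code
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= 1 << b
            yield flipped

def lookup(table, query_code, n_bits=16, radius=1):
    # `table` maps an integer code to the list of item ids stored under it.
    hits = []
    for c in neighbors_within(query_code, n_bits, radius):
        hits.extend(table.get(c, []))
    return hits

# Example: items 1 and 2 hash to nearby codes, item 3 to a distant one.
table = {0b0000000000001111: [1], 0b0000000000001110: [2], 0b1111000000000000: [3]}
print(lookup(table, 0b0000000000001111))  # -> [1, 2]
```

The appeal is that lookup cost depends on the code length and radius, not on how many objects are stored.
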
daveloyall, almost 11 years ago
Does the term "perceptual hashing" only apply to the domain of images? What would you call this concept applied to text files? I am familiar with Levenshtein distance, but that family of algorithms provides a similarity score or value, whereas I'd like a hash as the output.
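
Not something stated in this thread, but one commonly cited answer for text is SimHash, a locality-sensitive hash where similar documents yield bit strings with a small Hamming distance. A rough sketch follows; the 64-bit width and whitespace tokenization are illustrative assumptions.

```python
# Rough SimHash sketch for text: similar inputs produce hashes differing in few bits.
import hashlib

def _token_hash(token, bits=64):
    # Stable per-token hash (Python's built-in hash() is randomized per process).
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")

def simhash(text, bits=64):
    weights = [0] * bits
    for token in text.lower().split():
        h = _token_hash(token, bits)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each output bit records whether that bit position was set in most tokens.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)

def hamming(a, b):
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")
```
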
TheLoneWolfling, almost 11 years ago
How would an adversary break this?

My first idea would be to rotate it by something that's halfway between the steps used. Say, 360/32 degrees. How does that compare?

Also: because it discards the high-frequency data, one should be able to construct something like this: http://cvcl.mit.edu/hybrid/CatDogHybrid.jpg - where to us at a close distance it looks like one thing, but to this it looks like something else.
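
A quick way to probe the rotation question empirically, using the third-party ImageHash package as a stand-in for the article's hash (not the author's code); the input file name is hypothetical.

```python
# Rotate by half a step (360/32 degrees, per the comment) and see how far the hash drifts.
import imagehash
from PIL import Image

img = Image.open("photo.jpg")  # hypothetical input file
base = imagehash.phash(img)
rotated = imagehash.phash(img.rotate(360 / 32, expand=True))
print("Hamming distance after rotation:", base - rotated)
```
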
petercooper, almost 11 years ago
If this sort of thing fascinates you, here's a related approach: http://grail.cs.washington.edu/projects/query/. For some reason, numerous people were working on image search features like this (i.e. draw what you want to find) 10 years ago, but they don't seem to have caught on other than with Google Images.
alceufc, almost 11 years ago
Very interesting approach.

I have worked with image feature extraction in the past. Although DCT coefficients have been used as a way to analyze texture features, the idea of generating the hash (step 5) seems to be new.

I am curious, however, why you are discarding color information. Usually for reverse image search this kind of information can be quite useful.
munificent, almost 11 years ago
This seems to work well for modifications that are basically in-place and high frequency, but how well does it handle things like one image being a crop of another, or translation, or flipping?
ktzar, almost 11 years ago
Hashes are supposed to change a lot with small changes in the data source. I wouldn't call this hashing but fingerprinting, and there are plenty of fingerprinting algorithms for images, some related to steganography.