
The Problem with Perceptual Hashes

705 points by rivo almost 4 years ago

44 comments

ezoe almost 4 years ago
The problem with hash- or NN-based matching is that the authority can avoid explaining a mismatch.

Suppose the authority wants to falsely arrest you. They prepare a hash that matches an innocent image they know the target has on his Apple product. They hand that hash to Apple, claiming it's the hash of a child abuse image, and demand privacy-invasive searching for the greater good.

Then Apple reports to the authority that you have a file matching the hash. The authority uses that report as a convenient pretext for the false arrest.

Now what happens if you sue the authority for the intentional false arrest and demand the original file behind the hash? "No. We won't reveal the original file because it's a child abuse image; also, we don't keep the original file, for moral reasons."

But come to think of it, we already have tons of such bogus pseudo-science technology: the dogs that conveniently bark at a police handler's secret hand sign, the polygraph, and the drug test kits that detect illegal drugs out of thin air.
marcinzm almost 4 years ago
Given all the zero-day exploits on iOS, I wonder if it's now going to be viable to hack someone's phone and upload child porn to their account. Apple will happily flag the photos and then, likely, get those people arrested. Now they have to, in practice, prove they were hacked, which might be impossible. It will either ruin their reputation or put them in jail for a long time. Given past witch hunts, it could be decades before people are exonerated.
avnigo almost 4 years ago
> These cases will be manually reviewed. That is, according to Apple, an Apple employee will then look at your (flagged) pictures.

I'm surprised this hasn't gotten more traction outside of tech news media.

Remember the mass celebrity "hacking" of iCloud accounts a few years ago? I wonder how those celebrities would feel knowing that some of their photos may be falsely flagged and shown to other people. And we expect those humans to act like robots and not sell or leak the photos, etc.

Again, I'm surprised we haven't seen a far bigger outcry in the general news media about this yet, but I'm glad to see a lot of articles shining light on how easy it is for false positives and hash collisions to occur, especially at the scale of all iCloud photos.
at_a_remove almost 4 years ago
I do not know as much about perceptual hashing as I would like, but I have considered it for a little project of my own.

Still, I know it has been floating around in the wild. I recently came across it on Discord when I attempted to push an ancient image, from the 4chan of old, to a friend, and it mysteriously wouldn't send. Saved it as a PNG: no dice. This got me interested. I stripped the EXIF data off the original JPEG. I resized it slightly. I trimmed some edges. I adjusted colors. I did a one-degree rotation. Only after a reasonably complete combination of those factors would the image make it through. How interesting!

I just don't know how well this little venture of Apple's will scale, and I wonder if it won't end up being easy enough to bypass in a variety of ways. I think the tradeoff will accomplish very little, as stated, but it is probably a glorious opportunity for black-suited goons of state agencies across the globe.

We're going to find out in a big, big way soon.

* The image is of the back half of a Sphynx cat atop a CRT. From the angle of the dangle, the presumably cold, man-made feline is draping his unexpectedly large testicles across the similarly man-made device to warm them, suggesting that people create problems and also their solutions, or that, in the Gibsonian sense, the street finds its own uses for things. I assume that the image was blacklisted, although I will allow for the somewhat baffling concept of a highly-specialized scrotal-matching neural net that overreached a bit or a byte on species, genus, family, and order.
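For anyone who wants to poke at this behavior locally, the open-source imagehash library makes the experiment easy to reproduce. A minimal sketch (the file name is a stand-in, and this is of course not Discord's actual matcher):

    # Measure how single edits move a perceptual hash.
    # Assumes `pip install pillow imagehash`.
    from PIL import Image
    import imagehash

    original = Image.open("cat_on_crt.jpg")  # hypothetical file
    base = imagehash.phash(original)         # 64-bit DCT-based hash

    edits = {
        "resized": original.resize((original.width - 8, original.height - 8)),
        "cropped": original.crop((4, 4, original.width - 4, original.height - 4)),
        "rotated 1 degree": original.rotate(1, expand=True),
    }

    for name, img in edits.items():
        # Subtracting two hashes gives the Hamming distance in bits.
        print(name, base - imagehash.phash(img))

Each single edit typically flips only a few of the 64 bits, staying inside a plausible matching threshold; stacking several edits is what finally pushes the hash out of range, consistent with the experience described above.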
mrtksn almost 4 years ago
The technical challenges aside, I’m very disturbed that my device will be reporting me to the authorities.

That’s very different from the authorities taking a sneak peek into my stuff.

That’s like the theological concept of always being watched.

It starts with child pornography, but the technology is indifferent towards it; it can be anything.

It’s always about the children, because we all want to save the children. Soon they will start asking you to start saving your country. Depending on your location, they will start checking against sins against religion, race, family values, or political activities.

I bet you, after the next election in the US your device will be reporting you for spreading far-right or deep-state lies, depending on who wins.

I’m a big Apple fanboy, but I’m not going to carry a snitch in my pocket. That’s “U2 album in everyone’s iTunes library” blunder-level creepy, with the only difference that it’s actually, truly creepy.

In my case, my iPhone is going to be snitching on me to Boris and Erdogan; in your case it could be Macron, Bolsonaro, Biden, Trump, etc.

That’s a no-go for me; you can decide for yourself.
yellow_lead almost 4 years ago
Regarding false positives re: Apple, the Ars Technica article claims:

> Apple offers technical details, claims 1-in-1 trillion chance of false positives.

There are two ways to read this, but I'm assuming it means that, for each scan, there is a 1-in-1-trillion chance of a false positive.

Apple has over 1 billion devices. Assuming ten scans per device per day, you would reach one trillion scans in ~100 days. Okay, not all the devices will be on the latest iOS, not all are active, etc. But this is all under the assumption those numbers are accurate; I imagine reality will be much worse. And I don't think the police will be very understanding. Maybe you will get off, but you'll be in huge debt from your legal defense. Or maybe you'll be in jail, because the police threw the book at you.
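For what it's worth, the arithmetic here checks out. A quick back-of-the-envelope, where every input is this comment's assumption rather than a published Apple figure:

    # All inputs are assumptions from the comment above, not Apple's figures.
    devices = 1_000_000_000   # ~1 billion active devices
    scans_per_day = 10        # assumed photo scans per device per day
    fp_rate = 1e-12           # one reading of "1-in-1 trillion per scan"

    daily_scans = devices * scans_per_day   # 1e10 scans per day
    print(1e12 / daily_scans)               # 100.0 days to reach a trillion scans
    print(daily_scans * 365 * fp_rate)      # ~3.65 expected false positives per year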
stickfigure almost 4 years ago
I've also implemented perceptual hashing algorithms for use in the real world. The article is correct: there really is no way to eliminate false positives while still catching minor changes (say, resizing, cropping, or watermarking).

I'm sure I'm not the only person with naked pictures of my wife. Do you really want a false positive to result in your intimate moments getting shared around some outsourced boiler room for laughs?
karmakaze almost 4 years ago
It really all comes down to whether Apple is willing to maintain the effort of human evaluation before taking action on potential false positives:

> According to Apple, a low number of positives (false or not) will not trigger an account to be flagged. But again, at these numbers, I believe you will still get too many situations where an account has multiple photos triggered as a false positive. (Apple says that probability is “1 in 1 trillion” but it is unclear how they arrived at such an estimate.) These cases will be manually reviewed.

At scale, even human classification of cases that ought to be clear will fail: a reviewer will accidentally click 'not ok' on something they judged 'ok'. It will be interesting to see what happens then.
rustybolt almost 4 years ago
> an Apple employee will then look at your (flagged) pictures.

This means that there will be people paid to look at child pornography, and probably a lot of private nude pictures as well.
siscia almost 4 years ago
What I am missing from this whole story is what triggered Apple to put in place, or even think about, such a system.

It is clearly a non-trivial project, no other company is doing it, and it would be one of the rare cases of a company doing something not for shareholder value but for "goodwill".

I am really not understanding the reasoning behind this choice.
BiteCode_dev almost 4 years ago
The problem is not perceptual hashes. The problem is the back door. Let's not focus on the defects of the train leading you to the concentration camp. The problem is that there is a camp at the end of the railroad.
klodolph almost 4 years ago
> Even at a Hamming Distance threshold of 0, that is, when both hashes are identical, I don’t see how Apple can avoid tons of collisions...

You'd want to look at the particular perceptual hash implementation. There is no reason to expect, without knowing the hash function, that you would end up with tons of collisions at distance 0.
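For readers unfamiliar with the term: the Hamming distance between two hashes is just the number of differing bits, so "distance 0" means bit-for-bit identical. A tiny illustration (the hash values are made up):

    # Hamming distance = number of differing bits between two 64-bit hashes.
    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    h1 = 0xD1C0F2E58A4B3C7D  # made-up example hashes
    h2 = 0xD1C0F2E58A4B3C7F
    print(hamming(h1, h2))   # 1 -> a near-duplicate under any nonzero threshold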
drzoltar almost 4 years ago
The other issue with these hashes is non-robustness to adversarial attacks. Simply rotating the image by a few degrees, or slightly translating/shearing it, will move the hash well outside the threshold. The only way to combat this would be to use a face bounding-box algorithm to somehow manually realign the image.
Waterluvian almost 4 years ago
I’m rather fascinated by the false matches. Those two images are very different and yet beautifully similar.

I want to see a lot more pairs like this!
starkd almost 4 years ago
The method Apple is using looks more like a cryptographic hash. That's entirely different from (and more secure than) a perceptual hash.

From https://www.apple.com/child-safety/:

"Before an image is stored in iCloud Photos, an on-device matching process is performed for that image against the known CSAM hashes. This matching process is powered by a cryptographic technology called private set intersection, which determines if there is a match without revealing the result. The device creates a cryptographic safety voucher that encodes the match result along with additional encrypted data about the image. This voucher is uploaded to iCloud Photos along with the image."

Elsewhere, it does explain the use of NeuralHashes, which I take to be the perceptual-hash part of it.

I did some work on a similar attempt a while back. I also have a way to store hashes and find similar images. Here's my blog post; I'm currently working on a full site.

http://starkdg.github.io/posts/concise-image-descriptor
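The distinction matters in practice: a cryptographic hash changes completely when a single byte changes, while a perceptual hash is designed to barely move under small visual edits. A sketch using standard libraries (not Apple's NeuralHash or PSI code; the file name is hypothetical):

    # Cryptographic vs. perceptual hashing on a lightly edited image.
    import hashlib
    from PIL import Image
    import imagehash

    img = Image.open("photo.jpg")                      # hypothetical input
    tweaked = img.resize((img.width - 1, img.height))  # near-invisible edit

    print(hashlib.sha256(img.tobytes()).hexdigest()[:16])      # completely...
    print(hashlib.sha256(tweaked.tobytes()).hexdigest()[:16])  # ...different
    print(imagehash.phash(img) - imagehash.phash(tweaked))     # small distance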
jiggawatts almost 4 years ago
The world in the 1900s:

Librarians: "It is unthinkable that we would ever share a patron's borrowing history!"

Post office employees: "Letters are private; only those commie countries open the mail their citizens send!"

Police officers: "A search warrant from a judge or probable cause is required before we can search a premises or tap a single, specific phone line!"

The census: "Do you agree to share the full details of your record after 99 years have elapsed?"

The world in the 2000s:

FAANGs: "We know *everything* about you. Where you go. What you buy. What you read. What you say and to whom. *What specific type of taboo pornography you prefer.* We'll happily share it with used-car salesmen and the hucksters that sell WiFi radiation blockers and healing magnets. Also: Cambridge Analytica, the government, foreign governments, and anyone who asks and can pony up the cash, really. Shh now, I have a quarterly earnings report to finish."

Device manufacturers: "We'll rifle through your photos on a weekly basis, just to see if you've got some banned propaganda. Did I say propaganda? I meant child porn; that's harder to argue with. The algorithm is the same, though, and just as the Australian government put uncomfortable information leaks onto the banned-CP list, so will your government. No, you can't check the list! You'll just have to trust us."

Search engines: "Tiananmen Square is located in Beijing, China. Here's a cute tourist photo. No further information available."

Online maps: "Tibet (China). Soon: Taiwan (China)."

Media distributors: "We'll go into your home, rifle through your albums, and take the ones we've stopped selling. Oh, not *physically*, of course. No-no-no-no, nothing so barbaric! We'll simply remotely instruct your device to delete anything we no longer want you to watch or listen to. Even if you bought it from somewhere else and uploaded it yourself. It *matches a hash*, you see? It's got to go!"

Governments: "Scan a barcode so that we can keep a record of your every movement, for public-health reasons. Sure, Google and Apple developed a secure, privacy-preserving method to track exposures. We prefer to use our method instead. Did we forget to mention the data-retention period? Don't worry about that. Just assume... indefinite."
asimpletune almost 4 years ago
“Even at a Hamming Distance threshold of 0, that is, when both hashes are identical, I don’t see how Apple can avoid tons of collisions, given the large number of pictures taken every year (1.4 trillion in 2021, now break this down by iPhone market share and country, the number for US iPhone users will still be extremely big).”

Is this true? I’d imagine you could generate billions a second without having a collision, although I don’t know much about how these hashes are produced.

It would be cool for an expert to weigh in here.
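Not an expert either, but the birthday bound gives a rough answer if you assume the hash behaves like a uniform random function; real perceptual hashes deliberately cluster similar images, so treat this as a floor. The bit widths below are assumptions (common perceptual hashes are 64 bits; the thread doesn't establish NeuralHash's width):

    # Expected number of exact (distance-0) collisions among n random hashes
    # is roughly n^2 / 2^(bits+1), by the birthday bound.
    n = 1.4e12  # photos taken in 2021, per the quoted passage

    for bits in (64, 96, 128):
        print(bits, n * n / (2 * 2.0 ** bits))
    # 64 -> ~5.3e4 colliding pairs; 96 -> ~1.2e-5; 128 -> ~2.9e-15

So "billions per second without a collision" is plausible for a 128-bit hash, while a 64-bit hash over a trillion images should already contain tens of thousands of random exact collisions, before perceptual clustering makes it worse.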
Wowfunhappy almost 4 years ago
> At my company, we use “perceptual hashes” to find copies of an image where each copy has been slightly altered.

Kind of off-topic: does anyone happen to know of some good software for doing this on a local collection of images? A common sequence of events at my company:

1. We're designing a website for some client. They send us a collection of a zillion photos to pull from. For the page about elephants, we select the perfect elephant photo, which we crop, *lightly* recolor, compress, and upload.

2. Ten years later, this client sends us a screenshot of the elephant page and asks if we still have a copy of the original photo.

Obviously, absolutely no one at this point remembers the name of the original photo, and we need to either spend hours searching for it or (depending on our current relationship) nicely explain that we can't help. It would be really great if we could do something like a reverse Google Image Search, but for a local collection. I know it's possible to license e.g. TinEye, but that's not practical for us as a tiny company. What I really want is an open-source solution I can set up myself.

We used digiKam for a while, and there were a couple of times it was useful. However, for whatever reason it seemed to be extremely crash-prone, and it frequently couldn't find things it really should have been able to find.
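In the meantime, if a self-hosted sketch is acceptable, the open-source imagehash library gets you surprisingly far. A minimal illustration (folder and file names are illustrative, the 12-bit threshold is a guess, and a real setup would persist the index instead of rebuilding it):

    # Minimal local reverse-image search over a folder of photos.
    # Assumes `pip install pillow imagehash`.
    from pathlib import Path
    from PIL import Image
    import imagehash

    def build_index(folder):
        index = {}
        for path in Path(folder).rglob("*"):
            if path.suffix.lower() in {".jpg", ".jpeg", ".png", ".tiff"}:
                try:
                    index[path] = imagehash.phash(Image.open(path))
                except OSError:
                    pass  # skip unreadable files
        return index

    def search(index, query_path, max_distance=12):
        q = imagehash.phash(Image.open(query_path))
        hits = [(q - h, str(p)) for p, h in index.items() if q - h <= max_distance]
        return sorted(hits)  # fewest differing bits first

    index = build_index("client_photos/")
    print(search(index, "elephant_screenshot.png"))  # survives crop/recolor/compress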
brian_herman almost 4 years ago
Fortunately I have a Cisco router and enough knowledge to block the 17.0.0.0/8 IP address range. Combined with an OpenVPN VPN, this will block all Apple services from my devices. So basically my network will look like this:

Internet <---> Cisco <---> ASUS router with OpenVPN <---> network

The Cisco router will block the 17.0.0.0/8 IP address range, and I will use Spotify on all my computers.
lancemurdock almost 4 years ago
I am going to give LineageOS on an Android device a shot. This is one of the most egregious things Apple has ever done.
read_if_gay_ almost 4 years ago
Big tech has been disintegrating the foundational principles on which our society is built in the name of our society. Every one of their moves is a deeper attack on personal freedom than the last. They need to be dealt with. Stop using their services, buying their products, defending them when they silence people.
jbmsf almost 4 years ago
I am fairly ignorant of this space. Do any of the standard methods use multiple hash functions rather than just one?
alkonaut almost 4 years ago
The key here is scale. If the only trigger for action is having, say, *a few hundred* matching images, or a dozen from the same known set of offending pictures, then I can see how Apple's “one in a trillion” claim could work.

Also, Apple could ignore images from the device camera, since those will never match.

This is in stark contrast to the task faced by photo-copyright hunters. They don't have the luxury of focusing only on those who handle tens of thousands of copyrighted photos; they need to find individual violations, because that's what they are paid to do.
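To make the scale argument concrete, here is the binomial tail for an account-level threshold, with a per-image false-positive rate deliberately assumed to be far worse than the claimed figure:

    # P(account flagged) = P(at least `threshold` false matches among n photos).
    # p and n are pessimistic assumptions, not published numbers.
    from math import comb

    p = 1e-6        # assumed per-image false-positive probability
    n = 20_000      # photos in one library
    threshold = 30  # matches required before anything happens

    # Terms beyond k ~ 60 are vanishingly small, so truncate the sum there.
    tail = sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(threshold, 60))
    print(tail)     # on the order of 1e-83: astronomically unlikely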
altitudinous almost 4 years ago
This article focuses too much on the individual case, and not enough on the fact that Apple will need multiple matches to report someone. Images are normally distributed in sets, I suspect, so it is going to be easy to detect when someone is holding an offending set, because there will be multiple matches. I don't think Apple is going to be concerned with a single hit. Here in the news, offenders are reported as holding many thousands of images.
JacobiX almost 4 years ago
Given that Apple's technology uses a neural network with a triplet embedding loss, the exact same techniques used by neural networks for face recognition, maybe the same shortcomings apply here. For example, a team of researchers found 'master faces' that can bypass over 40% of facial-ID systems. Now suppose that you have such an image in your photo library; it would generate so many false positives…
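For reference, the triplet objective mentioned here is simple to state: pull an anchor embedding toward a "positive" (a perceptually identical copy) and push it away from a "negative" (a different image) by at least a margin. A toy version of the generic formulation, not Apple's training code:

    # Toy triplet loss over embedding vectors.
    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        d_pos = np.linalg.norm(anchor - positive)  # distance to same-image copy
        d_neg = np.linalg.norm(anchor - negative)  # distance to different image
        return max(0.0, d_pos - d_neg + margin)

    a = np.array([0.10, 0.90])
    p = np.array([0.12, 0.88])    # near-duplicate -> small d_pos
    n = np.array([0.90, 0.10])    # unrelated image -> large d_neg
    print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied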
SavantIdiot almost 4 years ago
This article covers three methods, all of which just look for alterations of a source image to find a fast match (in fact, that's the paper referenced). It is still a "squint to see if it is similar" test. I was under the impression there were more sophisticated methods that looked for *types* of images, not just altered known images. Am I misunderstanding?
chucklenorris almost 4 years ago
So, if there's code on the device that computes these hashes, then it can be extracted. Afterwards it should be possible to add changes to an innocent picture to make it produce a target hash. Getting a hash should be possible too: just find a known pedo image and run the extracted algorithm on it. It's only a matter of time until someone does this.
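If the extracted network is differentiable, this is standard gradient descent on the input. A hypothetical sketch in PyTorch, where `model`, the target embedding, and every constant are stand-ins; note that the final quantization from embedding to hash bits isn't differentiable, so the attack targets the embedding those bits are derived from:

    # Second-preimage sketch: nudge an innocent image toward a target embedding.
    import torch

    def craft_collision(model, innocent, target_emb,
                        steps=500, lr=0.01, eps=8 / 255):
        x = innocent.clone().requires_grad_(True)
        for _ in range(steps):
            loss = torch.norm(model(x) - target_emb)  # distance to target
            loss.backward()
            with torch.no_grad():
                x -= lr * x.grad.sign()  # signed-gradient step
                # Keep the perturbation nearly invisible and pixels valid.
                x.copy_(torch.clamp(x, innocent - eps, innocent + eps))
                x.clamp_(0.0, 1.0)
            x.grad.zero_()
        return x.detach()  # looks like `innocent`, embeds like the target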
cratermoon almost 4 years ago
If I'm reading this right, Apple is saying they are going to flag CSAM they find on their servers. This article talks about finding a match for a photo by comparing a hash of the photo you're testing with a hash you already have, from a photo you have.

Does this mean Apple had/has CSAM available to generate the hashes?
ngneer almost 4 years ago
What is the ratio of consumers of child pornography to the population of iPhone users? In orders of magnitude, is it 1%, 0.1%, 0.001%, 0.0001%? With all the press around the announcement, this is not exactly stealth technology. Wouldn't such consumers switch platforms, rendering the system pointless?
ris almost 4 years ago
I agree with the article in general, except for part of the final conclusion:

> The simple fact that image data is reduced to a small number of bits leads to collisions and therefore false positives

Our experience with regular hashes suggests this is not the underlying problem. SHA-256 hashes have 256 bits and still there are *no known* collisions, even with people deliberately trying to find them. SHA-1 has only 160 bits to play with, and it's still hard enough to find collisions. MD5 collisions are easier to find, but at 128 bits people still don't come across them by chance.

I think the actual issue is that perceptual hashes tend to be used with this "nearest neighbour" comparison scheme, which is clearly needed to compensate for the inexactness of the whole problem.
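One way to quantify that: accepting anything within Hamming distance d multiplies the number of hash values that count as a match by the volume of the Hamming ball. A quick computation for an assumed 64-bit hash with illustrative thresholds:

    # Number of 64-bit values within Hamming distance d of a given hash.
    from math import comb

    bits = 64
    for d in (0, 4, 8, 12):
        ball = sum(comb(bits, i) for i in range(d + 1))
        print(d, ball)
    # 0 -> 1; 4 -> ~6.8e5; 8 -> ~5.1e9; 12 -> ~4.2e12 values now "collide"

At a threshold of d <= 12, the effective space shrinks from 2^64 (about 1.8e19) to roughly 4e6 distinguishable buckets, which is why nearest-neighbour matching behaves nothing like an exact 64-bit hash.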
btheshoe almost 4 years ago
I'm not insane in thinking this stuff has to be super vulnerable to adversarial attacks, right? And it's not like adversarial attacks are a solved problem or anything.
chucklenorris almost 4 years ago
This technology is a godsend for a government wanting to catch whistleblowers before they're able to leak information. You wouldn't even hear about those poor souls.
lliamander almost 4 years ago
What about genuine duplicate photos? Say there is a stock picture of a landscape, and someone else goes and takes their own picture of the same landscape?
kazinator almost 4 years ago
Perceptual hashing was invented by the Chinese: four-corner code character lookup, which lumps together characters with similar features.
legulere almost 4 years ago
Which photos does Apple scan? Does it also scan photos in emails and messages? Could you swat somebody by sending them benign images that have the same hash?
madmax96 almost 4 years ago
Why not make it so that I can see flagged images in my library? It would give me a lot more confidence that my photos stay private.
acidioxide almost 4 years ago
It's really disturbing that, in case of doubt, a real person would check the photos. That's a red flag.
bastawhiz almost 4 years ago
Correct me if I'm wrong, but nowhere in Apple's announcement do they mention "perceptual" hashing. I've searched through some of the PDFs they link as well, but those also don't seem to mention the word "perceptual". Can someone point out exactly where this is mentioned?
ChrisMarshallNY almost 4 years ago
That’s a really useful explanation.

Thanks!
marcinzm almost 4 years ago
> an Apple employee will then look at your (flagged) pictures.

Always fun when unknown strangers get to look at your potentially sensitive photos, with probably no notice given to you.
lordnacho almost 4 years ago
Why wouldn't the algorithm check that one image has a face while the other doesn't? That would remove this particular false positive, though I'm not sure what new ones it might cause.
ivalm almost 4 years ago
I am not exactly buying the premise here: if you train a CNN on useful semantic categories, then the representations it generates will be semantically meaningful (so the error shown in the blog wouldn’t occur).

I dislike the general idea of iCloud having back doors, but I don’t think the criticism in this blog is entirely valid.

Edit: it was pointed out that Apple doesn’t use a semantically meaningful classifier, so the blog post’s criticism is valid.
IfOnlyYouKnew almost 4 years ago
Apple’s documents say they require multiple hits before anything happens, as the article notes. They can (and have) adjusted that number to reach any desired balance of false positives to negatives.

How can they say it’s 1 in a trillion? You test the algorithm on a bunch of random negatives, see how many positives you get, and do one division and one multiplication. This isn’t rocket science.

So, while there are many arguments against this program, this isn’t one of them. It’s also somewhat strange to believe that the idea of collisions, in hashes far smaller than the images they are run on, somehow escaped Apple and/or anyone mildly competent.
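The estimation procedure really is that simple in outline. A sketch (the hash lists and the threshold are hypothetical inputs):

    # Empirical false-positive estimate: compare known-innocent hashes against
    # the blocklist, count matches, divide by the number of comparisons.
    def estimated_fp_rate(innocent_hashes, blocklist, threshold=10):
        comparisons = len(innocent_hashes) * len(blocklist)
        hits = sum(1 for h in innocent_hashes for b in blocklist
                   if bin(h ^ b).count("1") <= threshold)
        return hits / comparisons  # multiply by projected scan volume to scale up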
ttul almost 4 years ago
Apple would not be so naive as to roll out a solution to child abuse images that has a high false positive rate. They do test things prior to release…