Inverting PhotoDNA

131 points by anishathalye over 3 years ago

7 comments

anishathalye over 3 years ago
A bit of context: Microsoft developed PhotoDNA to identify illegal images like CSAM -- NCMEC maintains a database of PhotoDNA signatures, and many companies use this service to identify and remove these images.

Microsoft claims:

> A PhotoDNA hash is not reversible, and therefore cannot be used to recreate an image.

This project shows that this isn't quite true: machine learning can do a pretty good job of reproducing thumbnail-quality images from a PhotoDNA signature.

There has been some discussion in the past on HN about PhotoDNA: https://news.ycombinator.com/item?id=28378254. It has been claimed that PhotoDNA is reversible, but there was no public demonstration as far as I know.

st_goliath over 3 years ago
On a side note, I find it kind of funny how, when using the model trained on Reddit, some of the outputs contain a quite readable "The image you are requesting does not exist or is no longer available" text, and a faint "imgur.com" watermark in the lower left corner.

For the former, I guess when training the original model, a bunch of the Reddit images weren't available at crawl time. Wouldn't it make sense to somehow weed those out from the data set before the training?

pornel over 3 years ago
I'd say that the project *confirms* that PhotoDNA is not reversible.

This project generates discolored, deformed thumbnails with maybe 12 pixels of resolution, and that's after the addition of synthesized/imaginary data into them. Without priming by looking at the ground truth image, any attempt to guess what was in the images is just a Rorschach test.

causi over 3 years ago
I'm not a mathematician, but isn't there a direct correlation between reversibility and the unlikelihood of collisions? That is, if you have few to no collisions in the entire dataset of human-created images, it must be technically possible to reverse the hash into a reasonable thumbnail?

somebodythere over 3 years ago
The requirement that changing the image a little bit changes the hash a little bit makes the image space smooth and more suitable for machine learning.
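
To make the smoothness point concrete, here is a minimal sketch using a toy signature (the mean brightness of each cell in a 4x4 grid). This is a hypothetical stand-in chosen for illustration, not the PhotoDNA algorithm; `toy_signature` and `brighten_pixel` are made-up names. It only shows the property described above: a small change to the image moves the signature a short distance, and a larger change moves it further.

```python
import math

def toy_signature(pixels, grid=4):
    """Mean brightness of each cell in a grid x grid partition of a square image.

    A hypothetical stand-in for a perceptual hash; NOT the PhotoDNA algorithm.
    """
    cell = len(pixels) // grid
    signature = []
    for gy in range(grid):
        for gx in range(grid):
            block = [pixels[y][x]
                     for y in range(gy * cell, (gy + 1) * cell)
                     for x in range(gx * cell, (gx + 1) * cell)]
            signature.append(sum(block) / len(block))
    return signature

def brighten_pixel(pixels, delta):
    """Return a copy of the image with the top-left pixel brightened by delta."""
    copy = [row[:] for row in pixels]
    copy[0][0] = min(255, copy[0][0] + delta)
    return copy

# Synthetic 16x16 grayscale image (a diagonal gradient), values in 0..180.
image = [[10 * x + 2 * y for x in range(16)] for y in range(16)]
base = toy_signature(image)

# Euclidean distance between signatures before and after each perturbation.
small = math.dist(base, toy_signature(brighten_pixel(image, 16)))
large = math.dist(base, toy_signature(brighten_pixel(image, 64)))
print("small change:", small, "large change:", large)
```

Because the distance between signatures tracks the distance between images, a model has a usable gradient to learn from, which is the property the comment is pointing at.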

jrm4 over 3 years ago
Seems like false advertising to even call it a "hash" at this point? If meaningful data can be regained, it ain't a hash.

rbanffy over 3 years ago
I wonder if, with a couple million passwords and their salted hashes, we can reconstruct something similar to the original password and reduce the search space somewhat.

I know it *should not* be possible, but, still, I’d love to play with that kind of dataset.
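
For anyone who wants to poke at this, a minimal sketch of why it should not work: a cryptographic hash is designed so that near-identical inputs produce unrelated outputs. The example below assumes PBKDF2-HMAC-SHA256 from Python's standard `hashlib` as the salted hash (the comment does not name a scheme), uses a fixed salt purely for reproducibility (real salts are random per password), and uses the made-up helper names `salted_hash` and `bit_difference`.

```python
import hashlib

def salted_hash(password: str, salt: bytes) -> bytes:
    """Salted password hash via PBKDF2-HMAC-SHA256 (assumed scheme, 1000 rounds)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1000)

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed salt for reproducibility only; real systems use a random salt per password.
salt = bytes(range(16))

h1 = salted_hash("password1", salt)
h2 = salted_hash("password2", salt)

# The two passwords differ in a single character; compare the resulting hashes.
print(bit_difference(h1, h2), "of", len(h1) * 8, "bits differ")
```

The count typically lands close to half of the 256 output bits, so two similar passwords look no closer than two unrelated ones, which leaves a model without a similarity signal to learn from.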