科技回声

7 comments

A bit of context: Microsoft developed PhotoDNA to identify illegal images such as CSAM. NCMEC maintains a database of PhotoDNA signatures, and many companies use this service to identify and remove these images.

Microsoft claims:

> A PhotoDNA hash is not reversible, and therefore cannot be used to recreate an image.

This project shows that this isn't quite true: machine learning can do a pretty good job of reproducing thumbnail-quality images from a PhotoDNA signature.

There has been some discussion in the past on HN about PhotoDNA: https://news.ycombinator.com/item?id=28378254. It has been claimed that PhotoDNA is reversible, but there was no public demonstration as far as I know.

st_goliath, more than 3 years ago

On a side note, I find it kind of funny how, when using the model trained on Reddit, some of the outputs contain a quite readable "The image you are requesting does not exist or is no longer available" text, and a faint "imgur.com" watermark in the lower left corner.

For the former, I guess when training the original model, a bunch of the Reddit images weren't available at crawl time. Wouldn't it make sense to somehow weed those out from the data set before the training?

pornel, more than 3 years ago

I'd say that the project confirms that PhotoDNA is not reversible.

This project generates discolored, deformed thumbnails with maybe 12 pixels of resolution, and that's after the addition of synthesized/imaginary data. Without priming by looking at the ground-truth image, any attempt to guess what was in the images is just a Rorschach test.

causi, more than 3 years ago

I'm not a mathematician, but isn't there a direct correlation between reversibility and the unlikelihood of collisions? That is, if you have few to no collisions in the entire dataset of human-created images, it must be technically possible to reverse the hash into a reasonable thumbnail?
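
A rough counting sketch of the question above, under stated assumptions. The hash length and image size used here are hypothetical placeholders, not PhotoDNA's actual parameters. It only illustrates the pigeonhole bound: an n-bit hash can take at most 2^n values, so it can carry at most n bits of information about the image.

```python
import math

def max_thumbnail_side(hash_bits: int, bits_per_pixel: int = 8) -> int:
    """Side of the largest square thumbnail whose raw bits fit inside the hash."""
    return math.isqrt(hash_bits // bits_per_pixel)

def log2_avg_preimages(width: int, height: int, hash_bits: int,
                       bits_per_pixel: int = 8) -> int:
    """log2 of the average number of raw images that share one hash value."""
    image_bits = width * height * bits_per_pixel
    return max(0, image_bits - hash_bits)

HASH_BITS = 1024  # hypothetical hash length, chosen only for illustration

side = max_thumbnail_side(HASH_BITS)
excess = log2_avg_preimages(256, 256, HASH_BITS)
print(f"a {HASH_BITS}-bit hash holds at most a {side}x{side} 8-bit thumbnail")
print(f"a 256x256 image has about 2^{excess} raw preimages per hash value")
```

So a lack of collisions among real photos does not by itself imply a unique inverse. Each hash value still has an enormous number of raw pixel arrays mapping to it, and a reconstruction has to rely on a prior over natural images to pick a plausible one.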

somebodythere, more than 3 years ago

The requirement that changing the image a little bit changes the hash a little bit makes the image space smooth and more suitable for machine learning.
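
A minimal sketch of that smoothness property, assuming a toy "average hash" as a stand-in (this is not PhotoDNA's algorithm, which the thread does not describe). The image, its size, and the noise level are all hypothetical. It compares how many hash bits change under small pixel noise for the toy perceptual hash versus SHA-256.

```python
import hashlib
import random

SIZE = 64  # image side length in pixels (hypothetical)
GRID = 8   # the toy hash thresholds a GRID x GRID grid of block means (64 bits)

def average_hash(pixels):
    """Toy perceptual hash: one bit per block, set if the block mean exceeds the overall mean."""
    block = SIZE // GRID
    means = []
    for by in range(GRID):
        for bx in range(GRID):
            total = sum(pixels[y][x]
                        for y in range(by * block, (by + 1) * block)
                        for x in range(bx * block, (bx + 1) * block))
            means.append(total / (block * block))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]

def sha256_bits(pixels):
    """The 256 bits of the SHA-256 digest of the raw pixel bytes."""
    digest = hashlib.sha256(bytes(v for row in pixels for v in row)).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Synthetic gradient image and a copy with small noise (at most +/-2) on every pixel.
rng = random.Random(0)
original = [[3 * x + y for x in range(SIZE)] for y in range(SIZE)]
noisy = [[min(255, max(0, v + rng.randint(-2, 2))) for v in row] for row in original]

perceptual_distance = hamming(average_hash(original), average_hash(noisy))
crypto_distance = hamming(sha256_bits(original), sha256_bits(noisy))
print(f"toy perceptual hash: {perceptual_distance} of 64 bits differ")
print(f"SHA-256: {crypto_distance} of 256 bits differ")
```

In this sketch the perceptual hash moves by at most a couple of bits, while SHA-256 typically flips about half of its bits. The first behavior gives a learned inverse a smooth signal to fit. The second is why the same approach should not help with cryptographic or salted password hashes.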

jrm4, more than 3 years ago

Seems like false advertising to even call it a "hash" at this point? If meaningful data can be regained, it ain't a hash.

rbanffy, more than 3 years ago

I wonder if, with a couple million passwords and their salted hashes, we can reconstruct something similar to the original password and reduce the search space somewhat.

I know it should not be possible, but still, I’d love to play with that kind of dataset.

anishathalye, more than 3 years ago

Inverting PhotoDNA
