TechEcho

johnloeber about 10 years ago

Authorship attribution of the Federalist Papers[0] comes to mind. It was originally not known which author wrote some of the papers, but stylometric analysis (i.e. counting words and matching the word-frequency distributions of the unattributed papers against those of papers whose author was known) made it reasonably straightforward to identify the original authors.

[0] A set of historical papers of great political importance. http://en.wikipedia.org/wiki/The_Federalist_Papers
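A minimal sketch of the word-counting approach described above, assuming a hypothetical list of function words and made-up toy texts. Each author's known papers are averaged into a word-frequency profile, and an unattributed text is assigned to the author whose profile is most similar by cosine similarity. This nearest-profile classifier is a simplified stand-in, not the method of any particular published study.

```python
import math
import re
from collections import Counter

# Illustrative list of function words. Stylometry favours such
# topic-independent words, but this particular list is an assumption.
FUNCTION_WORDS = ["upon", "whilst", "while", "by", "to", "of",
                  "the", "on", "there", "would"]


def word_frequencies(text):
    """Relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attribute(unknown, labelled):
    """Return the author whose averaged profile best matches `unknown`.

    `labelled` maps each known author to a list of texts they wrote.
    """
    target = word_frequencies(unknown)
    best_author, best_score = None, -1.0
    for author, texts in labelled.items():
        profiles = [word_frequencies(t) for t in texts]
        # Average the author's known papers into a single profile.
        mean = [sum(col) / len(profiles) for col in zip(*profiles)]
        score = cosine_similarity(target, mean)
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Hypothetical toy corpus; real analyses use whole papers.
labelled = {
    "A": ["upon the whole there is upon reflection",
          "upon this the matter would rest upon"],
    "B": ["whilst the people by the states",
          "whilst by the union of the states"],
}
print(attribute("upon the question there would be upon", labelled))  # A
print(attribute("whilst by the states of the union", labelled))      # B
```

Function words are the usual choice for this kind of comparison because how often a writer uses them tends to reflect personal habit more than the subject being discussed.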

alex_sf about 10 years ago

Taxi data from NYC was deanonymized:
https://medium.com/@vijayp/of-taxis-and-rainbows-f6bc289679a1

And then used to identify Muslim drivers:
http://www.reddit.com/r/dataisbeautiful/comments/2t201h/identifying_muslim_cabbies_from_trip_data_and/

And then used to track celebrities:
http://theiii.org/index.php/316/which-celebrity-is-taking-a-taxi-where/

NeutronBoy about 10 years ago

I can't recall the article, but there was a case where public data was de-anonymized by matching on dates of birth (DOB) and ZIP codes, and it worked remarkably well in one particular state.
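A minimal sketch of this kind of linkage attack, using entirely hypothetical records: an "anonymized" table that dropped names but kept DOB and ZIP code is joined against a public table that lists the same fields alongside names. Any (DOB, ZIP) pair shared by exactly one person in the public table re-identifies the corresponding record.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: names removed, DOB and ZIP kept.
anonymized = [
    {"dob": "1961-05-02", "zip": "12345", "diagnosis": "hypertension"},
    {"dob": "1982-03-14", "zip": "23456", "diagnosis": "asthma"},
    {"dob": "1990-01-01", "zip": "34567", "diagnosis": "flu"},
]

# Hypothetical public records pairing the same fields with names.
public = [
    {"name": "Alice", "dob": "1961-05-02", "zip": "12345"},
    {"name": "Bob", "dob": "1982-03-14", "zip": "23456"},
    {"name": "Carol", "dob": "1990-01-01", "zip": "34567"},
    {"name": "Dave", "dob": "1990-01-01", "zip": "34567"},
]


def reidentify(anonymized, public):
    """Link records on the (dob, zip) quasi-identifier.

    A match is reported only when exactly one public record shares the
    key; ambiguous keys (several people with the same DOB and ZIP) are
    skipped.
    """
    index = defaultdict(list)
    for person in public:
        index[(person["dob"], person["zip"])].append(person["name"])
    matches = []
    for record in anonymized:
        names = index.get((record["dob"], record["zip"]), [])
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


print(reidentify(anonymized, public))
# prints [('Alice', 'hypertension'), ('Bob', 'asthma')]
```

The attack works to the extent that DOB and ZIP together are rare: the third record stays ambiguous here only because two people in the public table share that combination.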

Ask HN: What are some clever ways data in public domain has been de-anonymized?

