Ask HN: What are some clever ways data in public domain has been de-anonymized?

3 points by danielhughes about 10 years ago

3 comments

johnloeber about 10 years ago
Semantic analysis of the Federalist Papers[0] comes to mind. It was originally not known which individual author wrote which paper, but stylometric analysis (i.e. counting words and matching the word-frequency distributions of the unlabelled papers against those of labelled papers, whose authors were known) made it reasonably straightforward to identify the original authors.

[0] A set of historical papers of great political importance. http://en.wikipedia.org/wiki/The_Federalist_Papers
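
To make the word-frequency matching concrete, here is a minimal sketch in Python. It is not the method used in the actual Federalist Papers studies: the function-word list, the cosine-similarity comparison, the names `profile` and `attribute`, and the toy snippets are all assumptions chosen for illustration.

```python
from collections import Counter
import math
import re

# Hypothetical function-word list; real studies use larger, carefully chosen lists.
FUNCTION_WORDS = ["the", "of", "to", "and", "in", "upon", "by", "on", "while", "whilst"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def attribute(unknown_text, labelled):
    """Return the labelled author whose profile is most similar to the unknown text."""
    target = profile(unknown_text)
    return max(labelled, key=lambda author: cosine(target, profile(labelled[author])))

# Toy, made-up snippets (not actual Federalist text), for illustration only.
labelled = {
    "author_a": "upon the whole, the power of the union rests upon the consent of the people",
    "author_b": "whilst the states retain power, and whilst the people consent, the union stands by law",
}
unknown = "upon reflection, the safety of the people depends upon the vigor of the union"

print(attribute(unknown, labelled))
```

Function words are a common choice for this kind of comparison because they depend less on topic than content words do, but with texts this short the result is only a demonstration of the mechanics.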
alex_sf about 10 years ago
Taxi data from NYC was deanonymized:

https://medium.com/@vijayp/of-taxis-and-rainbows-f6bc289679a1

And then used to identify Muslim drivers:

http://www.reddit.com/r/dataisbeautiful/comments/2t201h/identifying_muslim_cabbies_from_trip_data_and/

And then used to track celebrities:

http://theiii.org/index.php/316/which-celebrity-is-taking-a-taxi-where/
NeutronBoy about 10 years ago
I can't recall the article, but there was a case where public data was de-anonymized based on DOB and zipcodes, and it was incredibly successful in a given state.
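
A rough sketch of how matching on DOB and zipcode can work, assuming an "anonymized" dataset and a separate public roster that both carry those two fields. The records, the field names, and the `link` helper are made up for illustration and are not taken from the case mentioned above.

```python
from collections import defaultdict

# Made-up records for illustration; field names are assumptions.
anonymized = [
    {"dob": "1960-01-15", "zip": "12345", "diagnosis": "asthma"},
    {"dob": "1975-07-04", "zip": "12346", "diagnosis": "flu"},
]
public_roster = [
    {"name": "Alice Example", "dob": "1960-01-15", "zip": "12345"},
    {"name": "Bob Example", "dob": "1975-07-04", "zip": "12346"},
    {"name": "Carol Example", "dob": "1975-07-04", "zip": "12346"},
]

def link(anonymized, roster):
    """Map each anonymized record index to a name when (dob, zip) matches exactly one person."""
    by_key = defaultdict(list)
    for person in roster:
        by_key[(person["dob"], person["zip"])].append(person["name"])

    matches = {}
    for i, rec in enumerate(anonymized):
        names = by_key.get((rec["dob"], rec["zip"]), [])
        if len(names) == 1:  # only a unique match re-identifies the record
            matches[i] = names[0]
    return matches

reidentified = link(anonymized, public_roster)
print(reidentified)
```

The first record is re-identified because only one person in the roster shares its DOB and zipcode; the second is not, because two people share that combination.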