Ask HN: What are some clever ways data in public domain has been de-anonymized?

3 points by danielhughes, about 10 years ago

3 comments

johnloeber, about 10 years ago
Stylometric analysis of the Federalist Papers[0] comes to mind. It was originally not known which author wrote which paper, but word counting (i.e. matching the word-frequency distributions of the unlabelled papers against those of labelled papers whose author was known) made it reasonably straightforward to identify the original authors.

[0] A set of historical papers of great political importance. http://en.wikipedia.org/wiki/The_Federalist_Papers
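
The word-frequency matching described above can be sketched roughly as follows. This is a minimal illustration, not the method used in any published study: the function-word list, the cosine similarity measure, the labels `author_a` and `author_b`, and the toy texts are all assumptions made up for the example.

```python
from collections import Counter
import math
import re

# Hypothetical function-word list; real studies use larger, carefully chosen ones.
FUNCTION_WORDS = ["the", "of", "to", "and", "upon", "while", "whilst", "by"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def attribute(unknown_text, labelled):
    """Return the label whose profile is most similar to the unknown text."""
    target = profile(unknown_text)
    return max(labelled, key=lambda name: cosine(target, profile(labelled[name])))

# Toy texts invented for illustration, not quotations from the papers.
labelled = {
    "author_a": "upon the whole the plan depends upon the consent of the states upon reflection",
    "author_b": "whilst the powers are few and defined whilst the states retain the rest by right",
}
unknown = "the measure rests upon the judgment of the people and upon the laws"
print(attribute(unknown, labelled))
```

With texts this short the result is only illustrative; the approach depends on having enough labelled text per author for the frequency profiles to be stable.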
alex_sf, about 10 years ago
Taxi data from NYC was deanonymized:

https://medium.com/@vijayp/of-taxis-and-rainbows-f6bc289679a1

And then used to identify Muslim drivers:

http://www.reddit.com/r/dataisbeautiful/comments/2t201h/identifying_muslim_cabbies_from_trip_data_and/

And then used to track celebrities:

http://theiii.org/index.php/316/which-celebrity-is-taking-a-taxi-where/
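
For context, the first link reportedly traces the taxi case to identifiers that were hashed with unsalted MD5 even though the space of possible values was small. A rough sketch of why that fails, assuming a hypothetical ID format of digit, letter, digit, digit (the format and the sample value `7B42` are illustrative assumptions, not taken from the dataset):

```python
import hashlib
import itertools
import string

def md5_hex(value):
    """Unsalted MD5 of an ASCII string, as a hex digest."""
    return hashlib.md5(value.encode("ascii")).hexdigest()

def build_lookup():
    """Map hash -> plaintext for every ID of the assumed form digit-letter-digit-digit."""
    table = {}
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        plain = f"{d1}{letter}{d2}{d3}"
        table[md5_hex(plain)] = plain
    return table

lookup = build_lookup()      # 10 * 26 * 10 * 10 = 26,000 candidates in total
hashed = md5_hex("7B42")     # stand-in for a hashed value found in a released file
print(len(lookup), lookup[hashed])
```

Because every possible input can be enumerated in well under a second, the hash works as a reversible encoding rather than as anonymization. A salt that is kept secret, or random identifiers unrelated to the originals, would avoid this particular attack.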
NeutronBoy, about 10 years ago
I can't recall the article, but there was a case where public data was de-anonymized based on DOB and zipcodes, and it was incredibly successful in a given state.
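
The general technique behind this kind of case is usually called a linkage attack: join the "anonymized" release against a second, named dataset on shared quasi-identifiers such as date of birth and ZIP code. A minimal sketch, with all records, names, and field names invented for illustration:

```python
from collections import defaultdict

# Toy records invented for illustration.
released = [  # "anonymized" release: names removed, quasi-identifiers kept
    {"dob": "1961-03-14", "zip": "02139", "diagnosis": "asthma"},
    {"dob": "1975-11-02", "zip": "02139", "diagnosis": "diabetes"},
    {"dob": "1975-11-02", "zip": "02139", "diagnosis": "fracture"},
]
public = [  # a second dataset that lists names alongside the same fields
    {"name": "A. Example", "dob": "1961-03-14", "zip": "02139"},
    {"name": "B. Sample", "dob": "1975-11-02", "zip": "02139"},
]

def link(released, public, keys=("dob", "zip")):
    """Return name -> diagnosis for people who match exactly one released row."""
    index = defaultdict(list)
    for row in released:
        index[tuple(row[k] for k in keys)].append(row)
    matches = {}
    for person in public:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches

print(link(released, public))
```

In this toy data only one person is re-identified, because the other shares a DOB and ZIP with a second released row. The attack succeeds to the extent that the quasi-identifier combination is unique in the population.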