40% of NYC Taxi Trips Are Uniquely Identified by Census Tracts and Hour

13 points by lil_tee, over 9 years ago

5 comments

danso, over 9 years ago
Very smart inquiry... I had been somewhat of a skeptic initially that the potential danger to privacy outweighed the value of making the data as transparent as possible, but that's just a guess out of my preconceived notions of taxi use, which were already inadequate, as it still blows my mind how many taxi trips there are on an average day.

I think an argument can still be made that even if the OP is right about the quantity that can be uniquely identified, keeping the coordinate data still outweighs the real-life privacy risk; that is, the small number of people who want to hire a private investigator/specialist to analyze this data to catch a specific person would find it much faster to track the person the way that PIs normally do. But the rebuttal can't simply be, "uniquely identifiable trips are probably so rare as to be inconsequential."
rahimnathwani, over 9 years ago
"it turns out that if you know the census tracts for pickups and drop offs, plus pickup times truncated to the nearest hour, then you can uniquely identify 40% of NYC taxi trips"

Hmmm... but if you already have those pieces of information (start tract, end tract, start hour) what would you want to get from the data? How much someone paid? How much they tipped? Whether they paid with cash or card?

Can anyone see an obvious nefarious use for this data?
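
For readers wondering what the quoted measurement amounts to, here is a minimal sketch of the counting step, not the original author's code. It assumes each trip has already been mapped to census tracts; the field names `pickup_tract`, `dropoff_tract`, and `pickup_time` and the sample trips are hypothetical.

```python
from collections import Counter
from datetime import datetime


def unique_fraction(trips):
    """Share of trips whose (pickup tract, dropoff tract, pickup hour) key occurs exactly once."""
    # Truncate each pickup time down to the start of its hour.
    keys = [
        (
            t["pickup_tract"],
            t["dropoff_tract"],
            t["pickup_time"].replace(minute=0, second=0, microsecond=0),
        )
        for t in trips
    ]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical sample data: tract labels "A", "B", "C" are placeholders.
trips = [
    {"pickup_tract": "A", "dropoff_tract": "B", "pickup_time": datetime(2013, 1, 1, 8, 5)},
    {"pickup_tract": "A", "dropoff_tract": "B", "pickup_time": datetime(2013, 1, 1, 8, 40)},
    {"pickup_tract": "A", "dropoff_tract": "C", "pickup_time": datetime(2013, 1, 1, 8, 15)},
    {"pickup_tract": "A", "dropoff_tract": "B", "pickup_time": datetime(2013, 1, 1, 9, 2)},
]
print(unique_fraction(trips))  # 0.5
```

In this toy sample the first two trips share a key, so only the last two are unique. A trip with a unique key can be joined back to the full record, which is how the fare, tip, and payment fields would be exposed.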
dopamean, over 9 years ago
I find this kind of analysis to be really awesome and I'd love to learn how to do even a more basic version of it. Does anyone have some resources they can point me to?

I'm actually working on a small project that has a much, much smaller dataset than the NYC Taxi data but some similar attributes (geographic coordinates mainly). I'd love to produce something like this with what I find (assuming I can find anything interesting).
srean, over 9 years ago
I wonder how much one loses if the first and the last couple of miles are fuzzed over. Such data would still be quite useful.
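
One possible reading of "fuzzed" here is to coarsen the pickup and dropoff coordinates before release. The sketch below snaps a point to a grid cell and is only an illustration under that assumption. The function name `snap_to_cell` and the cell size of 0.03 degrees (very roughly a couple of miles at New York's latitude) are hypothetical choices, not something proposed in the thread.

```python
import math


def snap_to_cell(lat, lon, cell_deg=0.03):
    """Return integer grid-cell indices for a coordinate, hiding its position within the cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Two nearby points in Midtown fall in the same cell; a point near JFK does not.
a = snap_to_cell(40.7580, -73.9855)
b = snap_to_cell(40.7590, -73.9840)
c = snap_to_cell(40.6413, -73.7781)
print(a == b, a == c)  # True False
```

Releasing only the cell indices for trip endpoints keeps coarse origin and destination patterns while blurring exact addresses. How much analytical value is lost would have to be measured on the real data.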
dbpokorny, over 9 years ago
> uniquely identified by birthday, gender, and ZIP code

This is not correct; you need to say "full birthday", which includes the year; otherwise the statement is nonsense.