TechEcho

5 comments

danso over 9 years ago

Very smart inquiry... I had initially been somewhat of a skeptic that the potential danger to privacy outweighed the value of making the data as transparent as possible, but that was just a guess based on my preconceived notions of taxi use, which were already inadequate; it still blows my mind how many taxi trips there are on an average day.

I think an argument can still be made that, even if the OP is right about how many trips can be uniquely identified, keeping the coordinate data is worth more than the real-life privacy risk. The small number of people who would hire a private investigator or specialist to analyze this data to catch a specific person would find it much faster to track that person the way PIs normally do. But the rebuttal can't simply be "uniquely identifiable trips are probably so rare as to be inconsequential."

rahimnathwani over 9 years ago

"it turns out that if you know the census tracts for pickups and drop offs, plus pickup times truncated to the nearest hour, then you can uniquely identify 40% of NYC taxi trips"

Hmmm... but if you already have those pieces of information (start tract, end tract, start hour), what would you want to get from the data? How much someone paid? How much they tipped? Whether they paid with cash or card?

Can anyone see an obvious nefarious use for this data?
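
A minimal sketch of how the quoted uniqueness figure could be measured, assuming each trip record has already been mapped to census tracts. The field names (`pickup_tract`, `dropoff_tract`, `pickup_time`, `tip`) and the toy records are hypothetical, not the actual taxi data schema:

```python
from collections import Counter
from datetime import datetime


def quasi_identifier(trip: dict) -> tuple:
    """Key a trip by pickup tract, drop-off tract, and pickup time truncated to the hour."""
    hour = trip["pickup_time"].replace(minute=0, second=0, microsecond=0)
    return (trip["pickup_tract"], trip["dropoff_tract"], hour)


def unique_fraction(trips: list) -> float:
    """Fraction of trips whose quasi-identifier matches no other trip."""
    keys = [quasi_identifier(t) for t in trips]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical toy records; tract labels and field names are made up.
trips = [
    {"pickup_tract": "A", "dropoff_tract": "B", "pickup_time": datetime(2013, 5, 1, 8, 5), "tip": 2.0},
    {"pickup_tract": "A", "dropoff_tract": "B", "pickup_time": datetime(2013, 5, 1, 8, 50), "tip": 0.0},
    {"pickup_tract": "A", "dropoff_tract": "C", "pickup_time": datetime(2013, 5, 1, 8, 20), "tip": 1.5},
    {"pickup_tract": "D", "dropoff_tract": "B", "pickup_time": datetime(2013, 5, 1, 17, 0), "tip": 3.0},
]
print(unique_fraction(trips))  # 0.5
```

The first two trips share a key and hide each other; the last two are unique, so anyone who already knows a rider's tracts and hour could read off that trip's remaining columns, such as the fare and tip.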

dopamean over 9 years ago

I find this kind of analysis really awesome, and I'd love to learn how to do even a more basic version of it. Does anyone have some resources they can point me to?

I'm actually working on a small project with a much, much smaller dataset than the NYC taxi data, but with some similar attributes (mainly geographic coordinates). I'd love to produce something like this with what I find (assuming I find anything interesting).

srean over 9 years ago

I wonder how much one loses if the first and the last couple of miles are fuzzed over. Such data would still be quite useful.
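
One possible reading of this suggestion, sketched under assumptions: since the trip records only carry endpoints, "fuzzing the first and last couple of miles" could mean moving each pickup and drop-off to a random point within about two miles. The 2-mile radius, the flat-earth approximation, the example coordinate, and the helper names `fuzz_point` and `approx_miles` are all hypothetical:

```python
import math
import random

# Approximate miles per degree of latitude; an assumption that is
# adequate over the few-mile scales involved here.
MILES_PER_DEG_LAT = 69.0


def fuzz_point(lat: float, lon: float, radius_miles: float, rng: random.Random) -> tuple:
    """Return a point drawn uniformly from a disk of radius_miles around (lat, lon)."""
    # sqrt() spreads points evenly over the disk's area instead of
    # clustering them near the center.
    r = radius_miles * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = r * math.sin(theta) / MILES_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude).
    dlon = r * math.cos(theta) / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def approx_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Flat-earth distance, scaling longitude at the first point's latitude."""
    dy = (lat2 - lat1) * MILES_PER_DEG_LAT
    dx = (lon2 - lon1) * MILES_PER_DEG_LAT * math.cos(math.radians(lat1))
    return math.hypot(dx, dy)


rng = random.Random(0)
pickup = (40.7580, -73.9855)  # illustrative Manhattan coordinate
fuzzed = fuzz_point(*pickup, radius_miles=2.0, rng=rng)
print(approx_miles(*pickup, *fuzzed) <= 2.0)  # True
```

Coarse uses such as neighborhood-to-neighborhood flows would survive this, while the straight-line distance between a trip's fuzzed endpoints can shift by up to twice the radius, which is the kind of loss the comment is asking about.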

dbpokorny over 9 years ago

> uniquely identified by birthday, gender, and ZIP code

This is not correct; you need to say "full birthday", which includes the year, otherwise the statement is nonsense.
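
A back-of-envelope sketch of why the year matters, assuming birthdays and genders are spread uniformly and independently over a hypothetical ZIP code of 20,000 people whose birth years span about 80 years (all three numbers are illustrative, not from the post):

```python
def expected_unique_fraction(population: int, buckets: int) -> float:
    """Chance that a given person shares their bucket with nobody else,
    assuming everyone lands in one of `buckets` equally likely buckets."""
    if population < 1 or buckets < 1:
        raise ValueError("population and buckets must be positive")
    # Each of the other population - 1 people must miss this person's bucket.
    return (1.0 - 1.0 / buckets) ** (population - 1)


ZIP_POPULATION = 20_000  # hypothetical ZIP code size
DAYS = 366               # possible month/day birthdays
GENDERS = 2
BIRTH_YEARS = 80         # assumed spread of birth years

without_year = expected_unique_fraction(ZIP_POPULATION, DAYS * GENDERS)
with_year = expected_unique_fraction(ZIP_POPULATION, DAYS * GENDERS * BIRTH_YEARS)
print(f"without year: {without_year:.1e}")  # on the order of 1e-12
print(f"with year:    {with_year:.2f}")     # 0.71
```

Without the year there are only 732 birthday-gender combinations to spread 20,000 people over, so almost nobody is unique; adding the year multiplies the buckets by 80 and makes most people unique. Real birthdays and ZIP populations are not uniform, so this only shows the order of magnitude.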

40% of NYC Taxi Trips Are Uniquely Identified by Census Tracts and Hour