科技回声 (Tech Echo)
Analyzing 1.1B NYC Taxi and Uber Trips

140 points by lil_tee, over 9 years ago

9 comments

minimaxir, over 9 years ago
Yesterday, I made a post about how to reconstruct the NYC map visualization from the 1.1B-trip taxi data using ggplot2: http://minimaxir.com/2015/11/nyc-ggplot2-howto/

Looking at the code for the visualization, the author independently took a similar approach (with the same tools), and one that turned out slightly different, which is what makes things interesting.

It's worth noting that back in August, only the 2014 and 2015 datasets had been released by the NYC TLC. I'm not entirely sure why they decided to release 2009-2012 now.

If you're looking to just play with the data, I recommend the BigQuery approach noted in my article, since downloading and processing ~300GB might take a while. However, the shapefile approach used in the original article is the next logical step after that, and one that is put to very good use in the article.
apaprocki, over 9 years ago
Can spikes be seen in the late-night data when new businesses open near the pickup address? In the Williamsburg section, the N 11th St block is bright red in the visualization mostly because of the opening of a hotel/restaurant and two very popular electronic music clubs. Before those three businesses opened, I can't think of any reason why anyone would be in that one-block area late at night. Is there anything city agencies could do with this feedback loop of data after businesses open to assess their impact on an area? Liquor licenses? MTA?
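One way to check the question above is to count late-night pickups on the block for each month and compare the average before and after an opening date. A minimal sketch, assuming the pickups have already been filtered to the block (for example with a polygon test) and are available as `datetime` values; the 00:00-03:59 "late night" window, the opening date, and the sample timestamps are hypothetical choices for illustration.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, Tuple

Month = Tuple[int, int]  # (year, month)


def late_night_monthly_counts(
    pickups: Iterable[datetime], start_hour: int = 0, end_hour: int = 4
) -> Counter:
    """Count pickups per (year, month) whose hour falls in [start_hour, end_hour)."""
    counts: Counter = Counter()
    for ts in pickups:
        if start_hour <= ts.hour < end_hour:
            counts[(ts.year, ts.month)] += 1
    return counts


def _mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def before_after_means(
    counts: Counter, opening: date, months: List[Month]
) -> Tuple[float, float]:
    """Mean monthly late-night pickups before vs. from the opening month onward.

    `months` lists every month in the study window, so months with zero
    late-night pickups count as 0 (Counter returns 0 for missing keys).
    """
    cutoff = (opening.year, opening.month)
    before = [counts[m] for m in months if m < cutoff]
    after = [counts[m] for m in months if m >= cutoff]
    return _mean(before), _mean(after)


# Hypothetical pickups already filtered to one block.
pickups = [
    datetime(2014, 1, 5, 1, 30),
    datetime(2014, 1, 12, 14, 0),  # afternoon: outside the late-night window
    datetime(2014, 3, 2, 2, 15),
    datetime(2014, 3, 9, 0, 45),
    datetime(2014, 3, 16, 3, 10),
]
months = [(2014, 1), (2014, 2), (2014, 3)]
counts = late_night_monthly_counts(pickups)
print(before_after_means(counts, date(2014, 2, 1), months))  # (1.0, 1.5)
```

A raw before/after difference is not a causal estimate: seasonality and citywide ridership trends would also shift the numbers, so comparing against nearby blocks over the same months would make the signal more convincing.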
superuser2, over 9 years ago
I'm happy to see that Uber is not releasing dropoff data. It's not terribly difficult to de-anonymize someone in a dump like this.
kctess5, over 9 years ago
This is some seriously scary data. It would not be hard to de-anonymize this data if you know a person's address, and I'm sure there's all sorts of nefarious activity that famous people/politicians/other people might not want to be public knowledge.

On that note, I think that "with a Vengeance" is a bit disingenuous, considering that this data could be used to personally attack people (but wasn't)... That's almost certainly not a bad thing, though.
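The linkage risk raised in the two comments above needs little more than a distance filter: given the coordinates of a known address, keep the trips whose recorded point lies within a small radius of it. A minimal sketch, assuming trips are hypothetical `(trip_id, latitude, longitude)` tuples and using a 100 m radius and a spherical Earth (mean radius 6,371 km) as illustrative choices.

```python
import math
from typing import List, Tuple

Trip = Tuple[str, float, float]  # hypothetical (trip_id, latitude, longitude)

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trips_near(trips: List[Trip], lat: float, lon: float, radius_m: float = 100.0) -> List[Trip]:
    """Keep trips whose recorded point lies within radius_m of (lat, lon)."""
    return [t for t in trips if haversine_m(t[1], t[2], lat, lon) <= radius_m]


trips = [
    ("t1", 40.7200, -73.9500),
    ("t2", 40.7205, -73.9500),  # about 56 m north
    ("t3", 40.7300, -73.9500),  # about 1.1 km north
]
print([t[0] for t in trips_near(trips, 40.7200, -73.9500)])  # ['t1', 't2']
```

With both pickup and dropoff points released, the same filter applied at two known addresses narrows candidate trips much further, which is the concern behind welcoming the absence of dropoff data.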
samstave, over 9 years ago
With 19,000,000 rides in a 6-month period for Uber, and an assumed average ride cost of ~$7, that would mean $133,000,000 in revenue. If Uber takes ~40%, that would be ~$53,000,000, or nearly $9,000,000 per month that Uber made in that 2009 period alone.

Wow.
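The back-of-envelope estimate above, worked through with the comment's own assumptions (19M rides, ~$7 average fare, ~40% kept by Uber, 6 months). These inputs are the commenter's, not figures taken from the dataset or from Uber.

```python
rides = 19_000_000   # Uber rides in the 6-month window (commenter's figure)
avg_fare = 7.00      # assumed average fare in USD (commenter's assumption)
take_rate = 0.40     # assumed share of each fare kept by Uber (commenter's assumption)
months = 6

gross = rides * avg_fare        # total fares paid by riders
uber_cut = gross * take_rate    # portion kept by Uber
per_month = uber_cut / months   # average monthly take

print(f"gross ${gross:,.0f}, Uber ${uber_cut:,.0f}, per month ${per_month:,.0f}")
# gross $133,000,000, Uber $53,200,000, per month $8,866,667
```

Changing either assumed input scales the result linearly, so halving the take rate, for example, halves the monthly figure.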
ghaff, over 9 years ago
A lot of interesting data here. Just eyeballing it, it appears as if Uber (plus green taxis) have significantly grown the number of taxi-like rides in Brooklyn and Queens relative to yellow taxis alone. However, in addition to Uber and green taxis being a small part of the Manhattan mix, it looks as if the number of rides they take may have come largely at the expense of yellow taxis.
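The eyeballed comparison above can be made concrete by computing each service's share of rides per borough. A minimal sketch, assuming each trip has already been tagged with a borough (for example via a point-in-polygon test) and a service label; the `(borough, service)` pairs and the counts below are hypothetical toy values, not figures from the dataset.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def service_share_by_borough(trips: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each borough to the fraction of its trips taken by each service."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for borough, service in trips:
        counts[borough][service] += 1
    shares = {}
    for borough, per_service in counts.items():
        total = sum(per_service.values())
        shares[borough] = {svc: n / total for svc, n in per_service.items()}
    return shares


# Hypothetical toy counts, NOT figures from the dataset.
trips = (
    [("Manhattan", "yellow")] * 8
    + [("Manhattan", "uber")] * 2
    + [("Brooklyn", "yellow")] * 1
    + [("Brooklyn", "green")] * 2
    + [("Brooklyn", "uber")] * 1
)
print(service_share_by_borough(trips))
```

Shares alone cannot tell new rides apart from rides taken away from yellow taxis; running the same count for two time periods and comparing absolute totals per service is what separates growth from substitution.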
zobzu, over 9 years ago
The data is interesting, but the git repo is actually nice as well: easy to read through and replicate for other kinds of stats. Thanks!
thro1237, over 9 years ago
Was the 300GB of data processed on a MacBook Air? How was the response time for queries?
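On processing hundreds of gigabytes on a laptop: one common pattern is to stream the CSV files row by row and keep only small aggregates in memory. This is not a claim about how the article's author did it. A minimal sketch, assuming a `pickup_datetime` column in "YYYY-MM-DD HH:MM:SS" layout; both are assumptions, since the real TLC files vary in column names and layout across years and services.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def pickups_per_hour(f: TextIO, column: str = "pickup_datetime") -> Counter:
    """Count trips by pickup hour (0-23), reading the CSV one row at a time.

    Only the Counter (at most 24 keys) stays in memory, so memory use does
    not grow with file size. Characters 11-12 of the assumed
    "YYYY-MM-DD HH:MM:SS" timestamp are the hour.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        counts[int(row[column][11:13])] += 1
    return counts


sample = (
    "pickup_datetime,passenger_count\n"
    "2015-01-01 00:15:00,1\n"
    "2015-01-01 00:40:00,2\n"
    "2015-01-01 13:05:00,1\n"
)
print(pickups_per_hour(io.StringIO(sample)))  # Counter({0: 2, 13: 1})
```

For a real file, open it with `open(path, newline="")` and pass the file object in; the run time is then bounded mainly by disk read speed rather than RAM, which is the trade-off against loading the data into a database for repeated interactive queries.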
yunti, over 9 years ago
That is phenomenal analysis, well done!