It looks like the NYCTaxi dataset is here:<p><a href="http://www.andresmh.com/nyctaxitrips/" rel="nofollow">http://www.andresmh.com/nyctaxitrips/</a><p>Some background on this data:<p><a href="http://chriswhong.com/data-visualization/taxitechblog1/" rel="nofollow">http://chriswhong.com/data-visualization/taxitechblog1/</a><p>And data for 2014 directly from the city:<p><a href="https://data.cityofnewyork.us/view/gn7m-em8n" rel="nofollow">https://data.cityofnewyork.us/view/gn7m-em8n</a>
How do databases like MySQL store data efficiently for querying? A compact binary format like protobuf seems like it would do well here, though you'd need to generate code for each dataset's schema.
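On the codegen point, here is a rough sketch of the alternative. It is a toy, not how MySQL actually lays out rows on disk. If the schema is available as data at runtime, it can drive the binary encoding directly, so nothing has to be generated per dataset. The column names and types below are made up for illustration and are not the real taxi dataset's columns.

```python
import struct

# Hypothetical schema: (column name, struct format code).
# These names and types are assumptions for illustration only.
SCHEMA = [
    ("passenger_count", "B"),  # unsigned 8-bit integer
    ("trip_time_secs", "I"),   # unsigned 32-bit integer
    ("trip_distance", "d"),    # 64-bit float
]

# "<" selects little-endian with no padding, so every row has the same size.
ROW = struct.Struct("<" + "".join(code for _, code in SCHEMA))


def encode_rows(rows):
    """Pack a list of dicts into one contiguous bytes object."""
    return b"".join(ROW.pack(*(row[name] for name, _ in SCHEMA)) for row in rows)


def read_row(buf, index):
    """Decode row `index` by jumping straight to its byte offset."""
    values = ROW.unpack_from(buf, index * ROW.size)
    return {name: value for (name, _), value in zip(SCHEMA, values)}


rows = [
    {"passenger_count": 1, "trip_time_secs": 420, "trip_distance": 1.5},
    {"passenger_count": 3, "trip_time_secs": 900, "trip_distance": 4.25},
]
buf = encode_rows(rows)
second = read_row(buf, 1)  # no need to decode row 0 first
```

Because every row is the same width, finding row N is just a multiplication, which is part of what makes fixed-width layouts cheap to scan. Variable-length columns (strings, for example) would need offsets or length prefixes, which this sketch leaves out.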