I have built one of these map rendering systems on EC2. We decided to go with 100% on-the-fly rendering + two caching layers (one HTTP on top and one application-specific in between [TileCache]). There were dozens of layers (now there are thousands), so pre-rendering all of them was not feasible. Since I left that gig, the current team has added some processes to "warm" the caches before new data goes live. It takes just a few minutes, though, nowhere near 2% of the tilespace.<p>From what I remember, the biggest performance trick was figuring out how to properly pack worldwide street data (geometry, categorization, and labels) + indexes into RAM on a single machine without using a custom file format. It involved stripping out every unnecessary byte and sorting all of the streets geographically to improve access locality. I believe this got down to ~15GB of shapefiles.
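The geographic-sort trick can be sketched roughly like this (hypothetical code, not what we actually ran): assign each street a Morton (Z-order) key from its centroid and sort on it, so streets that are near each other on the map end up near each other in the file and in RAM.

```python
def morton_key(lon, lat, bits=16):
    """Interleave quantized lon/lat bits into a Z-order (Morton) code."""
    # Quantize coordinates to integers in [0, 2^bits)
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions: lon
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions: lat
    return key

# Hypothetical street records: (name, centroid lon, centroid lat)
streets = [
    ("A", -122.4, 37.8),   # San Francisco
    ("B", 2.35, 48.85),    # Paris
    ("C", -122.3, 37.9),   # Berkeley (near SF)
]
# After sorting, the two Bay Area streets are adjacent; Paris is far away.
streets.sort(key=lambda s: morton_key(s[1], s[2]))
```

Any space-filling-curve ordering (Z-order, Hilbert) works for this; the point is just that a 1D sort preserves most 2D locality.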
This comment will probably get buried and no one cares, but...<p>OSM, although a great force for good in the mapping world, is not in its current state a viable competitor to Google Maps and other paid datasets.<p>You need to start driving around, using LIDAR, and create a massive parcel / address database to really do it right.<p>You need photos, from the street and from the air. For OSM to really fly, we need to see open source hardware combining photography, GPS, and LIDAR that folks can strap to their cars or fly from RC planes or balloons or kites.<p>Geocoding needs to actually work, worldwide. That's incredibly hard. It's so much more than the visual maps.<p>Just pointing this all out, since everyone seems to gloss over the fact that Google Maps has a massive monopoly in this area right now.
Interesting stuff. This is essentially a coding/compression problem. My advisor at UCLA helped pioneer the computer vision analogue for this. His work tackled encoding textures (or areas of little information) with Markov random fields and areas of high information (edges, features) with graphs.<p><a href="http://www.stat.ucla.edu/~sczhu/publication.html" rel="nofollow">http://www.stat.ucla.edu/~sczhu/publication.html</a>
Not sure if I understand why real-time tile rendering on servers doesn't work.<p>Google clearly does not pre-render tiles and it looks like it works fine for them. Request is made, data collected, relevant tiles rendered, returned to client-side. Yes, I know, Google has $billions in computing resources, but does it really take that much server power to render tiles? (even for 1,000s of requests/second?)<p>Is it a matter of data transfer or processing capacity? A screen sized 2048x1536 would need to load 48 tiles at a time. Google's tiles for a city avg about 14KB/tile, so 672KB. 5,000 of these a second is about 3.4GB/s. (I'm a front-end guy so this is a little out of my league.)
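The back-of-envelope math above checks out; here's the arithmetic spelled out (tile size, bytes per tile, and request rate are all the comment's assumed numbers, not measured values):

```python
TILE_PX = 256                      # standard web-map tile size
screen_w, screen_h = 2048, 1536
tiles_per_screen = (screen_w // TILE_PX) * (screen_h // TILE_PX)  # 8 * 6 = 48

kb_per_tile = 14                   # assumed average for a city tile
kb_per_screen = tiles_per_screen * kb_per_tile        # 672 KB per full screen

requests_per_sec = 5000            # assumed load
gb_per_sec = requests_per_sec * kb_per_screen / 1e6   # 3.36 GB/s (decimal units)
```

So the bandwidth figure is roughly right; the open question is CPU cost per rendered tile, which this doesn't address.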
Hmm, I'm curious about how much this is overstating the effectiveness of the optimizations in order to teach about them. With this approach, (it seems like) you would still have to render the highest zoom level first, which already takes 3/4 the render time anyway. There are lots of other optimizations you can do (and they probably are doing) there, but they aren't related to the tree/reference-based ones mentioned here.<p>The presentation also seems to overstate the redundancy found in the land tiles. You would get savings from water tiles at all zoom levels, which would be enormous, but (looking at <a href="http://mapbox.com/maps" rel="nofollow">http://mapbox.com/maps</a>) even if humans only cover 1% of the land, our infrastructure is well enough distributed that it and inland water and other details they've included would preclude redundancy at all but the highest zoom levels (although, in this case, the highest zoom level taking up 3/4 of the tiles <i>saves</i> the most).<p>With that in mind, I'm wondering about the claimed rendering time of 4 days. That fits nicely with the story told, but with the 32 render servers mentioned at the end, that would seem to be 128 CPU days (though I'm not sure about the previous infrastructure they were comparing it to), which is actually close to the count mentioned early on with a super-optimized query and render process. This is all just supposition, so I don't want to sound too sure of myself, but the storage savings seems to be the big win here (60% from water + redundancy at highest zoom levels), while I would guess that you would save considerably less in processing (15% from water + minor redundancy on land (absent other optimizations e.g. run-length-based ones)).
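The "3/4 of the render time" figure falls straight out of the tile pyramid geometry: zoom level z has 4^z tiles, so the deepest level always holds about three quarters of the total, regardless of where the max zoom is set (max zoom here is illustrative):

```python
max_zoom = 16  # illustrative choice; the actual max zoom isn't stated
counts = [4 ** z for z in range(max_zoom + 1)]  # tiles per zoom level
total = sum(counts)                              # geometric series, (4^(z+1) - 1) / 3
frac_deepest = counts[-1] / total                # ~0.750
```

That's also why savings at the deepest zoom level dominate everything else, as the comment notes.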
I find myself in awe of complex technical information distilled into such a beautiful, simple style such as evident in this presentation.<p>That right brain / left brain combo strikes me as just about the fundamental quality to look for in a startup founder or team.<p>Nice work.
The author writes, "I found myself wishing Word had a simple, built-in button for 'cut it out and never again do that thing you just did'". Then I look up at the Clippy image and it has a check box with "Don't show me this tip again". Hmmmm.
That makes Satellite View in Google Maps look like a much more amazing feat.<p>Isn't that algorithm going to erase any islands small enough to not show up in zoom level 5?