> today, spatial data users have a problem if they need to scale above about a million features

I found this claim surprising. That is a small dataset even for cartography, never mind geospatial analysis.

My heuristic advice for years has been that the comfortable limit of open source geospatial analysis is about a billion features, if you know what you are doing and use the tools well. That is still a pretty small dataset for geospatial analysis, but it is large enough to cover a lot of use cases. The storage format matters mostly at the margin; scalability is much more about optimal scheduling and query selectivity.

Scaling past that point puts you in the realm of exotics: custom I/O and execution schedulers (required), and much less obvious ways of organizing the data that nonetheless scale qualitatively better because they offer better selectivity with a smaller memory footprint.
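To make "query selectivity" concrete, here is a minimal sketch (assuming Shapely 2.x; the point count, extent, and query window are illustrative): with a bulk-loaded R-tree, a small window query only touches the handful of features whose bounding boxes overlap it, rather than scanning the whole dataset, which is why the index and query path matter more than the storage format.

    import numpy as np
    import shapely
    from shapely import STRtree, box

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 1000, size=(1_000_000, 2))
    pts = shapely.points(coords)            # vectorized geometry creation

    tree = STRtree(pts)                     # bulk-loaded R-tree over the features
    window = box(100, 100, 101, 101)        # small area of interest

    # The index prunes to features whose bounding boxes overlap the window,
    # so each query reads a tiny fraction of the dataset.
    idx = tree.query(window, predicate="contains")
    print(len(idx), "of", len(pts), "features matched")

The same principle is what the exotic end of the spectrum optimizes harder: organize the data so that any given query can skip nearly all of it, and keep the structures needed to do that skipping small enough to stay resident in memory.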