Oh boy, this gives me a chance to talk about one of the gems of astronomy software which deserves to be better known: HEALPixel tesselation!<p>HEALPixels stand for 'Hierarchical Equal-Area Iso-latitudinal Pixels'. It is a scheme that was developed to analyze signals that cover the entire sky, but with variable density.<p>Like HTM or Hilbert curves, this can be used to organize spatial data.<p>The tesselation looks kind of funny but has many good features - it doesn't have discontinuities at poles, and is always equal area. And with the "nested" healpixel formulation, pixels are identified by integers. Pixel IDs are hierarchical based on leading bits - so, for example, pixel 106 (=0110 1010) contains pixel 1709 (=0110 1010 1101). This lets you do some marvelous optimizations in queries if you structure your data appropriately. Nearest neighbor searches can be extremely quick if things are HEALPix-indexed - and so can radius searches, and arbitrary polygon searches.<p>HEALPixels are used today for more than just their original intent. LSST will use them for storing all-sky data and point source catalogs, for example.<p>More here:<p>- Original NASA/JPL site: <a href="https://healpix.jpl.nasa.gov/" rel="nofollow">https://healpix.jpl.nasa.gov/</a><p>- Popular Python implementation: <a href="https://healpy.readthedocs.io/en/latest/" rel="nofollow">https://healpy.readthedocs.io/en/latest/</a><p>- Good PDF primer: <a href="https://healpix.jpl.nasa.gov/pdf/intro.pdf" rel="nofollow">https://healpix.jpl.nasa.gov/pdf/intro.pdf</a><p>And an experimental database being built on healpix for extremely large data volumes (certainly many TB, maybe single-digit PB): <a href="https://github.com/astronomy-commons/hipscat">https://github.com/astronomy-commons/hipscat</a>
I experimented with geospatial Hilbert Curves as a Postgres extension [0] for PostGIS using the S2 [1] spherical geometry library. S2 uses a scale free cell coverage pattern that is numbered using six interlocking space filling Hilbert Curves [2]. This approach is similar to what is used in this article. S2 doesn't use hexagons but a different cell structure but the idea is generally the same.<p>By having both high level (cell) and low level (cell id) geometries it was a very powerful library which allowed projection from the hilbert space into a Postgres spatial index (spgist) including various trees, like noted in this article. It appears to be still quite active in development.<p>[0] <a href="https://github.com/michelp/pgs2">https://github.com/michelp/pgs2</a><p>[1] <a href="https://s2geometry.io/" rel="nofollow">https://s2geometry.io/</a><p>[2] <a href="https://s2geometry.io/devguide/s2cell_hierarchy" rel="nofollow">https://s2geometry.io/devguide/s2cell_hierarchy</a>
> a vehicle with a Hilbert Curve position of 0.34 is really close to one with 0.35 and really far from one with 0.89<p>But points with a large difference in their single curve coordinate can be either far apart or close together. E.g. on this 16 point Hilbert curve<p><pre><code> __. .__
__| |__
| __ |
|__| |__|
</code></pre>
the '.' marked points at 1/16th and 15/16th along the curve are adjacent.
The problem of finding the shortest path on a map (a la google maps etc) is often solved with approaches like "hub networks" and "contraction hierarchies" rather than simple spatial networks. Things like dijkstra's algorithm are only efficient if you're input is the raw graph and point and you only want one path once.<p>See just for a start: <a href="https://arxiv.org/abs/1504.05140" rel="nofollow">https://arxiv.org/abs/1504.05140</a>
Here is one of the recent advancements in performing similarity (including nearest neighbor) search on sparse high dimensional data using learned indices:
<a href="https://arxiv.org/abs/2204.10028" rel="nofollow">https://arxiv.org/abs/2204.10028</a><p>The paper basically deals with the most adverse case but the approach should work in lower dimensions and in data with a non-uniform density distribution as well.
I've recently implemented a map of the Internet using a similar technique: <a href="https://reversedns.space/" rel="nofollow">https://reversedns.space/</a><p>In ClickHouse, we have almost every mentioned technique: H3 and S2, geohashes, and indexing by space-filling curves.
The purpose of using space-filling curves is to convert coordinates into 1 dimensional data, thus they can be indexed by ordinary database index, no matter distrusted or not. The problem is it's much slower than a "real" spatial index like R-tree, and the supported queries are rather limited.<p>On the other hand, spatial data can be easily partitioned, rendering the aforementioned method less useful. Also, I've never seen anyone needed more than a single instance of postgres to index all the spatial data, unless you are dealing with spatial temporal data, which can grow infinitely large. Then again, spatial temporal data can be easily partitioned.
It would be interesting to know what other solutions exist in this space and why they weren’t chosen.<p>I know of at least one, GeoMesa, which seems like it could at least provide the building blocks to achieve what they are trying to do.
Also see "Evenly distributing points on a sphere" <a href="https://news.ycombinator.com/item?id=17842057">https://news.ycombinator.com/item?id=17842057</a>
Makes me want to play more hex.<p><a href="https://www.drking.org.uk/hexagons/hex/templates.html" rel="nofollow">https://www.drking.org.uk/hexagons/hex/templates.html</a>
I knew about hexagons but not about R-trees and Hilbert curves, and then combining the two. Even though that was an inbound content article, it <i>was</i> at least informative.