Reverse geocoding is hard

275 points by pavel_lishin, 17 days ago

27 comments

Dachande663, 17 days ago
Fun fact that was dredged up because the author mentions Australia: GPS points change. Their example coordinates give 6 decimal places, accurate to about 10-15cm. Australia a few years back shifted all locations 1.8m because of continental drift (they’re moving north at ~7cm/year). So even storing coordinates as a source of truth can be hazardous. We had to move several thousand points for a client when this happened.
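The figures in this comment can be sanity-checked with a little arithmetic. The sketch below assumes a spherical Earth with roughly 111.32 km per degree of latitude (an approximation, not a geodetic model); the 1.8 m and 7 cm/year values come from the comment, and the sample latitude is only an illustration.

```python
import math
from typing import Tuple

# Assumption: spherical Earth, about 111.32 km per degree of latitude.
METERS_PER_DEGREE_LAT = 111_320.0


def decimal_place_resolution_m(places: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Size in metres of one unit in the last decimal place: (north-south, east-west)."""
    step_deg = 10 ** -places
    north_south = step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def years_to_drift(offset_m: float, rate_m_per_year: float) -> float:
    """How long a plate moving at a constant rate takes to cover the offset."""
    return offset_m / rate_m_per_year


ns, ew = decimal_place_resolution_m(6, latitude_deg=-34.0)  # illustrative southern latitude
print(f"6 decimal places: {ns * 100:.1f} cm north-south, {ew * 100:.1f} cm east-west")
print(f"1.8 m at 7 cm/year: roughly {years_to_drift(1.8, 0.07):.0f} years of drift")
```

Under these assumptions the sixth decimal place is on the order of ten centimetres, so a 1.8 m shift is more than an order of magnitude larger than the precision the coordinates appear to promise.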
andrew_eu, 17 days ago
I have a memorable reverse geocoding story.

I was working with a team that was wrapping up a period of many different projects (including a reverse geocoding service) and adopting one major system to design and maintain. The handover was set to be after the new year holidays and the receiving teams had their own exciting rewrites planned. I was on call the last week of the year and got an alert that sales were halted in Taiwan due to some country code issue and our system seemed at fault. The customer-facing application used an address to determine all sorts of personalization stuff: what products they're shown, regulatory links, etc. Our system was essentially a wrapper around Google Maps' reverse geocoding API, building in some business logic on top of the results.

That morning, at 3am, the API stopped serving the country code for queries of Kinmen County. It would keep the rest of the address the same, but just omit the country code, totally botching assumptions downstream. Google Maps seemingly realized all of a sudden what strait the island was in, and silently removed what some people dispute.

Everyone else on the team was on holiday and I couldn't feasibly get a review for any major mitigations (e.g. switching to OSM or some other provider). So I drew a simple polygon around the island, wrote a small function to check if the given coordinates were in the polygon, and shipped the hotfix. Happily, the whole reverse geocoding system was scrapped and replaced by February.
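A hotfix of the kind described here can be quite small. The following is a minimal sketch using the standard ray-casting point-in-polygon test, not the commenter's actual code: the outline is a hypothetical rough box rather than a surveyed boundary, and the function names and the fallback code are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray heading east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge meets the point's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical hand-drawn outline (lon, lat); NOT an authoritative boundary.
ISLAND_OUTLINE = [(118.20, 24.35), (118.50, 24.35), (118.50, 24.55), (118.20, 24.55)]
FALLBACK_CODE = "TW"  # assumption based on the story above


def country_code_with_fallback(api_code: Optional[str], lon: float, lat: float) -> Optional[str]:
    """Trust the upstream API when it answers; otherwise fall back to the polygon check."""
    if api_code:
        return api_code
    if point_in_polygon((lon, lat), ISLAND_OUTLINE):
        return FALLBACK_CODE
    return None


print(country_code_with_fallback(None, 118.33, 24.44))
```

Keeping the upstream answer whenever it exists limits the blast radius of the patch: the polygon only matters for the one region where the API went silent.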
jandrewrogers, 17 days ago
Most people don’t have an intuitive sense of just how technically difficult mapping from real geospatial coordinates to feature spaces is. This is a great example of a relatively simple case. You are essentially doing inference on a sparse data model with complex local non-linearities throughout. If you add in dynamic relationships, like things that move in space, it becomes another order of magnitude worse. We frequently don’t have enough data to make a reliable inference even in theory and you need a way of reliably determining that.

This problem has been the subject of intense interest by the defense research community for decades. It has been conjectured to be an AI-complete type problem for at least ten years, i.e. solving it is equivalent to solving AGI. The current crop of LLM type AI persistently fails at this class of problems, which is one of the arguments for why LLM tech can’t lead to true AGI.
sinuhe69, 17 days ago
Not my area of expertise, but is this not a form of perfectionist problem? I mean, most places have a clear and simple address. For the rest, either a human can solve it, or we can make a few examples and let an AI do the work. We can go back to them later and revise them if we need to. Addresses don't change often, so I think things can stay the same for a long time.

Except for emergency dispatch and a few high-profile use cases, you can have a good enough address to let the user find the neighbourhood. But they still have the GPS or other form of address coding, so they can find the exact location easily. I'd say 99.9% of the cases are like that. The rest can be solved quickly by looking at the map!
vintermann, 17 days ago
Genealogy applications run into this a lot. The person of interest lived at Engeset. FamilySearch has geocoded a place called "Engeset, Møre og Romsdal, Norway". So that's it, right? Not so fast, [there are at least 3 Engesets in Møre og Romsdal](https://www.google.com/maps/search/Engeset/@62.3358577,6.2251463,132438m/data=!3m2!1e3!4b1?entry=ttu&g_ep=EgoyMDI1MDQyMy4wIKXMDSoASAFQAw%3D%3D).

But that's at least better than when it's some local place name which it's never heard of, and thinks sounds most similar to a place in Afghanistan (this happens all the time).

And to add to it, there are administrative regions, and ecclesiastical regions. Do you put them in the parish, or in the municipality? The birth in the parish and the baptism in the municipality, maybe? How about the burial then...
indeed30, 16 days ago
Ten years ago, I worked for a company that had billions of sensor readings from mobile phones. The idea was to use crowdsourced data to create truly detailed, real-world coverage maps, and then sell that data to marketing and network operations teams at telcos.

We used reverse geocoding extensively, but never down to street addresses, always to a higher level. We wanted to split measurements by country, region, city: any geographic unit. When you deal with country borders, you get a lot of weird measurements as phones roam onto foreign networks. We weren’t interested in reporting on the experience of users roaming while abroad, so we needed shapefiles good enough to filter all that out and to partition the rest of the data cleanly.

We built a 30-machine Spark cluster on AWS back when Spark was still super early (around v0.7, definitely before 1.0). At the time, you pretty much had to use Scala with Spark if you cared about performance. Most of the workload was point-in-polygon tests. Before that, we were using a brutally hacky pipeline involving PostGIS, EMR, and Pig, and it was hell.

It was incredibly fun, but looking back now, I can see so clearly all the mistakes I made.
punnerud, 17 days ago
I created this to solve my own need for reverse geocoding: https://github.com/punnerud/rgcosm (saving me thousands of $ compared to the Google API).

It uses an OpenStreetMap file, Python and SQLite3.

First it finds all addresses within a +/- square around the lat/lon, then calculates the distance for that smaller list (Pythagoras) and picks the closest. It expands the square up to a set maximum if no address is found in the first search.
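The search strategy described above can be sketched in a few lines. This is not the rgcosm implementation: the `addresses` table, its columns, the sample rows and the step sizes are all hypothetical, and scaling longitude by the cosine of the latitude is an added assumption so that east-west and north-south differences are comparable.

```python
import math
import sqlite3


def nearest_address(conn, lat, lon, start_deg=0.001, max_deg=0.064):
    """Query an expanding lat/lon square, then rank the few candidates by distance."""
    half = start_deg
    while half <= max_deg:
        rows = conn.execute(
            "SELECT name, lat, lon FROM addresses "
            "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
            (lat - half, lat + half, lon - half, lon + half),
        ).fetchall()
        if rows:
            # Pythagoras on degree differences, longitude scaled by cos(latitude).
            scale = math.cos(math.radians(lat))
            return min(rows, key=lambda r: math.hypot(r[1] - lat, (r[2] - lon) * scale))
        half *= 2  # nothing found: widen the square and retry
    return None


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (name TEXT, lat REAL, lon REAL)")
conn.execute("CREATE INDEX idx_addresses_lat_lon ON addresses (lat, lon)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [
        ("1 Example Street", 59.9139, 10.7522),
        ("2 Example Street", 59.9150, 10.7600),
        ("99 Faraway Road", 60.3913, 5.3221),
    ],
)
print(nearest_address(conn, 59.9140, 10.7530))
```

One caveat of any square-based search: a candidate in the corner of the square can be farther away than an address just outside its edge, so the result is the nearest match within the square rather than a guaranteed global nearest neighbour.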
andrewaylett, 17 days ago
It's a lot more expensive, but measuring navigation distance rather than straight line distance would avoid the "river" issue. Although depending on the routing engine and dataset it might well introduce more issues where points can be really close on foot but the only known route is a driving route.
AlotOfReading, 17 days ago
I haven't found a better way to do this than the Google Maps solution [0]:

You write a query of all the different kinds of addresses you'd like to display. The query result is a list of valid candidate addresses for the point matching at least one format that you can rank based on whatever criteria you like.

[0] https://developers.google.com/maps/documentation/geocoding/requests-reverse-geocoding
rovr138, 17 days ago
Have you looked at the GeoNames database? https://www.geonames.org/

Info and schema are here: https://download.geonames.org/export/dump/readme.txt

Could be a good source. Not sure how good it is worldwide, but for the countries I’ve used it for, it’s been useful and pretty good.

Try the search too: https://www.geonames.org/search.html?q=R%C3%ADo+grande&country=

Not just roads, but there are rivers and other things too.
johnlk, 17 days ago
It’s almost more of a UX challenge than anything. The feedback widget idea at the end could offer a crowdsourced solution the same way Twitch solved translation via crowdsourcing.
nedt, 16 days ago
Is it even worth it? What most users will do is enter the address in the map or route app of their choice to see the directions on a map. Especially for something that's inside a park an address is really not that useful, but also outside most people don't know where some specific house number is.

Also having zones might not be super useful. Like when I'm in a city next to the border of a district I wouldn't want to only search in my current district. Something on the other side of my current district might be much harder to reach than something in the neighboring district.

Giving a place nearby, like a landmark, can aid in finding interesting places, but in the end a simple radius search, or route distance search or even something next to a path, might be much more useful. Which is more or less what is being done when you visualize points on a map.

Staying closer to coordinates also gets rid of localization issues. And that's not just different languages and scripts but also how addresses are used worldwide. There are some important cultural differences.
byoung2, 15 days ago
I recently had to do a lot of work mapping locations inside Disneyland. Once you are inside the park, street addresses aren't useful. I used GeoJSON objects to describe the geometry of the resort, the park, then each land, then each ride, restaurant, store, etc, then the elements of each of these, so you have increasingly smaller geometries. Then I used geospatial queries to determine if a point is inside, outside, or nearby a known geometry. So you can say, for example, that a certain churro cart is (inside) Disneyland Resort [resort] > (inside) Disney California Adventure [park] > (inside) Buena Vista Street [land], (nearby) Grizzly River Run.

Another challenge is that these shapes change over time. Rides, lands, etc constantly change due to construction, but queues dynamically change size and shape during the day based on crowd size (cast members put extra ropes out to control long lines, and remove them to allow for parades and extra walkways).
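The nested-geometry idea can be illustrated with a toy model. The sketch below is an assumption-heavy simplification rather than the commenter's system: it uses axis-aligned rectangles in made-up local metre coordinates instead of GeoJSON polygons, and the `Area` class, the layout and the 50 m "nearby" threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Area:
    name: str
    kind: str
    bounds: tuple  # (min_x, min_y, max_x, max_y) in hypothetical local metres
    children: list = field(default_factory=list)

    def contains(self, x, y):
        min_x, min_y, max_x, max_y = self.bounds
        return min_x <= x <= max_x and min_y <= y <= max_y

    def distance_to(self, x, y):
        """Distance from the point to the rectangle; 0 when the point is inside."""
        min_x, min_y, max_x, max_y = self.bounds
        dx = max(min_x - x, 0, x - max_x)
        dy = max(min_y - y, 0, y - max_y)
        return math.hypot(dx, dy)


def walk(area):
    """Yield the area and every area nested below it."""
    yield area
    for child in area.children:
        yield from walk(child)


def describe(root, x, y, nearby_m=50.0):
    """Chain of containing areas, followed by non-containing areas within nearby_m."""
    inside = []
    current = root if root.contains(x, y) else None
    while current is not None:
        inside.append(f"(inside) {current.name} [{current.kind}]")
        current = next((c for c in current.children if c.contains(x, y)), None)
    nearby = [
        f"(nearby) {a.name}"
        for a in walk(root)
        if not a.contains(x, y) and a.distance_to(x, y) <= nearby_m
    ]
    return " > ".join(inside) + "".join(", " + n for n in nearby)


# Invented layout; only the names come from the comment above.
ride = Area("Grizzly River Run", "ride", (320, 150, 420, 280))
land = Area("Buena Vista Street", "land", (200, 0, 300, 200))
park = Area("Disney California Adventure", "park", (0, 0, 500, 500), [land, ride])
resort = Area("Disneyland Resort", "resort", (0, 0, 1000, 1000), [park])

print(describe(resort, 290, 160))  # a hypothetical churro cart position
```

Shapes that change during the day, such as queues, would need a time dimension on each geometry; that is left out of this sketch.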
nerdralph, 17 days ago
Part of the problem is the different ways addresses are expressed throughout the world. I was born and grew up in Canada, and was confused when I started dealing with companies in China. Instead of street addresses, many are given by province, city, district, sub-district, and a building number.

Another problem is choosing which authority to treat as the source of the "correct" address. I've seen many cases where the official postal address city/town name is different from the one in the 911 database. For example Canada Post will say some street addresses are in Dartmouth, while the official civic address is really Cole Harbour. https://www.canadapost-postescanada.ca/ac/ https://nsgi.novascotia.ca/civic-address-finder/

Even streets can have multiple official names/aliases. People who live on "East Bay Hwy" also live on "Highway 4", which is an alias.
mvdtnz, 17 days ago
I dealt with this exact issue and went with that exact solution in my browser-based geography game [0].

What the author is looking for is administrative divisions and boundaries [1], in particular probably down to level 3, which is the depth my game goes to. These differ in size greatly by country. With admin boundaries you need to accept there is no one-size-fits-all solution and embrace the quirks of the different countries.

For my game I downloaded a complete database of global admin boundaries [2] and imported them into PostgreSQL for lightning fast querying using PostGIS.

[0] https://guesshole.com

[1] https://en.wikipedia.org/wiki/List_of_administrative_divisions_by_country

[2] https://gadm.org/data.html
jillesvangurp, 16 days ago
This was a while ago, but about 12 years ago I experimented with putting the whole of OpenStreetMap into Elasticsearch.

Reverse geocoding then becomes a problem of figuring out which polygons contain the point with a simple query and which POIs/streets/etc. are closest based on perpendicular distance. For that, I simply did a radius search and some post processing on any street segments. Probably not perfect for everything. But it worked well enough. My goal was actually being able to group things by neighborhood and microneighborhoods (e.g. squares, nightlife areas, etc.).

This should work well enough with anything that allows for geospatial queries. In a pinch you can use geohashes (I actually did this because geospatial search was still a bit experimental in ES).
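For readers unfamiliar with the geohash fallback mentioned here, this is a minimal sketch of the standard geohash encoding: longitude and latitude bits are interleaved and emitted as base-32 characters. It only illustrates the technique and is not the commenter's Elasticsearch setup; the function name and default precision are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=7):
    """Encode a coordinate by repeatedly halving the lon/lat ranges."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


a = geohash_encode(57.64911, 10.40744, precision=6)
b = geohash_encode(57.64920, 10.40750, precision=6)
print(a, b, a == b)
```

Points in the same cell share a prefix, which turns a proximity lookup into a string prefix match. Two close points on opposite sides of a cell boundary can still get different hashes, so implementations that rely on this usually check the neighbouring cells as well.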
morkalork, 17 days ago
If I were giving directions to another human and not using house addresses I'd say something like "Queen street about half way down the block between Crawford and Shaw".
kylecazar, 16 days ago
Good article. FWIW, some major cities offer seating data. New York, for example, returns bench locations as a Point (coordinates). They even have a column in the data for the nearest address of the "seating feature".

https://data.cityofnewyork.us/Transportation/Seating-Locations/esmy-s8q5
the_arun, 17 days ago
Nicely written article. So simple yet interesting. I wish more people made projects like these.
glitchc, 16 days ago
What 3 words (https://what3words.com/) solves this problem, but it doesn't seem to be popular.

If anyone has experience, I would be curious to know why.
amelius, 17 days ago
Why not take the OpenStreetMap address (which is long), chop it into a list of short combinations, then do a lookup for each combination, and see which short address gives you the best (geographically closest) match?
dadadad100, 17 days ago
This problem seems to also exist for services like Uber. Their solution seems easier: drop a pin on a map. Perhaps working so hard to find a textual description is missing the simpler solution.
blacklight, 17 days ago
As the developer of a GPS tracking app that relies a lot on OpenStreetMap, I've faced many of these problems myself. A couple of lessons learned/insights:

- I avoid relying on any generic location name/description provided by these APIs. Always prefer structured data whenever possible, and build the locality name from those components (bonus points if you let the user specify a custom format).

- Identifying those components itself is tricky. As the author mentioned, there are countries that have states, others that have regions, others that have counties, or districts, or any combination of those. And there are cities that have suburbs, neighbourhoods, municipalities, or any combination. Oh, and let's not even get started with address names: house numbers? Extensions? Localization variants, e.g. even the same API may sometimes return "Marrakesh" and sometimes "Marrakech"? And how about places like India where nearby amenities are commonly used instead of house numbers? I'm not aware of any public APIs out there that provide these "expected" taxonomies, preferably from lat/long input, but I'd love to be proven wrong. In the absence of that, I would suggest that it is better to avoid second-guessing, unless your software is only intended to run in a specific country, or in a limited number of countries and you can afford to hardcode those rules. It's probably a good option to provide a sensible default, and then let the user override it. Oh, and good catch about abbreviations: I'd say to avoid them unless the user explicitly enables them, if you want to avoid the "does everybody know that IL is Illinois?" problem. Just use "Illinois" instead, at least by default.

- Localization of addresses is a tricky problem only on the surface. My proposed approach is that, again, the user is king. Provide English by default (unless you want to launch your software in a specific country), and let the user override the localization. I feel like Nominatim's API approach is probably the cleanest: honor the `Accept-Language` HTTP header if available, and if not available, fall back to English. And then just expose that as a setting to the user.

- Bounding boxes/polygons can help a lot with solving the proximity/perimeter issue. But they aren't always present/sufficiently accurate in OSM data. And their proper usage usually requires the client's code to run some non-trivial lat/long geometry processing code, even to answer trivial questions such as "is this point inside of this enclosed amenity?" Oh, and let's not even get started with the "what's the exact lat/long of this address?" problem. Is it the entrance of the park? The middle of it? I remember that when I worked with the Bing API in the past they provided more granular information at the level of rooftop location, entrance location etc.

- Providing localization information for public benches isn't what I'd call an orthodox use-case for geo software, so I'm not entirely sure how to solve the "why doesn't everything have an address?" problem :)
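The first two points above (build the name from structured components, with a sensible default the user can override) might look roughly like this. It is a sketch under assumptions: the component keys resemble those found in OSM-style structured address output, but the exact key list, the slot names and the sample records are hypothetical and not taken from the commenter's app.

```python
# Hypothetical mapping from a display "slot" to the component keys that may fill it,
# in order of preference. Different countries populate different keys.
FALLBACKS = {
    "street": ["road", "pedestrian", "footway"],
    "locality": ["city", "town", "village", "suburb"],
    "region": ["state", "region", "county"],
    "country": ["country"],
}

DEFAULT_SLOTS = ("street", "locality", "region", "country")


def resolve(address, slot):
    """Return the first non-empty component that can fill the slot, or None."""
    for key in FALLBACKS[slot]:
        value = address.get(key)
        if value:
            return value
    return None


def format_locality(address, slots=DEFAULT_SLOTS):
    """Join the resolved slots in the order given; the order is the user's custom format."""
    parts = [resolve(address, slot) for slot in slots]
    return ", ".join(p for p in parts if p)


# Invented sample records shaped like structured geocoder output.
print(format_locality({"road": "Example Road", "city": "Marrakesh", "country": "Morocco"}))
print(format_locality(
    {"village": "Engeset", "county": "Møre og Romsdal", "country": "Norway"},
    slots=("locality", "country"),
))
```

Missing components are skipped rather than guessed, which matches the advice to avoid second-guessing: no abbreviations are introduced, and a place without a street simply produces a shorter string.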
osmanscam, 16 days ago
You can use https://map.name for reverse geocoding.
dpmdpm, 17 days ago
I read this as Reverse Genociding Is Hard, thought I was on a Nethack forum, and thought, No, it's pretty easy with a cursed scroll.
gmoore, 17 days ago
Maybe the 'three words' model? Seems like it would be specific enough to locate a bench.
1970-01-01, 16 days ago
> But how do you go from a string of digits to something human readable?

Hasn't What3Words already solved this?