How to Start Your Own GeoCoder in Less Than 48 Hours

34 points by bitboxer over 12 years ago

4 comments

codewright over 12 years ago
Kinda weird to see people going through this immense pain to get the whole import-pipeline and efficient-search problem solved with OSM and Postgres + PostGIS when it's very much a "solved" problem.

Incremental updates and intelligent, flexible, efficient search are all immediately doable with existing open source software.

Why are people hellbent on using Postgres where it's suboptimal for a dataset this large that needs intelligent searching?

Seriously, people: https://github.com/ncolomer/elasticsearch-osmosis-plugin

Learn the JSON document and query formats, and then jump with glee whenever you encounter a problem well served by a search engine, instead of doing hackneyed manual indices, full-text queries, and poorly managed shards on Postgres.

Postgres is for important operational data. It's excellent at that. It's not so great for search, bulk, or static datasets.

ElasticSearch is so well designed that I just expose a raw query interface to our frontend JS guy and he builds his own queries from the user inputs.

ElasticSearch is probably 20-30% of the technological secret sauce of my startup.
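To make the "JSON document and query formats" point concrete, here is a minimal sketch of the kind of geo-aware query body an Elasticsearch client might send. The index and field names ("places", "name", "location") are hypothetical, not taken from the osmosis plugin's actual mapping:

```python
# Sketch: build an Elasticsearch-style query body as a plain Python dict.
# Field/index names are illustrative assumptions, not the plugin's schema.

def build_geo_query(text, lat, lon, radius_km):
    """Full-text match on a name field, filtered to a geographic radius."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": text}},
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }

# A frontend could assemble this directly from user input and POST it
# to the search endpoint, which is the workflow described above.
query = build_geo_query("Hauptbahnhof", 50.11, 8.68, 5)
```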
robmil over 12 years ago
"Another problem I see is that we have a snapshot of data from Friday. We cannot really link our data back to any of the original OSM data. So if we want to upgrade our dataset, we have to throw everything away and start a new import."

This is the biggest hurdle to overcome, in my experience. A custom data format is typically essential (most location databases arrive as CSVs or XML, which are useless for real-time querying), but imports can take forever.

It's sometimes, counterintuitively, been more worthwhile to concentrate on the performance of importing than of querying; the out-of-the-box query performance you get with (no)SQL often isn't terrible, but your import script usually starts out pretty awful.
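The import-performance point can be sketched with a toy example. Assuming a tiny CSV of places (the data and schema here are invented for illustration), the usual first wins are batching all inserts into one transaction and building indexes after the bulk load rather than before:

```python
import csv
import io
import sqlite3

# Illustrative only: a tiny CSV stands in for a real location dataset.
CSV_DATA = "name,lat,lon\nBerlin,52.52,13.405\nHamburg,53.55,9.993\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")

rows = [
    (r["name"], float(r["lat"]), float(r["lon"]))
    for r in csv.DictReader(io.StringIO(CSV_DATA))
]
with conn:  # one transaction for the whole batch, not one per row
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)

# Build the index *after* the bulk load; maintaining it row-by-row
# during import is a common source of terrible import times.
conn.execute("CREATE INDEX idx_latlon ON places (lat, lon)")

count = conn.execute("SELECT COUNT(*) FROM places").fetchone()[0]
```

The same batching-and-defer-indexes principle applies to Postgres (`COPY`, deferred index creation), just at a much larger scale.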
peteretep over 12 years ago
For parsing US and UK addresses, you can look at the internal address identification and cleaning routines of Ziprip: http://zipripjs.com/
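As a rough illustration of what such routines do (this is a hypothetical sketch, not Ziprip's actual implementation), postal-code extraction often starts from patterns like these:

```python
import re

# Hypothetical patterns for illustration; NOT Ziprip's actual code.
US_ZIP = re.compile(r"\b(\d{5})(?:-(\d{4}))?\b")          # 12345 or 12345-6789
UK_POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")

def extract_us_zip(text):
    """Return the first US ZIP code found, or None."""
    m = US_ZIP.search(text)
    return m.group(0) if m else None

def extract_uk_postcode(text):
    """Return the first UK postcode found (normalized spacing), or None."""
    m = UK_POSTCODE.search(text)
    return f"{m.group(1)} {m.group(2)}" if m else None

us = extract_us_zip("1600 Pennsylvania Ave NW, Washington, DC 20500")
uk = extract_uk_postcode("10 Downing Street, London SW1A 2AA")
```

Real address cleaners layer validation and normalization on top of this; bare regexes are only the starting point.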
xradionut over 12 years ago
People who write their own geocoder from crappy sources are like people who write their own crypto libraries. It's a task best left to experienced experts. There's a reason we pay for quality GIS software with good data, or use services like Google, Bing, etc.