Kinda weird to see people going through this immense pain to solve the whole import-pipeline and efficient-search problem with OSM and Postgres + PostGIS when it's very much a "solved" problem. Incremental updates and intelligent, flexible, efficient search are all immediately doable with existing open source software.

Why are people hellbent on using Postgres where it's suboptimal: a dataset this large that needs intelligent searching?

Seriously, people: https://github.com/ncolomer/elasticsearch-osmosis-plugin

Learn the JSON document and query formats, and then jump with glee whenever you encounter a problem well served by a search engine, instead of doing hackneyed manual indices, full-text queries, and poorly managed shards on Postgres.

Postgres is for important operational data, and it's excellent at that. It's not so great for search, bulk, or static datasets.

Elasticsearch is so well designed that I just expose a raw query interface to our frontend JS developer, and he builds his own queries straight from user input. Elasticsearch is probably 20-30% of the technological secret sauce of my startup.
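And if the query DSL sounds intimidating, it isn't. Here's a minimal sketch in Python of a "coffee places within 2 km of a point" search; the index name ("osm") and field names ("tags.name", "centroid") are illustrative guesses, not necessarily the mapping the osmosis plugin actually produces:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Bool query: full-text match on the name tag, filtered by distance
    # from a point. "osm", "tags.name", and "centroid" are hypothetical
    # names; "centroid" would need to be mapped as a geo_point.
    query = {
        "bool": {
            "must": {"match": {"tags.name": "coffee"}},
            "filter": {
                "geo_distance": {
                    "distance": "2km",
                    "centroid": {"lat": 52.52, "lon": 13.40},
                }
            },
        }
    }

    resp = es.search(index="osm", query=query)
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("tags", {}).get("name"))

Since the whole thing is just JSON, handing query construction over to a frontend dev really is as easy as it sounds.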
*Another problem I see is that we have a snapshot of data from Friday. We cannot really link our data back to any of the original OSM data. So if we want to upgrade our dataset, we have to throw everything away that we have and start a new import.*

This is the biggest hurdle to overcome, in my experience. A custom data format is typically essential (most location databases arrive as CSVs or XML, which are useless for real-time querying), but imports can take forever.

Counterintuitively, it's sometimes been more worthwhile to concentrate on import performance than on query performance: the out-of-the-box query performance you get from (no)SQL often isn't terrible, but your import script usually starts out pretty awful.
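A minimal sketch of the usual first win, assuming Postgres via psycopg2 and a hypothetical nodes(id, lat, lon) table: replace row-at-a-time INSERTs with a single streamed COPY.

    import io
    import psycopg2

    nodes = [(1, 52.5200, 13.4050), (2, 48.8566, 2.3522)]  # sample rows

    conn = psycopg2.connect("dbname=osm")
    cur = conn.cursor()

    # Slow: one network round trip and one statement parse per row.
    # for node_id, lat, lon in nodes:
    #     cur.execute("INSERT INTO nodes VALUES (%s, %s, %s)",
    #                 (node_id, lat, lon))

    # Fast: stream every row through one COPY.
    buf = io.StringIO()
    for node_id, lat, lon in nodes:
        buf.write(f"{node_id}\t{lat}\t{lon}\n")
    buf.seek(0)
    cur.copy_from(buf, "nodes", columns=("id", "lat", "lon"))
    conn.commit()

Dropping indexes before the bulk load and recreating them afterwards is the other classic trick for a big import like this.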
For parsing US and UK addresses, you can look at the internal address identification and cleaning routines of Ziprip: http://zipripjs.com/
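Not Ziprip's actual code, but to give a flavor of what "address identification" involves, here's a toy Python sketch that picks out US ZIP codes and UK postcodes from free text:

    import re

    # Rough illustrative patterns only; real address parsers handle far
    # more edge cases than this.
    US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")
    UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b",
                             re.IGNORECASE)

    def find_postal_codes(text):
        return US_ZIP.findall(text) + UK_POSTCODE.findall(text)

    print(find_postal_codes("10 Downing St, London SW1A 2AA"))
    print(find_postal_codes("1600 Pennsylvania Ave NW, Washington, DC 20500"))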
People who write their own geocoder from crappy sources are like people who write their own crypto libraries: it's a task best left to experienced experts. There's a reason we pay for quality GIS software with good data, or use services like Google, Bing, etc.