TechEcho
Show HN: I built an interactive map and search engine for US Census data

3 points by NameError 10 months ago
The core idea here is to use semantic search and LLMs to make it easier to search the tens of thousands of demographic indicators available from the US Census API. I'm definitely not the first to try something like this, but I think this solution has some nice properties that I haven't seen in similar tools:

- Barring serious bugs, BlockAtlas won't "lie" to users. It may fail to find something relevant, or misunderstand a query, but the results (map title and data) will faithfully reflect the underlying Census estimates.

- BlockAtlas covers a much wider set of Census data than other tools I've seen. Almost every "Detailed Table" from the American Community Survey is available, across the entire range of release years (2005-2022). There are ~29,000 demographic indicators in the search index as it stands, plus some combinations of indicators (e.g. "X and above") for popular tables.

Similar LLM+Census things I've seen have used an approach akin to "replicate some data into my DB, have LLM generate SQL over it", which makes it hard to avoid issues with both of these points. I've taken a bit of a different approach: creating a search index over metadata, i.e. searching for API parameters and pulling the data itself directly from the Census. That way, the LLM is limited to "selecting between known-valid options", rather than generating a SQL query and displaying the results under a potentially misleading name.

This is the second iteration of BlockAtlas; the first was a ChatGPT plugin. The LLM would query my API for candidate variables, and generate a link to my site with the variables to display and the map title as query parameters. This made for a cool demo but ultimately was very hard to trust: the LLM could select a map title which was not at all reflected by the variables in question, or could combine variables in a nonsensical way, so it failed to solve the "don't lie to users" problem. The plugin ("GPT" now) is still available, but the standalone search engine is my effort to remedy those issues.

The tech stack: The frontend uses React for the search form and Leaflet map. The API is written in TypeScript and hosted on Cloudflare Workers. The search indexes are in a Postgres DB using pgvector + OpenAI embeddings as well as Postgres's built-in full-text search, and the OpenAI API (gpt-3.5-turbo) is also used for query parsing and result reranking/selection.

I think there's a ton of room for improvement here, but wanted to gauge public interest a bit before putting more time into this (I have a newborn and a full-time job, so it's been hard to carve out time to work on this lately).
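To make the "selecting between known-valid options" idea concrete, here is a minimal sketch of how such a guard could look. It is not BlockAtlas's actual code: the `Indicator` fields, the function names, and the `acs/acs5` dataset path are assumptions made for illustration. The point is only that the model's pick is checked against the retrieved candidates, and that the map title and data request are both derived from metadata rather than from text the model wrote.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Indicator:
    """One row of a hypothetical metadata index (not BlockAtlas's real schema)."""
    variable_id: str  # Census variable code, e.g. "B01003_001E"
    label: str        # human-readable label taken from Census metadata
    year: int
    dataset: str      # API dataset path; "acs/acs5" is an assumption here


def select_indicator(candidates, llm_choice):
    """Accept the model's pick only if it names one of the retrieved candidates."""
    by_id = {c.variable_id: c for c in candidates}
    return by_id.get(llm_choice.strip())  # None means "no valid selection"


def map_title(ind):
    # The title comes from metadata, never from free text written by the model.
    return f"{ind.label} ({ind.year})"


def census_url(ind, geography="state:*"):
    # Data is requested from the Census API instead of a local copy of the data.
    query = urlencode(
        {"get": f"NAME,{ind.variable_id}", "for": geography}, safe=",:*"
    )
    return f"https://api.census.gov/data/{ind.year}/{ind.dataset}?{query}"
```

With this shape, a hallucinated variable ID simply yields `None`, and a valid pick can only ever be displayed under its own metadata label.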

2 comments

smcin 10 months ago
Great work.

For some reason, when I gave it a very broad query I got the suggested result "[Table B18104?] Sex by Age by Cognitive Difficulty (Civilian noninstitutionalized population 5 years and over): Total".

No idea why it picked that table instead of the more general "[Table B01003]: Total Population" or "[Table B01001] Sex by Age". In general I think a query's first result should be the least specific match.

And the embeddings/full-text search mishandles queries that have no close match: the query "People who look like Kevin Bacon" returns "Number of People: Population by Ancestry: Basque (2022)".
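Both suggestions in this comment could be expressed as a small reranking step. The sketch below is one possible reading, not something the tool is known to do: the `Hit` type, the score scale, the `depth` measure of specificity, and the `min_score` and `tie_margin` values are all illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    label: str
    score: float  # similarity in [0, 1]; higher is closer (assumed scale)
    depth: int    # number of qualifiers in the label; 0 = most general


def rerank(hits, min_score=0.75, tie_margin=0.05):
    """Drop weak matches, then prefer the most general hit among near-ties."""
    strong = [h for h in hits if h.score >= min_score]
    if not strong:
        return []  # report "nothing relevant" instead of a distant match
    best = max(h.score for h in strong)

    def key(h):
        near_top = h.score >= best - tie_margin
        # Near-ties sort first, ordered by generality; the rest by score.
        return (0, h.depth, -h.score) if near_top else (1, -h.score, h.depth)

    return sorted(strong, key=key)
```

The thresholds would need tuning against real queries; the values here are placeholders.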
smcin 10 months ago
The query engine doesn't understand the area of a state/county/zipcode (or census tract), unlike the official Census viewer: https://www.census.gov/library/visualizations/2021/geo/demographicmapviewer.html

When I query for "population density" I only get tables of total population, not "people/sq mile".

Also, the default legend breaks are exponential (good) but not rounded to the nearest n significant figures. And the color scheme is monochrome green (hard to quickly read the map).
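For reference, the two numeric suggestions above are easy to sketch. This is an illustration under assumptions, not the site's code: the function names are hypothetical, the breaks are assumed to be geometrically spaced between positive bounds, and the land area is assumed to come from a separate geography source.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - magnitude)


def legend_breaks(lo, hi, classes=5, sig=2):
    """Geometrically spaced class breaks between lo and hi (both > 0), rounded."""
    ratio = (hi / lo) ** (1 / classes)
    return [round_sig(lo * ratio ** i, sig) for i in range(classes + 1)]


def density(population, land_area_sq_mi):
    """People per square mile; the area is not part of the population tables."""
    return population / land_area_sq_mi
```

Rounding each break after computing the geometric sequence keeps the exponential spacing while giving legend labels that are easier to read.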