TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Show HN: Describe SQL using natural language, and execute against real data

66 点作者 napoleond超过 3 年前
I played around with GPT-3 to build this demo. Select a public BigQuery dataset and describe your query in natural English, then edit the generated SQL as needed and execute it.<p>https:&#x2F;&#x2F;app.tabbydata.com&#x2F;sql-assistant-demo

18 条评论

bob1029超过 3 年前
This demo sent us on a warpath today. We have a fairly clean SQL schema for which we need to craft a lot of queries that handle things like business logic, reporting and configuration.<p>If we could get even 50% success rate on a <i>reasonable starting point</i> for the generated SQL each time, that would be the biggest value-add our organization has ever seen.<p>I think our use case is compelling because we have to implement the same SQL targets for every customer. The only variations are typically customer-specific parameters&#x2F;codes&#x2F;etc.<p>We also have a huge corpus of examples to pull from for training data.<p>We are thinking about initially implementing some higher order views&#x2F;functions in our SQL dialect to make things easier on ourselves with the GPT model. Complex joins across many tables seems to be something that would still elude these techniques. Most of our joins are of a very particular shape, so we can abstract the super nasty stuff away.<p>Worst case scenario, this concludes like my cynical mind assumes it will, but I am open to being surprised this time. We aren&#x27;t going to put everything behind this, more of a &quot;if it works...&quot; kind of 1-2 week experiment.
评论 #29588499 未加载
评论 #29585290 未加载
评论 #29589289 未加载
tillvz超过 3 年前
Hey that&#x27;s very cool and works surprisingly well!<p>At Veezoo (<a href="http:&#x2F;&#x2F;www.veezoo.com" rel="nofollow">http:&#x2F;&#x2F;www.veezoo.com</a>) we have been tackling this problem for over 5 years now.<p>Under the hood we are using our own models. With GPT-3 we&#x27;re a bit worried about the lack of fine-grained control needed for productive use-cases and obv. also lock in.<p>Will try out against the same dataset and see how it compares!
评论 #29583847 未加载
评论 #29583669 未加载
cafed00d超过 3 年前
Wow, this is absolutely brilliant!<p>How can I extend this to use other datasets? There seem to be quite a few interesting publicly available datasets out there: <a href="https:&#x2F;&#x2F;console.cloud.google.com&#x2F;marketplace&#x2F;browse?filter=solution-type:dataset&amp;_ga=2.153373500.522796253.1639693032-620843116.1639693032&amp;pli=1" rel="nofollow">https:&#x2F;&#x2F;console.cloud.google.com&#x2F;marketplace&#x2F;browse?filter=s...</a><p>and I&#x27;m wondering if: 1. Is there a paid-for version of your app&#x2F;website where I can plug in a diff dataset? 2. Have you considered sharing the source code for others to recreate and plug in diff datasets? 3. Or, :sweat_smile:, perhaps this is as simple as adding new datasets to the drop-down menu? say, finance data &#x2F; Bitcoin transactions data?<p>Regardless, really cool app!
评论 #29585263 未加载
mritchie712超过 3 年前
We did a similar analysis a while back, works surprisingly well! I OpenAI increases the amount of &quot;training data&quot; you can send in, I think it could get really good at generating SQL.<p><a href="https:&#x2F;&#x2F;blog.seekwell.io&#x2F;gpt3" rel="nofollow">https:&#x2F;&#x2F;blog.seekwell.io&#x2F;gpt3</a>
评论 #29583236 未加载
earleybird超过 3 年前
May be some rough edges - or I have unreasonable expecations. The weather data set has &#x27;begin&#x27; and &#x27;end&#x27; columns which generates bad sql.<p>ie, &quot;ERROR: Syntax error: Expected end of input but got keyword END at [49:3]&quot;
评论 #29583611 未加载
napoleond超过 3 年前
Clickable link: <a href="https:&#x2F;&#x2F;app.tabbydata.com&#x2F;sql-assistant-demo" rel="nofollow">https:&#x2F;&#x2F;app.tabbydata.com&#x2F;sql-assistant-demo</a>
perl4ever超过 3 年前
<p><pre><code> &#x2F;* Describe the data in the all_geoclustered table. *&#x2F; SELECT COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE date IS NOT NULL) AS total_rows_with_date, COUNT(*) FILTER (WHERE wban IS NOT NULL) AS total_rows_with_wban, COUNT(*) FILTER (WHERE stn IS NOT NULL) AS total_rows_with_stn, COUNT(*) FILTER (WHERE year IS NOT NULL) AS total_rows_with_year, COUNT(*) FILTER (WHERE mo IS NOT NULL) AS total_rows_with_mo, COUNT(*) FILTER (WHERE da IS NOT NULL) AS total_rows_with_da, COUNT(*) FILTER (WHERE temp IS NOT NULL) AS total_rows_with_temp, COUNT(*) FILTER (WHERE count_temp IS NOT NULL) AS total_rows_with_count_temp, COUNT(*) FILTER (WHERE dewp IS NOT NULL) AS total_rows_with_dewp, COUNT(*) FILTER (WHERE count_dewp IS NOT NULL) AS total_rows_with_count_dewp, COUNT(*) FILTER (WHERE slp IS NOT NULL) AS total_rows_with_slp, COUNT(*) FILTER (WHERE count_slp IS NOT NULL) AS total_rows_with_count_slp, COUNT(*) FILTER (WHERE stp IS NOT NULL) AS total_rows_with_stp, COUNT(*) FILTER (WHERE count_stp IS NOT NULL) AS total_rows_with_count_stp, COUNT(*) FILTER (WHERE visib IS NOT NULL) AS total_rows_with_visib, COUNT(*) FILTER (WHERE count_visib IS NOT NULL) AS total_rows_with_count_visib, COUNT(*) FILTER (WHERE wdsp IS NOT NULL) AS total_rows_with_wdsp, COUNT(*) FILTER (WHERE count_wdsp IS NOT NULL) AS total_rows_with_count_wdsp, COUNT(*) FILTER (WHERE mxpsd IS NOT NULL &#x2F;* How often do hail and thunder happen together? *&#x2F; SELECT COUNT(*) AS count, SUM(IF(hail=1 AND thunder=1, 1, 0)) AS sum_hail_thunder FROM `fh-bigquery.weather_gsod.all` WHERE year = 2019 AND mo = 11 AND da = 14 AND stn = &#x27;725030&#x27;</code></pre>
lswainemoore超过 3 年前
&#x2F;* When it rains does it pour? *&#x2F;<p>SELECT date, wban, stn, year, mo, da, temp, count_temp, dewp, count_dewp, slp, count_slp, stp, count_stp, visib, count_visib, wdsp, count_wdsp, mxpsd, gust, max, flag_max, min, flag_min, prcp, flag_prcp, sndp, fog, rain_drizzle, snow_ice_pellets, hail, thunder, tornado_funnel_cloud, usaf, name, country, state, call, lat, lon, elev, begin, end, point_gis, fake_date FROM `fh-bigquery.weather_gsod.all_geoclustered` WHERE lat IS NOT NULL AND lon IS NOT NULL AND lat != 0 AND lon != 0 AND lat != 1 AND lon != 1 AND lat != -1 AND lon != -1 AND lat != 2 AND lon != 2 AND lat != -2 AND lon != -2 AND lat != 3 AND lon != 3 AND lat != -3 AND lon != -3 AND lat != 4 AND lon != 4 AND lat != -4 AND lon != -4 AND lat != 5 AND lon != 5 AND lat != -5 AND lon != -5 AND lat != 6 AND lon != 6 AND lat != -6 AND lon != -6 AND lat != 7 AND lon != 7 AND lat != -7 AND lon != -7 AND lat != 8 AND lon != 8 AND lat != -8 AND lon
perl4ever超过 3 年前
<p><pre><code> &#x2F;* What time of day does hail usually happen? *&#x2F; SELECT date, wban, stn, year, mo, da, temp, count_temp, dewp, count_dewp, slp, count_slp, stp, count_stp, visib, count_visib, wdsp, count_wdsp, mxpsd, gust, max, flag_max, min, flag_min, prcp, flag_prcp, sndp, fog, rain_drizzle, snow_ice_pellets, hail, thunder, tornado_funnel_cloud, usaf, name, country, state, call, lat, lon, elev, begin, end, point_gis, fake_date FROM `fh-bigquery.weather_gsod.all_geoclustered` WHERE hail = 1 ORDER BY date DESC LIMIT 10</code></pre>
easylearnai超过 3 年前
Nice idea, but the default natural language query generated a query that didn&#x27;t run. Perhaps turn down the temperature or tune other parameters. This could be better if you fine-tune it such that anyone could bring their own dataset, with GPT3 just making it available recently. My favorite recent app for no code sql query is trevor.io that has allowed me to build a dashboard so easily. I only knew basic sql query and didn&#x27;t have to tamper with the actual SQL code; it was all interactive and the results is always in the correct format. Comparing to the GPT3 approach, I would say I prefer trevor over using natural language for the precision. The appeal of using GPT3 exists only if it gets high enough accuracy, imo.
i_like_apis超过 3 年前
This is pretty cool.<p>I thought it was funny in that in the weather dataset that &quot;NULL&quot; comes in first for the win for some questions:<p>&gt; what is the all time rainiest city? &gt; what are the top 5 most dry states?<p>The query conversion is impressive!
评论 #29583703 未加载
mjirv超过 3 年前
Very nice! I’ve been working on something similar recently: <a href="https:&#x2F;&#x2F;acolytehq.com" rel="nofollow">https:&#x2F;&#x2F;acolytehq.com</a>.<p>Would be happy to chat and compare notes if you&#x27;d like!
评论 #29583266 未加载
paulfitz超过 3 年前
For google-trends it translated &quot;Is there anything cat-related that people are excited about?&quot; to something very reasonable that worked (answer: popcat apparently). Nice!
评论 #29584764 未加载
talos2110超过 3 年前
Nice!<p>As an interesting test case, check out the very strange and seemingly recursive query generated for “Get the top 10 authors of caching libraries, ranked by commit volume”
评论 #29585212 未加载
331c8c71超过 3 年前
Works unexpectedly well! As many others I am wondering how difficult&#x2F;costly it&#x27;d be to have this assistant set up for another dataset.
评论 #29585231 未加载
perl4ever超过 3 年前
<p><pre><code> &#x2F;* What is the range of elevations where tornadoes happen? *&#x2F; SELECT elev, COUNT(*) AS count FROM `fh-bigquery.weather_gsod.all` WHERE tornado_funnel_cloud = &#x27;T&#x27; GROUP BY elev ORDER BY count DESC</code></pre>
gertjandewilde超过 3 年前
Great use case. Is the demo broken?
matthewmorgan超过 3 年前
Why do programmers seem to love the preposition &#x27;against&#x27;? &#x27;On&#x27; would have been fine