Tech Echo

6 comments

swimwiththebeat over 1 year ago

I see so many business leaders touting the promise of LLMs letting a business "talk" to its data. The promise does sound enticing, but it's actually pretty hard to get working in practice.

A lot of our databases at work have columns with custom types and enums, and getting the LLM (Llama 2) to write SQL queries that robustly answer natural-language questions about the data is tough. It takes a lot of instruction prompting, context, and question-SQL examples (few-shot learning), and it still fails in unexpected ways. It's a tough ask for people to use a tool like this if they can't trust the results every time. It's also not really feasible to scale this to tens or hundreds of tables across our data warehouse.

It's great that a lot of people are trying to crack this problem, and I'm curious to try this model out. I'd also love to hear whether others have tried solving this problem and made any headway.
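To make the few-shot setup described above concrete, here is a minimal sketch of how a prompt might bundle schema DDL, the allowed values of enum-like columns, and question-SQL example pairs. The prompt layout, the `build_prompt` helper, the `Example` type, and the `orders` table are all hypothetical; the format any particular model (including the one discussed in this thread) expects may well differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical question/SQL pair used as a few-shot example."""
    question: str
    sql: str


def build_prompt(schema_ddl: str, enum_notes: dict[str, list[str]],
                 examples: list[Example], question: str) -> str:
    """Assemble a text-to-SQL prompt (hypothetical layout, not a model's official format)."""
    parts = ["You are a SQL assistant. Answer with a single SQL query.", "",
             "### Schema", schema_ddl.strip(), ""]
    # Spell out enum-like values so the model doesn't have to guess them.
    if enum_notes:
        parts.append("### Allowed values for enum-like columns")
        for column, values in sorted(enum_notes.items()):
            parts.append(f"- {column}: {', '.join(values)}")
        parts.append("")
    # Few-shot examples: each question is followed by its reference SQL.
    for ex in examples:
        parts += [f"### Question: {ex.question}", ex.sql.strip(), ""]
    # The real question goes last, leaving the SQL for the model to complete.
    parts += [f"### Question: {question}", ""]
    return "\n".join(parts)


prompt = build_prompt(
    "CREATE TABLE orders (id INTEGER, status TEXT, total REAL);",
    {"orders.status": ["pending", "shipped", "cancelled"]},
    [Example("How many orders shipped?",
             "SELECT COUNT(*) FROM orders WHERE status = 'shipped';")],
    "What is the total value of cancelled orders?",
)
print(prompt)
```

Every table, enum, and example added this way grows the prompt, which is one concrete reason the approach gets harder to scale to hundreds of warehouse tables.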


vgt over 1 year ago

Co-founder and Head of Produck at MotherDuck here, happy to answer any questions or go nag the amazing engineers [0] who worked on this :)

[0] https://news.ycombinator.com/user?id=tdoehmen


datadrivenangel over 1 year ago

The core issue with text-to-SQL is that your data has to be good for the generated queries to be correct. The queries may run and return good-looking results, but if the data requires domain knowledge ("Don't count people in the customer table without filtering out records with the test flag in the customer attributes table and at least one order in the orders table"), you'll get results that don't actually answer your question.
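A minimal sketch of the gap described above, using Python's built-in `sqlite3` module and a hypothetical schema (`customers`, `customer_attributes.is_test`, `orders`) modeled on the example rule. The naive count is the kind of query a model without the domain knowledge might plausibly generate; the second query encodes the rule. Both run without error, which is exactly why the wrong one is hard to spot.

```python
import sqlite3


def customer_counts() -> tuple[int, int]:
    """Return (naive count, domain-aware count) on a small hypothetical dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer_attributes (customer_id INTEGER, is_test INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'QA bot'), (4, 'Dee');
        -- customer 3 is a test account
        INSERT INTO customer_attributes VALUES (1, 0), (2, 0), (3, 1), (4, 0);
        -- Bob (2) has no orders; Ann (1) has two
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3), (13, 4);
    """)
    # Plausible but wrong: counts every row in the customer table.
    naive = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    # Domain-aware: exclude test accounts and require at least one order.
    domain_aware = conn.execute("""
        SELECT COUNT(*)
        FROM customers AS c
        JOIN customer_attributes AS a ON a.customer_id = c.id
        WHERE a.is_test = 0
          AND EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    """).fetchone()[0]
    conn.close()
    return naive, domain_aware


print(customer_counts())  # (4, 2)
```

`EXISTS` is used instead of joining `orders` directly so that a customer with several orders (Ann here) is still counted once.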

b_mc2 over 1 year ago

This is awesome, congratulations. I'm glad to see some text-to-SQL models being created. Shameless plug: I just realized you used NSText2SQL [1], which itself contains my text-to-SQL dataset, sql-create-context [2], so I'm honored. I used sqlglot pretty heavily on it as well.

Do you think a 3B model might also be in the future, or something small enough that it can be loaded in Transformers.js?

[1] https://huggingface.co/datasets/NumbersStation/NSText2SQL
[2] https://huggingface.co/datasets/b-mc2/sql-create-context

CastFX over 1 year ago

I'd love to see how it performs on some benchmarks, specifically Spider (https://yale-lily.github.io/spider) and BIRD (https://bird-bench.github.io/).
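Both benchmarks report some form of execution accuracy: a predicted query counts as correct if it returns the same result as the gold query. The sketch below is a simplified, hypothetical version of that check using `sqlite3`; `execution_match` is an illustrative helper, it compares rows as an unordered multiset, and it treats any SQL error in the prediction as a miss. The official Spider and BIRD evaluation scripts differ in details (for example, how they handle row order and duplicates), so they are what should be used for any reported numbers.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Hypothetical execution-accuracy check: same rows, ignoring order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Invalid SQL (e.g. a hallucinated column) is simply counted as wrong.
        return False
    gold = conn.execute(gold_sql).fetchall()
    # Counter compares rows as a multiset, so duplicates matter but order doesn't.
    return Counter(predicted) == Counter(gold)


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2), (2);")
print(execution_match(conn, "SELECT x FROM t ORDER BY x DESC", "SELECT x FROM t"))  # True
print(execution_match(conn, "SELECT DISTINCT x FROM t", "SELECT x FROM t"))  # False
print(execution_match(conn, "SELECT y FROM t", "SELECT x FROM t"))  # False: no such column
```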

aldarisbm over 1 year ago

Looks great. Most text-to-SQL attempts I've tried fall short; hoping this one is different.



An open source DuckDB text to SQL LLM
