TechEcho

6 comments

flakiness, almost 3 years ago

parser.py [1] has only 1.6k lines, and it is a hand-written parser. That size would be amazing if it's really capable, but intuitively I doubt it. For example, DuckDB's select.y [2] has 3,700 lines, and that covers only SELECT. ZetaSQL's grammar file [3] is almost 10k lines. SQL is a monstrous language. Is there some trick that keeps the code simple?

[1] https://github.com/tobymao/sqlglot/blob/main/sqlglot/parser.py
[2] https://github.com/duckdb/duckdb/blob/master/third_party/libpg_query/grammar/statements/select.y
[3] https://github.com/google/zetasql/blob/master/zetasql/parser/bison_parser.y
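
One general reason a hand-written parser can stay compact is that most of an expression grammar collapses into a table of operator precedence levels plus one generic loop (precedence climbing), instead of one grammar rule per level. The sketch below illustrates that idea on a tiny expression subset. The node classes, the LEVELS table, and the tokenizer are hypothetical, and this is not a claim about how parser.py is actually structured.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical AST node types for this sketch only.
@dataclass
class Num:
    value: int

@dataclass
class Col:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Col, BinOp]

# Precedence levels, loosest-binding first. Adding an operator is a
# one-line table change rather than a new grammar rule.
LEVELS = [
    {"OR"},
    {"AND"},
    {"=", "<>", "<", ">", "<=", ">="},
    {"+", "-"},
    {"*", "/"},
]

TOKEN_RE = re.compile(r"<=|>=|<>|\d+|[A-Za-z_]\w*|[=<>+\-*/()]")

def tokenize(text: str) -> list[str]:
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse(self) -> Expr:
        expr = self.parse_level(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return expr

    def parse_level(self, level: int) -> Expr:
        # Below the tightest level, parse an atom.
        if level == len(LEVELS):
            return self.parse_atom()
        left = self.parse_level(level + 1)
        # One generic loop handles every left-associative binary level.
        while self.peek() is not None and self.peek().upper() in LEVELS[level]:
            op = self.advance().upper()
            left = BinOp(op, left, self.parse_level(level + 1))
        return left

    def parse_atom(self) -> Expr:
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            expr = self.parse_level(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return expr
        return Num(int(tok)) if tok.isdigit() else Col(tok)

print(Parser("(a + 1) * 2").parse())
# BinOp(op='*', left=BinOp(op='+', left=Col(name='a'), right=Num(value=1)), right=Num(value=2))
```

Supporting a new binary operator here only means editing LEVELS; a grammar-file approach typically needs a production per precedence level.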

captaintobs, almost 3 years ago

Author here, feel free to ask me any questions!

Something that I'm working on is a pure Python SQL engine:
https://github.com/tobymao/sqlglot/blob/main/sqlglot/executor/python.py
It does the whole shebang: parsing, optimization, logical planning, and physical execution.
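
To make the last two stages more concrete, here is a toy sketch of a logical plan (scan, filter, project) executed over in-memory Python rows, with each plan node mapped to a generator-based physical operator. The node names, the dict-per-row tables, and the lambda predicate are assumptions for illustration only; parsing and optimization are skipped, and the executor linked above will differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

Row = dict  # one row = column name -> value (assumption for this sketch)

# Hypothetical logical plan nodes.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: object
    predicate: Callable[[Row], bool]

@dataclass
class Project:
    child: object
    columns: list

def execute(plan, tables: dict[str, list[Row]]) -> Iterator[Row]:
    # Each logical node becomes one physical operator; rows stream
    # lazily from the scan up through the filter and projection.
    if isinstance(plan, Scan):
        yield from tables[plan.table]
    elif isinstance(plan, Filter):
        for row in execute(plan.child, tables):
            if plan.predicate(row):
                yield row
    elif isinstance(plan, Project):
        for row in execute(plan.child, tables):
            yield {c: row[c] for c in plan.columns}
    else:
        raise TypeError(f"unknown plan node: {plan!r}")

# Roughly: SELECT name FROM users WHERE age > 30
plan = Project(Filter(Scan("users"), lambda r: r["age"] > 30), ["name"])
tables = {"users": [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 25},
    {"name": "cy", "age": 41},
]}
print(list(execute(plan, tables)))  # [{'name': 'ann'}, {'name': 'cy'}]
```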

RobinL, almost 3 years ago

SQLGlot is great. We've used it to extend our FOSS probabilistic data linking library [1] so that it can now execute against a variety of SQL backends (Spark, Presto, DuckDB, SQLite), significantly widening our potential user base.

We implement the core statistical model in SQL, and then use SQLGlot to transpile it to the target execution engine. One big motivation was to future-proof our work: we're no longer tied to Spark, so when the 'next big thing' (GPU-accelerated SQL for analytics?) comes along, it should be relatively straightforward to support it by writing another adaptor.

Working on this has highlighted some of the really tricky problems associated with translating between SQL engines, yet we haven't hit any major problems, so kudos to the author!

[1] https://github.com/moj-analytical-services/splink/tree/splink3
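
The "one core representation, one adaptor per engine" idea can be sketched as a base SQL generator plus small per-dialect subclasses that override only what differs. The example below uses identifier quoting, where Spark SQL uses backticks while DuckDB (like Presto and SQLite) uses double quotes. The classes, the GENERATORS registry, and the column names are hypothetical and are not SQLGlot's actual API.

```python
from dataclasses import dataclass

# Hypothetical minimal AST for this sketch.
@dataclass
class Column:
    name: str

@dataclass
class Select:
    columns: list[Column]
    table: str

class Generator:
    """Base generator: ANSI-style SQL with double-quoted identifiers."""
    quote = '"'

    def identifier(self, name: str) -> str:
        # Escape an embedded quote character by doubling it.
        q = self.quote
        return f"{q}{name.replace(q, q + q)}{q}"

    def generate(self, node: Select) -> str:
        cols = ", ".join(self.identifier(c.name) for c in node.columns)
        return f"SELECT {cols} FROM {self.identifier(node.table)}"

class SparkGenerator(Generator):
    quote = "`"  # Spark SQL quotes identifiers with backticks

class DuckDBGenerator(Generator):
    pass  # double quotes, inherited from the base class

# Supporting a new engine means adding one subclass and one registry entry.
GENERATORS = {"spark": SparkGenerator(), "duckdb": DuckDBGenerator()}

query = Select([Column("first name"), Column("id")], "pairs")
for dialect, gen in GENERATORS.items():
    print(dialect, gen.generate(query))
# spark SELECT `first name`, `id` FROM `pairs`
# duckdb SELECT "first name", "id" FROM "pairs"
```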

eatonphil, almost 3 years ago

Neat! I did an exploration of SQL parsers in different languages [0] and couldn't find much for Python. But between this project and the couple it lists in its benchmarks, I have a few more to look at.

[0] https://datastation.multiprocess.io/blog/2022-04-11-sql-parsers.html

Pandabob, almost 3 years ago

Could this be used in VSCode as a plugin to autoformat/lint my .sql files?

xiaodai, almost 3 years ago

Nice one. Do you feel that having it in pure Python leaves some performance on the table? Or is performance not so critical in this use case?

SQLGlot: SQL parser, transpiler, optimizer – translate to Presto, Spark, Hive