SQL pipe syntax available in public preview in BigQuery

247 点作者 marcyb5st3 个月前

35 条评论

The SQL with pipe syntax is also been implemented at Databricks since Jan 30, 2025: <a href="https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-pipeline.html" rel="nofollow">https://docs.databricks.com/en/sql/language-manual/sql-ref-s...</a>Still, the best is yet to come. Previously, SQL extensions were a pain. There was no good place, and table-value functions were a mess.Now, it would be possible to have higher-order functions such as enrichment, predictions, grouping or other data contracts. Example:<pre><code> FROM orders |> WHERE order_date >= '2024-01-01' |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id |> WHERE total_spent > 1000 |> INNER JOIN customers USING(customer_id) |> CALL ENRICH.APOLLO(EMAIL > customers.email) |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country </code></pre> This may be called one SQL to determine distinct e-mail domains, then prepare an enriching dataset and later execute the final SQL with JOIN.Iterative SQL with pipes may also work better with GenAI.

评论 #43040953 未加载

评论 #43043083 未加载

评论 #43042541 未加载

willvarfar3 个月前

When the pipe syntax was first published last year the SQLite team did a quick hack to try it out.<a href="https://sqlite.org/forum/forumpost/5f218012b6e1a9db" rel="nofollow">https://sqlite.org/forum/forumpost/5f218012b6e1a9db</a>(Note that the fiddle linked no longer supports the syntax)It is very interesting that they found it completely unnecessary to actually require the pipe characters. The grammar works when the pipe characters are optional.And, imo, looks a lot better!

评论 #43038312 未加载

notpushkin3 个月前

PRQL is a similar idea, and it compiles to SQL: <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a><pre><code> from invoices filter invoice_date >= @1970-01-16 derive { transaction_fees = 0.8, income = total - transaction_fees } filter income > 1</code></pre>

评论 #43037569 未加载

评论 #43036490 未加载

评论 #43036570 未加载

评论 #43036295 未加载

评论 #43039628 未加载

评论 #43036279 未加载

评论 #43036897 未加载

评论 #43037829 未加载

jmull3 个月前

I like it, but I'm not sure it's worth the trouble of further expanding an ever-ballooning extended family of SQL syntaxes.Of course, SQL has numerous issues, both in an absolute sense and relative to what we've come to expect. And the obvious course of action in each individual case is to add syntax to support the missing feature.But as you keep adding syntax, SQL gets more and more complex, both in the individual variants and due to the complexities of multiple variants with different syntax support at different times.A transpiling approach makes more sense to me. I'd like the sql implementors to focus on source maps and other things to better support plugging in external, alternate syntaxes (is a standardized mechanism too much to ask for?).Then individual projects/people/efforts can choose the SQL syntax variant that works for them, and it can be one that evolves separate from the host db.

评论 #43037064 未加载

评论 #43036666 未加载

评论 #43036799 未加载

评论 #43037005 未加载

Taikonerd3 个月前

Relatedly, PRQL is a lovely pipe-oriented syntax for SQL DBs: <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>Their syntax is a lot cleaner, because it's a new language -- it's not backwards-compatible with SQL. But then again, they don't have the resources of Google behind them.

评论 #43036112 未加载

nicoritschel3 个月前

Also is in DuckDB <a href="https://github.com/ywelsch/duckdb-psql">https://github.com/ywelsch/duckdb-psql</a>

评论 #43036997 未加载

mccanne3 个月前

Really cool though typing ">" after "|" is a pain <a href="https://github.com/brimdata/super/blob/main/docs/language/pipe-ambiguity.md">https://github.com/brimdata/super/blob/main/docs/language/pi...</a>

评论 #43043275 未加载

Xmd5a3 个月前

<a href="https://github.com/seancorfield/honeysql#vanilla-sql-clause-helpers">https://github.com/seancorfield/honeysql#vanilla-sql-clause-...</a><pre><code> (-> (select :a :b :c) (from :foo) (where [:= :foo.a "baz"])) => {:select [:a :b :c] :from [:foo] :where [:= :foo.a "baz"]} </code></pre> Effort: zero (0). That's what "simple made easy" is about.

code_runner3 个月前

I appreciate that somebody somewhere may appreciate and enjoy this, but I am not that person. I love SQL. I have always loved SQL. I know why others don't, but I do and its beautiful.

评论 #43043190 未加载

评论 #43043130 未加载

mwexler3 个月前

Though not a pipe syntax, the Malloy language has some similar analytic sugar in its syntax.Malloy is from Lloyd Tabb, a co-founder of Looker.<a href="https://www.malloydata.dev/" rel="nofollow">https://www.malloydata.dev/</a>

pjmlp3 个月前

Ever since using Kusto Query Language, that I look forward to SQL getting something like this, maybe there is hope if enough DBs have it as an extension.

评论 #43036224 未加载

ajfriend3 个月前

I have project that's still very much at the experimental stage, where I try to get something similar to this pipe syntax by allowing users to chain "SQL snippets" together. That is, you can use standalone statements like `where col1 > 10` because the `select * from ...` is implied. <a href="https://ajfriend.github.io/duckboat/" rel="nofollow">https://ajfriend.github.io/duckboat/</a><pre><code> import duckboat as uck csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv' uck.Table(csv).do( "where sex = 'female' ", 'where year > 2008', 'select *, cast(body_mass_g as double) as grams', 'select species, island, avg(grams) as avg_grams group by 1,2', 'select * replace (round(avg_grams, 1) as avg_grams)', 'order by avg_grams', ) </code></pre> I still can't tell if it's too goofy, or if I really like it. :)I write a lot of SQL anyway, so this approach is nice in that I find I almost never need to look up function syntax like I would with Pandas, since it is just using DuckDB SQL under the hood, but removing the need to write `select * from ...` repeatedly. And when you're ready to exit the data exploration phase, its easy to gradually translate things back to "real SQL".The whole project is pretty small, essentially just a light wrapper around DuckDB to do this expression chaining and lazy evaluation.

ralmidani3 个月前

The first time I looked at Elixir, I thought it was hideous (I was really into Python and Ruby, and still admire them while liking Elixir much more). But I kept hearing about the virtues of Elixir (and of the BEAM) and coming back. Finally, it clicked, and I’m now in my 3rd year of doing Elixir full-time. I hope to never again use a language that doesn’t have pipes.The moral of the story? Let’s give this new SQL syntax a chance.

epistasis3 个月前

This is beautiful.I first encountered this style of data manipulation in R's tidyverse. Some say C# (or F#?)had similar pipes and influences, but I haven't seen specifics.Some other data libraries like Polars have sort of similar parse trees, but they also have lots of extra cruft because of limitations to the underlying language. Python's functional calling semantics are extremely powerful, but not quite as powerful as R's.

评论 #43036627 未加载

tdfirth3 个月前

It should have always worked this way. Without this feature you take the algebra out of relational algebra. That's the root of most of the composition issues in SQL.Sadly it's a few decades too late though, and sadly this just fragments the "ecosystem" further.

xiphias23 个月前

One great feature of the SQL pipe syntax is that it works much better together with programming languages (generating, or modifying existing SQL queries).It would be great to have it standardized fast and implemented everywhere.

tqi3 个月前

If this helps people write queries more easily, I think that is unequivocally a great thing and have no reservations.Personally, I continue to prefer CTEs because it allows me to write queries in a more modular way, and in conjunction with descriptive names helps me to keep less context in my head when working out the logic. When I look at a pipe syntax query, it almost feels like the difference between reading a long paragraph vs bullet points?

评论 #43040297 未加载

评论 #43040291 未加载

perdomon3 个月前

This is great. It's a more linear workflow that mimics how we filter data in our minds. How long until it's available in MySQL (never, probably)

ejcx3 个月前

We made pql.dev that works with the different sql syntaxes by translating kusto like queries to sql (using CTE). It's worked really well thusfar and I wish someone would make a standard pipelined query language that gets supported across most databasesI know prql exists, but the syntax is pretty cumbersome and not something I enjoyed writing, but I do understand why folks would gravitate towards it

lordofgibbons3 个月前

I love it. Why did it take us so long to make SQL easier to reason about?Hopefully, it gets adopted in Postgresql too.

dangoodmanUT3 个月前

IMO pipe syntax is so much better. Not necessarily this version, but in general

beoberha3 个月前

Somehow Microsoft (my employer) has totally failed in evangelizing Kusto to the masses. KQL is by far the best query language you’ll ever use.This syntax from Google is nice but it’s still just SQL.

评论 #43040995 未加载

评论 #43046534 未加载

评论 #43040323 未加载

sweeter3 个月前

The Unix pipe paradigm was honestly way ahead of it's time.

fforflo3 个月前

I'm all for such an approach, but I also like Firebolt's approach with Lambda functions focusing on arrays. [0][0] <a href="https://www.firebolt.io/blog/sql-thinking-in-lambdas" rel="nofollow">https://www.firebolt.io/blog/sql-thinking-in-lambdas</a>

Taikonerd3 个月前

This is so exciting. I hope Postgres and MySQL get some level of support for it, too.

schultzer3 个月前

When I first started working in Elixir I loved Ecto and the pipe syntax, but now I want a SQL sigil, that wont care about where I put my where or from clauses.Why are SQL parsers even concerned with that? Isn’t that why we have RD parser.

iddan3 个月前

This is such an improvement. Can’t wait for it to get into Postgres

dnst3 个月前

It seems unfortunate that InfluxDB had the pipe operator with flux in version 2 and dropped it the new version due to low adoption. Now it seems to become more popular.

评论 #43043302 未加载

zendist3 个月前

This reminds me a bit of MSFTs Kusto language. Such an immensely useful way to slice and dice large amounts of structured data.

brikym3 个月前

As a user of Kusto query language in Azure I can highly recommend pipe syntax. The autocomplete is very good.

whalesalad3 个月前

Who invented pipe syntax? Elixir? Or is there another runtime or language that made this prevalent beforehand?

评论 #43035844 未加载

评论 #43035834 未加载

评论 #43038333 未加载

评论 #43035845 未加载

eb0la3 个月前

First thing I thought is "this is a SELECT * in disguise", which is something that you SHOULD never do in BigQuery. If you can combine it with CTEs, seems good because it adds legibility for complex queries. Also looks easier for the planner. Not for the optimizer, though.

etaham3 个月前

KQL has this as well. Quite helpful and can make it easier to show intent.

gigatexal3 个月前

I think the syntax is awesome. Too bad I’m using snowflake at the $DAYJOB

data-ottawa3 个月前

I’ve been using this for a bit more than a week already, I would say this is a great feature for iteration and interactive queries. Here’s my review:Doing data exploration, analysis, and cleaning, this is way more productive than just SQL. As an example, fusing the aggregate functions with the group by keys creates a much faster workflow. I like that it unifies WHERE/HAVING/QUALIFY, and the set/extend/drop functions help (though Snowflake still wins for being able to declare a column and use it in the same select). Ultimately this gives me a one liner for situations where I’m creating hard to name intermediate CTEs, and that’s awesome!Iterating on array columns or parsing non-trivial JSON is much better with this syntax too.This is a shift back to the data frame API of a few years ago, but we’re still missing typing helpers and support that data frame APIs could provide. It would be nice to have a system like this which plugs into language servers, or can list fields at each step with a mouse over/cursor hover, and getting editor completions (the BQ console does an okay job).This syntax is great for DBT macros. You can just drop in entire transforms without worrying about polluting the query namespace or working around existing columns on your reference tables.There’s a dark side to this syntax. The imperative coding style this comes with a tradeoff that the reader needs to track internal state through many steps. It’s the same reason why SELECT * is often not recommended.As a best practice I like to throw a `|> SELECT X,Y,Z` at the end of these pipe blocks to reinforce to the reader what the output looks like.I should mention that it’s not as portable, but frankly all the major DBs aren’t portable, and other DBs also now support this syntax.tl;dr: I like this feature, but use it sparingly. Avoid overuse in model files, definitely lean on it in analyses.

评论 #43042156 未加载

35 条评论

jakozaur3 个月前

评论 #43040953 未加载

评论 #43043083 未加载

评论 #43042541 未加载

willvarfar3 个月前

评论 #43038312 未加载

notpushkin3 个月前

评论 #43037569 未加载

评论 #43036490 未加载

评论 #43036570 未加载

评论 #43036295 未加载

评论 #43039628 未加载

评论 #43036279 未加载

评论 #43036897 未加载

评论 #43037829 未加载

jmull3 个月前

评论 #43037064 未加载

评论 #43036666 未加载

评论 #43036799 未加载

评论 #43037005 未加载

Taikonerd3 个月前

评论 #43036112 未加载

nicoritschel3 个月前

Also is in DuckDB <a href="https://github.com/ywelsch/duckdb-psql">https://github.com/ywelsch/duckdb-psql</a>

评论 #43036997 未加载

mccanne3 个月前

评论 #43043275 未加载

Xmd5a3 个月前

code_runner3 个月前

I appreciate that somebody somewhere may appreciate and enjoy this, but I am not that person. I love SQL. I have always loved SQL. I know why others don't, but I do and its beautiful.

评论 #43043190 未加载

评论 #43043130 未加载

mwexler3 个月前

pjmlp3 个月前

Ever since using Kusto Query Language, that I look forward to SQL getting something like this, maybe there is hope if enough DBs have it as an extension.

评论 #43036224 未加载

ajfriend3 个月前

ralmidani3 个月前

epistasis3 个月前

评论 #43036627 未加载

tdfirth3 个月前

xiphias23 个月前

tqi3 个月前

评论 #43040297 未加载

评论 #43040291 未加载

perdomon3 个月前

This is great. It's a more linear workflow that mimics how we filter data in our minds. How long until it's available in MySQL (never, probably)

ejcx3 个月前

lordofgibbons3 个月前

I love it. Why did it take us so long to make SQL easier to reason about?Hopefully, it gets adopted in Postgresql too.

dangoodmanUT3 个月前

IMO pipe syntax is so much better. Not necessarily this version, but in general

beoberha3 个月前

评论 #43040995 未加载

评论 #43046534 未加载

评论 #43040323 未加载

sweeter3 个月前

The Unix pipe paradigm was honestly way ahead of it's time.

fforflo3 个月前

Taikonerd3 个月前

This is so exciting. I hope Postgres and MySQL get some level of support for it, too.

schultzer3 个月前

iddan3 个月前

This is such an improvement. Can’t wait for it to get into Postgres

dnst3 个月前

It seems unfortunate that InfluxDB had the pipe operator with flux in version 2 and dropped it the new version due to low adoption. Now it seems to become more popular.

评论 #43043302 未加载

zendist3 个月前

This reminds me a bit of MSFTs Kusto language. Such an immensely useful way to slice and dice large amounts of structured data.

brikym3 个月前

As a user of Kusto query language in Azure I can highly recommend pipe syntax. The autocomplete is very good.

whalesalad3 个月前

Who invented pipe syntax? Elixir? Or is there another runtime or language that made this prevalent beforehand?

评论 #43035844 未加载

评论 #43035834 未加载

评论 #43038333 未加载

评论 #43035845 未加载

eb0la3 个月前

etaham3 个月前

KQL has this as well. Quite helpful and can make it easier to show intent.

gigatexal3 个月前

I think the syntax is awesome. Too bad I’m using snowflake at the $DAYJOB

data-ottawa3 个月前

评论 #43042156 未加载