Google's new pipe syntax in SQL

328 pointsby heydenberk9 months ago

50 comments

Richard Hipp, creator of SQLite, has implemented this in an experimental branch: <a href="https://sqlite.org/forum/forumpost/5f218012b6e1a9db" rel="nofollow">https://sqlite.org/forum/forumpost/5f218012b6e1a9db</a>Worth reading the thread, there are some good insights. It looks like he will be waiting on Postgres to take the initiative on implementing this before it makes it into a release.

评论 #41385226 未加载

评论 #41385173 未加载

评论 #41388598 未加载

tehlike9 months ago

LINQ, PRQL, Kusto has all preceeded this.While LINQ is mostly restricted to .NET, PRQL is not. <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>It's a welcome change in the industry.I made this prediction a couple years back: <a href="https://x.com/tehlike/status/1517533067497201666" rel="nofollow">https://x.com/tehlike/status/1517533067497201666</a>

评论 #41385767 未加载

评论 #41385299 未加载

评论 #41389237 未加载

评论 #41388489 未加载

aragonite9 months ago

> This remains a long-standing pet peeve of mine. PDFs like this are horrible to read on mobile phones, hard to copy-and-paste from ...I've never understood why copying text from digitally native PDFs (created directly from digital source files, rather than by OCR-ing scanned images) is so often such a poor experience. Even PDFs produced from LaTex often contain undesirable ligatures in the copied text like ﬁ and ﬂ. Text copied from some Springer journals sometimes lacks space between words or introduces unwanted space between letters in a word ... Is it due to something inherent in PDF technology?

评论 #41386876 未加载

评论 #41387875 未加载

评论 #41386778 未加载

评论 #41386506 未加载

评论 #41386685 未加载

评论 #41387605 未加载

评论 #41389164 未加载

summerlight9 months ago

Previous submissions on the paper itself:<a href="https://news.ycombinator.com/item?id=41321876">https://news.ycombinator.com/item?id=41321876</a> (first) <a href="https://news.ycombinator.com/item?id=41338877">https://news.ycombinator.com/item?id=41338877</a> (plenty of discussions)I tried this new syntax and this seems a reasonable proposal for complex analytical queries. This new syntax probably does not change most simple transactional queries though. The syntax matches the execution semantic more closely, which means you less likely need to formulate query in a weird form to make query planner work as expected; usually users only need to move some pipe operators to more appropriate places.

评论 #41385303 未加载

评论 #41385010 未加载

BeefWellington9 months ago

Every time this FROM-first syntax style crops up it's always the most basic simple query (one table, no projections / subselects / consideration to SP/Views).Just for once I want to see complete examples of the syntax on an actual advanced query of any kind right away. Sure, toss out one simple case, but then show me how it looks when I have to join 4-5 reference tables to a fact table and then filter based on those things.Once you do that, it becomes clear why SELECT first won out originally: legibility and troubleshooting.As long as DBs continue to support standard SQL they can add whatever additional syntax support they want but based on history this'll wind up being a whole new generation of emacs vs vi style holy war.

评论 #41387370 未加载

评论 #41387545 未加载

评论 #41402724 未加载

评论 #41391008 未加载

评论 #41387499 未加载

评论 #41391997 未加载

评论 #41389198 未加载

评论 #41389171 未加载

评论 #41388014 未加载

urbandw311er9 months ago

Title should probably be changed, since the article is about using AI to convert a PDF to semantic HTML.

评论 #41385143 未加载

评论 #41384826 未加载

yarg9 months ago

This reminds me .NET's short lived Linq to SQL;There was a talk at the time, but I can't find the video: <a href="http://jaoo.dk/aarhus2007/presentation/Using+LINQ+to+SQL+to+Access+Relational+Data" rel="nofollow">http://jaoo.dk/aarhus2007/presentation/Using+LINQ+to+SQL+to+...</a>.Basically, it was a way to cleanly plug SQL queries into C# code.It used this sort of ordering (where the constraints come after the thing being constrained); it needed to do so for IntelliSense to work.

评论 #41385345 未加载

评论 #41385270 未加载

评论 #41386689 未加载

mav3ri3k9 months ago

The first piped query language I used was Nushell's implementation of wide-column tables. PRQL offers almost similar approach which I have loved dearly. It also maps to different SQL dialects. There is also proposal to work on type system: <a href="https://github.com/PRQL/prql/issues/381">https://github.com/PRQL/prql/issues/381</a>.Google has now proposed a syntax inspired by these approaches. However, I am afraid how well it would be adopted. As someone new to SQL, nearly every DB seem to provide its own SQL dialect which becomes cumbersome very quickly.Whereas PRQL feels something like Apache Arrow which can map to other dialects.

0xbadcafebee9 months ago

As to the writer's problem with PDFs on the web: they aren't for reactive web app viewing on mobile phones. Not everything has to be. If you reeeeeeeally need to read that research paper, find a screen that's bigger than 3" wide.

评论 #41387808 未加载

评论 #41387261 未加载

slaymaker19079 months ago

I actually work on SQL Server, but I also write a lot of KQL queries which also work this way and I totally agree that the sequential pipe stuff is easier to write. I haven't read through the whole paper, but one aspect that I really like is that I think it's easier to guide the query optimization in this sequential style.

评论 #41385777 未加载

donatj9 months ago

I've been writing SQL for something like 25 years and always thought the columns being SELECTed should have come last, not first. Naming your sources before what you're trying to get from them to me at least makes much more logical sense. Calling aliased table names before I have done the aliasing is weird.Also it would make autocomplete in intelligent IDEs much more helpful when typing a query out from nothing.

victorbjorklund9 months ago

Looks just like writing sql using Ecto in Elixir:"users" |> where([u], u.age > 18) |> select([u], u.name)<a href="https://hexdocs.pm/ecto/Ecto.Query.html" rel="nofollow">https://hexdocs.pm/ecto/Ecto.Query.html</a>

评论 #41388110 未加载

chubot9 months ago

The next thing I would like is to define a function / macro that has a bunch of |> terms.I pointed out that you can do this with shell:Pipelines Support Vectorized, Point-Free, and Imperative Style <a href="https://www.oilshell.org/blog/2017/01/15.html" rel="nofollow">https://www.oilshell.org/blog/2017/01/15.html</a>e.g.<pre><code> hist() { sort | uniq -c | sort -n -r } $ { echo a; echo bb; echo a; } | hist 1 bb 2 a $ foo | hist ... </code></pre> Something like that should be possible in SQL!

评论 #41402766 未加载

wvenable9 months ago

I didn't see this the first time:<pre><code> GROUP AND ORDER BY component_id DESC; </code></pre> Is this kind of syntax combining grouping and ordering really necessary in addition the pipe operator? My advice would be to add the pipe operator and not get fancy adding other syntax to SQL as well.

评论 #41388550 未加载

minkles9 months ago

That is basically R with tidyverse.<pre><code> flights |> filter( carrier == "UA", dest %in% c("IAH", "HOU"), sched_dep_time > 0900, sched_arr_time < 2000 ) |> group_by(flight) |> summarize( delay = mean(arr_delay, na.rm = TRUE), cancelled = sum(is.na(arr_delay)), n = n() ) |> filter(n > 10) </code></pre> If you haven't used R, it has some serious data manipulation legs built into it.

评论 #41388491 未加载

评论 #41389039 未加载

AdieuToLogic9 months ago

If anyone is interested in the theoretical background to the thrush combinator, a.k.a. "|>", here is one using Ruby as the implementation language:<a href="https://leanpub.com/combinators/read#leanpub-auto-the-thrush" rel="nofollow">https://leanpub.com/combinators/read#leanpub-auto-the-thrush</a>Being a concept which transcends programming languages, a search for "thrush combinator" will yield examples in several languages.

评论 #41386449 未加载

Ericson23149 months ago

We should really standardize a core language for SQL. Rust has MIR, Clang is making a CIR for C/C++. Once we have that, we'll be able to to communicate much better.Right now, it's everyone faffing around with different mental models and ugly single pass compilers (my understanding is that parsing-->query planning is not nearly as well-separated in most DBs as parsing-->optomize-->codegen in most compilers).

评论 #41386925 未加载

评论 #41394666 未加载

评论 #41397168 未加载

verdverm9 months ago

The research paper: <a href="https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf" rel="nofollow">https://storage.googleapis.com/gweb-research2023-media/pubto...</a>

Zopieux9 months ago

I just want trailing commas allowed everywhere. I can't believe this 2024 and we still have to deal with this crap. Humanity deserves better.Syntax/DSL designers: if your language uses a separator for anything, please kindly allow trailing versions of that separator anywhere possible.

themerone9 months ago

My big wish for SQL is for single row inserts to have a {key: value} syntax.

评论 #41386417 未加载

评论 #41386460 未加载

评论 #41386096 未加载

gopiandcode9 months ago

I find this particular choice of syntax somewhat amusing because the pipe notation based query construction was something I ended up using a year ago when making an SQL library in OCaml:<a href="https://github.com/kiranandcode/petrol">https://github.com/kiranandcode/petrol</a>An example query being:```let insert_person ~name:n ~age:a db = Query.insert ~table:example_table ~values:Expr.[ name := s n; age := i a ] |> Request.make_zero |> Petrol.exec db```

KronisLV9 months ago

This feels like this should be in the official SQL standard and supported across a bunch of RDBMSes and understood by IDEs, libraries and frameworks.

评论 #41387984 未加载

middayc9 months ago

Looking at the first example from PDF:<pre><code> FROM customer |> LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%' |> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey |> AGGREGATE COUNT(*) AS custdist GROUP BY c_count |> ORDER BY custdist DESC, c_count DESC; </code></pre> You could do something similar with Ryelang's spreadsheet datatype:<pre><code> customers: load\csv %customers.csv orders: load\csv %orders.csv orders .where-not-contains 'o_comment "unusual packages" |left-join customers 'o_custkey 'c_custkey |group-by 'c_custkey { 'c_custkey count } |group-by 'c_custkey_count { 'c_custkey_count count } |order-by 'c_custkey_count_count 'descending </code></pre> Looking at this, maybe we should add an option to name the new aggregate column (now they get named automatically) in group-by function because c_custkey_count_count is not that elegant for example.

rileymat29 months ago

Is there research on what is easier to read when you are sifting through many queries?I like the syntax for reading what the statement expects to output first, even though I agree that I don’t write them select first. I feel like this might be optimizing the wrong thing.Although the example is nice, it does not show 20 tables joined first, which will really muddle it.

评论 #41385666 未加载

delegate9 months ago

There's honeysql library in Clojure, where you define queries as maps, which are then rendered to SQL strings:<pre><code> {:select [:name :age] :from {:people :p} :where [:> :age 10]} </code></pre> Since maps are unordered, this is equivalent to<pre><code> {:from {:people :p} :select [:name :age] :where [:> :age 10]} </code></pre> and also<pre><code> {:where [:> :age 10] :select [:name :age] :from {:people :p}} </code></pre> These can all be rendered to 'SELECT... FROM' or 'FROM .. SELECT'.Queries as data structures are very versatile, since you can use the language constructs to compose them.Queries as strings (FROM-first or not) are still strings which are hard to compose without breaking the syntax.

OptionOfT9 months ago

> GROUP AND ORDER BY component_id DESC;This feels like too much. GROUP BY and ORDER BY are separate clauses, and creating a way to group (heh) them in one clause complicates cognitive load, especially when there is an effort to reduce the overall effort to parse the query in your mind (and to provide a way for an intellisense-like system a way to make better suggestions).<pre><code> GROUP AND ORDER BY x DESC; </code></pre> vs<pre><code> GROUP BY x; ORDER BY x DESC; </code></pre> This long form is 1 word longer, but, it easier to parse in your mind, and doesn't introduce unneeded diffs when changing either the GROUP or the ORDER BY column reference.

isoprophlex9 months ago

I love the idea but something in my brain starts to itch when I see that pipe operator<pre><code> |> </code></pre> What IS that thing? A unix pipe that got confused with a redirect? A weird smiley of a bird wearing sunglasses?It'll take some getting used to, for me...

评论 #41391185 未加载

评论 #41392061 未加载

OscarCunningham9 months ago

> Rationale: We used the same operator name for full-table and grouped aggregation to minimize edit distance between these operations. Unfortunately, this puts the grouping and aggregate columns in different orders in the syntax and output. Putting GROUP BY first would require adding a required keyword before the AGGREGATE list.I think this is bad rationale. Having the columns in order is much more important than having neat syntax for full-table aggregation.

philippta9 months ago

Why even add the pipe operator?If the DB engine is executing the statement out of order, why not allow the statement to be written in any order and let itself figure it out?

评论 #41388533 未加载

评论 #41388143 未加载

julien0409 months ago

I haven't seen it mentioned yet, but it reminds me of PQL (not PRQL): <a href="https://pql.dev" rel="nofollow">https://pql.dev</a>It's inspired by Kusto and available as an open-source CLI. I've made it compatible with SQLite in one of my tools, and it's refreshing to use.An example:<pre><code> StormEvents | where State startswith "W" | summarize Count=count() by State</code></pre>

eezing9 months ago

For autocomplete, FROM first makes a lot of sense. For readability, SELECT first makes more sense because the output is always at the top.

nagisa9 months ago

People here are describing many projects that already have something resembling this syntax and concept, so I'll add another query language to the pile too: Influx's now-mostly-abandoned Flux. Uses the same |> token and structures the query descriptions starting with an equivalent of "FROM".

ahmed_ds9 months ago

This is why I like tools like datastation and hex.tech. You write the initial query using SQL than process the results as a dataframe using Python/pandas. Surely, mixing Pandas and SQL like that is not good for data pipelines but for exploration and analytics, I have found this approach to be enjoyable.

评论 #41390588 未加载

aloukissas9 months ago

This like Elixir's pipe operator [1]! I use it on the daily (with Ecto) and it's epic![1] <a href="https://elixirschool.com/en/lessons/basics/pipe_operator" rel="nofollow">https://elixirschool.com/en/lessons/basics/pipe_operator</a>

stevefan19999 months ago

That's just Linq from C# except Google want to make it a SQL standard...

dang9 months ago

Recent and related:Pipe Syntax in SQL - <a href="https://news.ycombinator.com/item?id=41338877">https://news.ycombinator.com/item?id=41338877</a> - Aug 2024 (219 comments)

datadeft9 months ago

> It's been 50 years. It's time to clean up SQL. ThisIs it though?Are we trying to solve the human SQL parser and generator problem or there is some underlying implementation detail that benefits from pipes?

eternauta3k9 months ago

Do manually-generated SQL strings have a place outside of interactive use? I use them in my small projects but I wonder if a query builder isn't better for larger systems.

评论 #41388004 未加载

oznog9 months ago

SQL replacements is like not understanding the magnitude of the success of something so old.SQL is fine.SQL has been the state of the art for db queries for 40 years.And it will continue to be when we all retire.

jiggawatts9 months ago

They’re a bit late to the game, there’s are least a dozen such popular query languages. LINQ and KQL come to mind, but there are many others…

metadat9 months ago

Simon: Please keep pushing, and mute nothing.

carabiner9 months ago

I like this. Reminds me of pandas.

fridental9 months ago

For the sake of God, please fucking stop inventing new pipe languages.LINQ: existsSplunk query language: existsKQL: existsMongoDB query language: existsPRQL: exists

评论 #41426672 未加载

评论 #41388627 未加载

1024core9 months ago

Isn't this the same syntax (or very similar to) Apache Beam?

notfed9 months ago

Is it just me, or does this seem anachronistic? Like, this is a conversation I expected to blow up 20 years ago. Better late than never.

make39 months ago

this reads like an article written by someone with adhd who started writing about a scientific paper but got distracted by some random thing instead of reading it

评论 #41387271 未加载

rosencrantz9 months ago

int *ptr;// but let's change it to *int ptr;// because the pointer symbol is more logical to write firstPlease can we solve a real problem instead?

jappgar9 months ago

Wait, is this post about SQL or PDF...

thenegation9 months ago

Now wondering if there is any relation to "Structural versus Pipeline Composition of Higher-Order Functions (Experience Report)":<a href="https://cs.brown.edu/~sk/Publications/Papers/Published/rk-struct-pipe-comp-hof/" rel="nofollow">https://cs.brown.edu/~sk/Publications/Papers/Published/rk-st...</a>

sharpshadow9 months ago

I have to honestly say that I like PDFs they always work and don’t fail without JS.