This is really great! Maybe I'll incorporate this into my own software (scratchdata/scratchdb)<p>Question: it looks like you wrote the parser by hand. How did you decide that was the right approach? I'm new to parsers myself and am working on implementing the PostgREST syntax in Go, using a PEG to translate to ClickHouse, which is a similar mission to this project's. Would love to learn how you approached this problem!
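For readers weighing the same choice: hand-written recursive-descent parsers are often preferred over generated ones for the control they give over error messages and grammar evolution, and for a pipe-delimited language the top level stays small. A minimal sketch of that shape (hypothetical code, not the actual pql implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is one pipe-separated step, e.g. "where like(email, 'gmail')".
type Stage struct {
	Op   string // first word: where, project, count, ...
	Rest string // remainder of the stage, left unparsed in this sketch
}

// parsePipeline splits a pql-like query into a source table and stages.
// A real tokenizer must also respect "|" inside string literals and
// parse Rest into an expression tree; this shows only the outer shape.
func parsePipeline(src string) (table string, stages []Stage, err error) {
	parts := strings.Split(src, "|")
	table = strings.TrimSpace(parts[0])
	if table == "" {
		return "", nil, fmt.Errorf("empty source table")
	}
	for _, p := range parts[1:] {
		op, rest, _ := strings.Cut(strings.TrimSpace(p), " ")
		if op == "" {
			return "", nil, fmt.Errorf("empty pipeline stage")
		}
		stages = append(stages, Stage{Op: op, Rest: strings.TrimSpace(rest)})
	}
	return table, stages, nil
}

func main() {
	table, stages, err := parsePipeline("users | where like(email, 'gmail') | count")
	if err != nil {
		panic(err)
	}
	fmt.Println(table) // users
	for _, s := range stages {
		fmt.Println(s.Op, s.Rest)
	}
}
```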
It looks a lot like Kusto query language. Here is a kusto query:<p><pre><code> StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
</code></pre>
edit: yes, it was indeed inspired by Kusto, as they mention in the GitHub README: <a href="https://github.com/runreveal/pql">https://github.com/runreveal/pql</a>
This is actually pretty awesome! I use KQL every few days for reading logs from Azure Application Insights. The syntax is pretty nice and you can build pretty complex queries with it. But that's it: I can't use KQL anywhere outside Azure. With this, I can show off my KQL-fu to my teammates and surprise them with how fast you can write KQL-like syntax compared to SQL.
Looks similar to PRQL[0].<p>Neither PRQL nor Pql seem to be able to do anything outside of SELECT like Preql[1] can.<p>I propose we call all attempts at transpiling to SQL "quels".<p>[0] <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>
[1] <a href="https://github.com/erezsh/Preql">https://github.com/erezsh/Preql</a>
We're developing TQL (Tenzir Query Language, "tea-quel"), which is very similar to PQL: <a href="https://docs.tenzir.com/pipelines" rel="nofollow">https://docs.tenzir.com/pipelines</a><p>It's also a pipeline language, PRQL-inspired, but differs in that (i) TQL supports multiple data types between operators, both unstructured blocks of bytes and structured data frames as Arrow record batches, (ii) TQL is multi-schema, i.e., a single pipeline can have different "tables", as if you're processing semi-structured JSON, and (iii) TQL supports both batch and stream processing, with a lightweight indexed storage layer on top of Parquet/Feather files for historical workloads and a streaming executor.
We're in the middle of getting TQL v2 [@] out the door, with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.<p>While it's a general-purpose ETL tool, we're primarily targeting operational security use cases where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security'ish, like Sigma, YARA, or Velociraptor.<p>Comparison:<p><pre><code> users
| where eventTime > minus(now(), toIntervalDay(1))
| project user_id, user_email
</code></pre>
vs TQL:<p><pre><code> export
where eventTime > now() - 1d
select user_id, user_email
</code></pre>
[@] <a href="https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bfcd5f6fa74970565204/rfc/004-query-language/README.md">https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...</a><p>[#] <a href="https://docs.tenzir.com/blog/five-design-principles-for-building-a-data-pipeline-engine" rel="nofollow">https://docs.tenzir.com/blog/five-design-principles-for-buil...</a>
InfluxDB tried to do this with InfluxQL but abandoned it, and is now back to SQL. The biggest problem I had with it when I tried it was that it was simply too slow: queries were on average 6x slower than their SQL equivalents. I think a language like this is just too hard to optimize well.
This is cool. Splunk Search Processing Language (SPL) is a real vendor lock-in feature. Once the team has invested time to get ramped up on SPL, and it gets integrated in your workflows, ripping out Splunk has an even higher switching cost.
Their first example doesn't look idiomatic at all:<p><pre><code> SELECT
   *
 FROM
   "users"
 WHERE
   like ("email", 'gmail')
</code></pre>
Should "like" here be a user-defined function? Because that's not the syntax for SQL's LIKE operator. Which SQL dialect will pql translate its queries to?
For anyone using anything more than basic SQL functionality, so far this looks very limiting: no window functions, no aggregate filtering, no data-type-specific functions (ranges).
How do I get parameterized queries into this? Can I? Should I?<p>edit: guess I can't<p><a href="https://pkg.go.dev/github.com/runreveal/pql" rel="nofollow">https://pkg.go.dev/github.com/runreveal/pql</a>
Is the "piping" associative? As in, does it allow me to put `where eventTime > minus(now(), toIntervalMinute(15)) | count` into a variable so I can use it later on multiple different tables/queries? I remember failing to do the same thing with ggplot2 when I wanted to share styling between components. If the operator is not associative, then the reading order will have to be mixed since composing will require functions (and Go doesn't have pipes/UFCS)
One potential advantage of these compile-to-SQL languages is that they might be easier to tune a codegen LLM on. SQL is very verbose and, in my experience, English-to-SQL doesn't really work too well, even in high-end products (i.e. pricey SaaS offerings presumably using GPT-4)<p>My hunch is that tuning English prompts on less verbose, more "left to right" flowing languages might yield better results.
At this stage I feel that the natural evolution for SQL is instead to describe what you want in English and have an LLM generate the SQL, often with comments.<p>For some reason, a lot of these SQL alternatives seem to be a matter of syntactic preference, and not much simpler or clearer than the original.
Their very first example has issues:<p>> users
> | where like(email, 'gmail')
> | count<p>becomes<p>> WITH
> "__subquery0" AS (
> SELECT
> *
> FROM
> "users"
> WHERE
> like ("email", 'gmail')
> )
> SELECT
> COUNT(*) AS "count()"
> FROM
> "__subquery0";<p>Fetching everything from the users table can be a ton slower than just running a count on that table, if the table is indexed on email. I had to deal with that very problem this week.
What target database is that in the examples?<p><pre><code> like ("email", 'gmail')
minus (now (), toIntervalDay (1))
</code></pre>
are non-standard functions/conditions
Maybe I’m totally missing it, but why would I use this over SQL? All those companies have their own flavor of DSL, so are you saying this is meant to standardize them? Thanks
Pipelining is cool, though this could've easily just been a library with nice chaining and combinators in your language of choice (which seems to be Go here).
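To make the "library with chaining" alternative concrete, here is a minimal sketch of what such a combinator API could look like in Go. All names are hypothetical; this is not pql's API, just an illustration of the builder-style approach:

```go
package main

import (
	"fmt"
	"strings"
)

// Query accumulates pipeline stages; each method appends one stage
// and returns the builder so calls can be chained.
type Query struct {
	stages []string
}

// From starts a pipeline at a source table.
func From(table string) *Query {
	return &Query{stages: []string{table}}
}

func (q *Query) Where(cond string) *Query {
	q.stages = append(q.stages, "where "+cond)
	return q
}

func (q *Query) Project(cols ...string) *Query {
	q.stages = append(q.stages, "project "+strings.Join(cols, ", "))
	return q
}

func (q *Query) Count() *Query {
	q.stages = append(q.stages, "count")
	return q
}

// String renders the pipeline in pql-like "a | b | c" form.
func (q *Query) String() string {
	return strings.Join(q.stages, " | ")
}

func main() {
	q := From("users").Where("like(email, 'gmail')").Count()
	fmt.Println(q) // users | where like(email, 'gmail') | count
}
```

The chaining reads in the same left-to-right order as the pipe syntax, which is arguably the main ergonomic win either way.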
I’m glad this exists, but I would flag extensibility as the most important thing for devs to consider when picking their “ORM” stack, especially in terse Go.<p>For that I use squirrel, which uses the builder pattern to compose SQL strings. Keeping it as strings and interfaces allows it to be very easily extended; for example, I was able to customize it to speak SOQL (Salesforce). Plenty of downsides, though.
I don't get it. These examples are kind of uninspiring, and the generated SQL output being unnecessarily complicated doesn't help. I haven't used PRQL, but at least it's pretty obvious from its examples how it's nicer to use than SQL. But with this one, yeah, the examples on the left are "nicer" than the convoluted output on the right, but if you write SQL <i>normally</i>, this is basically just SQL with a lot of "|" and the table name at the beginning instead of in the middle. So what's the point?
The only thing I like about SQL is that it has stayed almost the same language for decades. Learn it once and you're done. If you really need to, you can write macros yourself. I don't see the value of learning a new language to do the same thing.
The "Why?" questions are getting downvoted, but to dissect the why section from the page a little...<p>"designed to be small and efficient" – adding a layer on top of SQL is necessarily less efficient, adding a layer of indirection on underlying optimisations means it is likely (but not guaranteed) to also generate less efficient queries.<p>"make developing queries simple" – this seems to be just syntactic preference. The examples are certainly shorter, but in part that's the SQL style used.<p>I think it either needs more evidence that the syntax is actually better or cases it simplifies, the ways in which it can optimise that are hard in SQL, or it perhaps needs to be more honest about the intent of the project being to just be a different interface for those who prefer it that way.<p>It's an interesting exercise, and I'm glad it exists in that respect, and hope the author enjoyed making it and learnt something from it. That can be enough of a why!
This seems really cool. This is not meant as a negative comment, but why does it matter that it's written in Go? It's stated multiple times, yet couldn't this be written in multiple other languages and still be functionally the same?