This is really great! Maybe I'll incorporate this into my own software (scratchdata/scratchdb)<p>Question: it looks like you wrote the parser by hand. How did you decide that was the right approach? I'm new to parsers myself and am working on implementing the PostgREST syntax in Go, using a PEG to translate to ClickHouse, which is a similar mission to this project's. Would love to learn how you approached this problem!
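For readers weighing the same choice: hand-written recursive-descent parsers are often preferred over generated ones for the control they give over error messages and grammar evolution, and for a pipe-delimited language the top level stays small. A minimal sketch of that shape (hypothetical code, not the actual pql implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is one pipe-separated step, e.g. "where like(email, 'gmail')".
type Stage struct {
	Op   string // first word: where, project, count, ...
	Rest string // remainder of the stage, left unparsed in this sketch
}

// parsePipeline splits a pql-like query into a source table and stages.
// A real tokenizer must also respect "|" inside string literals and
// parse Rest into an expression tree; this shows only the outer shape.
func parsePipeline(src string) (table string, stages []Stage, err error) {
	parts := strings.Split(src, "|")
	table = strings.TrimSpace(parts[0])
	if table == "" {
		return "", nil, fmt.Errorf("empty source table")
	}
	for _, p := range parts[1:] {
		op, rest, _ := strings.Cut(strings.TrimSpace(p), " ")
		if op == "" {
			return "", nil, fmt.Errorf("empty pipeline stage")
		}
		stages = append(stages, Stage{Op: op, Rest: strings.TrimSpace(rest)})
	}
	return table, stages, nil
}

func main() {
	table, stages, err := parsePipeline("users | where like(email, 'gmail') | count")
	if err != nil {
		panic(err)
	}
	fmt.Println(table) // users
	for _, s := range stages {
		fmt.Println(s.Op, s.Rest)
	}
}
```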
It looks a lot like Kusto query language. Here is a kusto query:<p><pre><code> StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
</code></pre>
edit: yes, it was indeed inspired by Kusto, as they mention in the GitHub README: <a href="https://github.com/runreveal/pql">https://github.com/runreveal/pql</a>
This is actually pretty awesome! I use KQL every few days for reading logs from Azure Application Insights. The syntax is pretty nice and you can build pretty complex queries with it. But that's it: I can't use KQL anywhere outside Azure. With this, I can show off my KQL-fu to my teammates and surprise them with how fast you can write KQL-like syntax compared to SQL.
Looks similar to PRQL[0].<p>Neither PRQL nor Pql seem to be able to do anything outside of SELECT like Preql[1] can.<p>I propose we call all attempts at transpiling to SQL "quels".<p>[0] <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>
[1] <a href="https://github.com/erezsh/Preql">https://github.com/erezsh/Preql</a>
We're developing TQL (Tenzir Query Language, "tea-quel"), which is very similar to PQL: <a href="https://docs.tenzir.com/pipelines" rel="nofollow">https://docs.tenzir.com/pipelines</a><p>It's also a pipeline language, PRQL-inspired, but differs in that (i) TQL supports multiple data types between operators, both unstructured blocks of bytes and structured data frames as Arrow record batches, (ii) TQL is multi-schema, i.e., a single pipeline can have different "tables", as if you're processing semi-structured JSON, and (iii) TQL supports both batch and stream processing, with a lightweight indexed storage layer on top of Parquet/Feather files for historical workloads and a streaming executor.
We're in the middle of getting TQL v2 [@] out the door, with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.<p>While it's a general-purpose ETL tool, we're primarily targeting operational security use cases where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security'ish, like Sigma, YARA, or Velociraptor.<p>Comparison:<p><pre><code> users
| where eventTime > minus(now(), toIntervalDay(1))
| project user_id, user_email
</code></pre>
vs TQL:<p><pre><code> export
where eventTime > now() - 1d
select user_id, user_email
</code></pre>
[@] <a href="https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bfcd5f6fa74970565204/rfc/004-query-language/README.md">https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...</a><p>[#] <a href="https://docs.tenzir.com/blog/five-design-principles-for-building-a-data-pipeline-engine" rel="nofollow">https://docs.tenzir.com/blog/five-design-principles-for-buil...</a>
InfluxDB tried to do this with InfluxQL but abandoned it, and is now back to SQL. The biggest problem I had with it when I tried it was that it was simply too slow: queries were on average 6x slower than their SQL equivalents. I think a language like this is just too hard to optimize well.
This is cool. Splunk Search Processing Language (SPL) is a real vendor lock-in feature. Once the team has invested time to get ramped up on SPL, and it gets integrated in your workflows, ripping out Splunk has an even higher switching cost.
Their first example doesn't look idiomatic at all:<p><pre><code> SELECT
   *
 FROM
   "users"
 WHERE
   like ("email", 'gmail')
</code></pre>
Should "like" here be a user-defined function? Because that's not the syntax for SQL's LIKE operator. Which SQL dialect will pql translate its queries to?
For anyone using anything more than basic SQL functionality, so far this looks very limiting: no window functions, no aggregate filtering, no data-type-specific functions (ranges).
How do I get parameterized queries into this? Can I? Should I?<p>edit: guess I can't<p><a href="https://pkg.go.dev/github.com/runreveal/pql" rel="nofollow">https://pkg.go.dev/github.com/runreveal/pql</a>
Is the "piping" associative? As in, does it allow me to put `where eventTime > minus(now(), toIntervalMinute(15)) | count` into a variable so I can use it later on multiple different tables/queries? I remember failing to do the same thing with ggplot2 when I wanted to share styling between components. If the operator is not associative, then the reading order will have to be mixed since composing will require functions (and Go doesn't have pipes/UFCS)
One potential advantage of these compile-to-SQL languages is that they might be easier to tune a codegen LLM on. SQL is very verbose and, in my experience, English-to-SQL doesn't really work too well, even in high-end products (i.e. pricey SaaS offerings presumably using GPT-4)<p>My hunch is that tuning English prompts on less verbose, more "left to right" flowing languages might yield better results.
At this stage I feel that the natural evolution for SQL is instead to describe what you want in English and have an LLM generate the SQL, often with comments.<p>For some reason, a lot of these SQL alternatives seem to be a matter of syntactic preference, and not much simpler or clearer than the original.
Their very first example has issues:<p>> users
> | where like(email, 'gmail')
> | count<p>becomes<p>> WITH
> "__subquery0" AS (
> SELECT
> *
> FROM
> "users"
> WHERE
> like ("email", 'gmail')
> )
> SELECT
> COUNT(*) AS "count()"
> FROM
> "__subquery0";<p>Fetching everything from the users table can be a ton slower than just running a count on that table, if the table is indexed on email. I had to deal with that very problem this week.
What target database is that in the examples?<p><pre><code> like ("email", 'gmail')
minus (now (), toIntervalDay (1))
</code></pre>
are non-standard functions/conditions
Maybe I’m totally missing it, but why would I use this over SQL? All those companies have their own flavor of DSL, so are you saying this is meant to standardize them? Thanks
Pipelining is cool, though this could've easily just been a library with nice chaining and combinators in your language of choice (which seems to be Go here).
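To make the "library with chaining" alternative concrete, here is a minimal sketch of what such a combinator API could look like in Go. All names are hypothetical; this is not pql's API, just an illustration of the builder-style approach:

```go
package main

import (
	"fmt"
	"strings"
)

// Query accumulates pipeline stages; each method appends one stage
// and returns the builder so calls can be chained.
type Query struct {
	stages []string
}

// From starts a pipeline at a source table.
func From(table string) *Query {
	return &Query{stages: []string{table}}
}

func (q *Query) Where(cond string) *Query {
	q.stages = append(q.stages, "where "+cond)
	return q
}

func (q *Query) Project(cols ...string) *Query {
	q.stages = append(q.stages, "project "+strings.Join(cols, ", "))
	return q
}

func (q *Query) Count() *Query {
	q.stages = append(q.stages, "count")
	return q
}

// String renders the pipeline in pql-like "a | b | c" form.
func (q *Query) String() string {
	return strings.Join(q.stages, " | ")
}

func main() {
	q := From("users").Where("like(email, 'gmail')").Count()
	fmt.Println(q) // users | where like(email, 'gmail') | count
}
```

The chaining reads in the same left-to-right order as the pipe syntax, which is arguably the main ergonomic win either way.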
I’m glad this exists, but I would flag extensibility as the most important thing for devs to consider when picking their “ORM” stack, especially in terse Go.<p>For that I use squirrel, which uses the builder pattern to compose SQL strings. Keeping it as strings and interfaces allows it to be very easily extended; for example, I was able to customize it to speak SOQL (Salesforce). Plenty of downsides, though.
I don't get it. These examples are kind of uninspiring, and the generated SQL output being unnecessarily complicated doesn't help. I haven't used PRQL, but at least it's pretty obvious from its examples how it's nicer to use than SQL. But with this one, yeah, the examples on the left are "nicer" than the convoluted output on the right, but if you write SQL <i>normally</i>, this is basically just SQL with a lot of "|" and the table name at the beginning instead of in the middle. So what's the point?
The only thing I like about SQL is that it has stayed almost the same language for decades. Learn it once and you're done. If you really need to, you can write macros yourself. I don't see the value of learning a new language to do the same thing.
The "Why?" questions are getting downvoted, but to dissect the why section from the page a little...<p>"designed to be small and efficient" – adding a layer on top of SQL is necessarily less efficient, adding a layer of indirection on underlying optimisations means it is likely (but not guaranteed) to also generate less efficient queries.<p>"make developing queries simple" – this seems to be just syntactic preference. The examples are certainly shorter, but in part that's the SQL style used.<p>I think it either needs more evidence that the syntax is actually better or cases it simplifies, the ways in which it can optimise that are hard in SQL, or it perhaps needs to be more honest about the intent of the project being to just be a different interface for those who prefer it that way.<p>It's an interesting exercise, and I'm glad it exists in that respect, and hope the author enjoyed making it and learnt something from it. That can be enough of a why!
This seems really cool. This is not meant as a negative comment, but why does it matter that it's written in Go? It's stated multiple times, yet couldn't this be written in multiple other languages and still be functionally the same?