Lots of great additions. I will just highlight two:<p><i>Column selection</i>:<p>When you have tons of columns, these become useful. ClickHouse takes it to the next level and supports APPLY and COLUMNS in addition to the EXCEPT and REPLACE that DuckDB supports (see the sketch after the list):<p><pre><code> - APPLY: apply a function to a set of columns
 - COLUMNS: select columns whose names match a regular expression (!)
</code></pre>
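For a taste, a rough sketch of how these read in ClickHouse (the table and column names here are hypothetical):<p><pre><code>  -- apply an aggregate to every selected column
  SELECT * APPLY(sum) FROM metrics;

  -- select only the columns whose names match a regex
  SELECT COLUMNS('^sensor_') FROM readings;
</code></pre>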
Details here: <a href="https://clickhouse.com/docs/en/sql-reference/statements/select/#select-modifiers" rel="nofollow">https://clickhouse.com/docs/en/sql-reference/statements/sele...</a><p><i>Allow trailing commas</i>:<p>I can't count how many times I've run into a problem with a trailing comma. A whole convention developed to overcome this: the prefix-comma convention, where you'd write:<p><pre><code> SELECT
first_column
,second_column
,third_column
</code></pre>
which lets you comment out any column (except the first) without worrying about dangling-comma errors. That's no longer necessary in DuckDB. Allowing trailing commas should be included in the SQL spec.
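For example, commenting out the last column leaves a trailing comma, and DuckDB now accepts it as-is (table name hypothetical):<p><pre><code>  SELECT
      first_column,
      second_column,
      -- third_column,
  FROM some_table
</code></pre>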
Wow, this was definitely a pessimistic click for me; I was thinking "trying to replace SQL? How stupid!" But it just looks like SQL with all the stuff you wish SQL had, and some more stuff you didn't even know you wanted.
I was just yesterday exploring DuckDB and it looked very promising, but I was very surprised to find out that indexes are not persisted (which I assume means they must fit in RAM).<p>> Unique and primary key indexes are rebuilt upon startup, while user-defined indexes are discarded.<p>The second part, just discarding previously defined indexes, is super surprising.<p><a href="https://duckdb.org/docs/sql/indexes" rel="nofollow">https://duckdb.org/docs/sql/indexes</a><p>This was an instant showstopper for me, and I assume for most people whose databases grow to the size at which an OLAP DB becomes interesting in the first place.<p>Also, the numerous issues on GitHub regarding crashes make me hesitant.<p>But I really like the core idea of DuckDB being a very simple codebase with no dependencies that still provides very good performance. I guess I'd just like to see more SQLite-esque stability/robustness in the future, and I'll surely revisit it at some point.
Often, efforts and articles like this feel like minor affordances that don't immediately jump out as a big deal, even if they eventually turn out to be really useful down the road. Seeing the article title, that's what I expected. I did not expect to read through the whole thing with my inner voice saying, louder and more enthusiastically, "yes! this!" Very cool.
How does DuckDB compare to SQLite (e.g. which workloads are a good fit for what? Would it be a good idea to use both?)<p>I found <a href="https://duckdb.org/why_duckdb" rel="nofollow">https://duckdb.org/why_duckdb</a> but I'm sure someone here can share some real world lessons learned?
`EXCLUDE`<p>Extremely useful. Is there a reason this wasn't implemented in SQL in the first place? I often find myself writing very long queries just to select basically all columns except two or three of them.
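For example, instead of enumerating every column you still want (columns here are hypothetical):<p><pre><code>  -- the old way: list all the keepers by hand
  SELECT col_a, col_b, col_c, col_d FROM wide_table;

  -- with DuckDB's EXCLUDE
  SELECT * EXCLUDE (col_e, col_f) FROM wide_table;
</code></pre>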
If anyone is interested in improvements to SQL, check out PRQL <a href="https://github.com/prql/prql" rel="nofollow">https://github.com/prql/prql</a>, a pipelined relational query language.<p>It supports:<p>- functions,<p>- using an alias in the same `select` that defined it,<p>- trailing commas,<p>- date literals, f-strings, and other small improvements for things we found unpleasant in SQL.<p><a href="https://lang.prql.builders/introduction.html" rel="nofollow">https://lang.prql.builders/introduction.html</a><p>The best part: it compiles into SQL. It's under development, though we will soon be releasing version 0.2, which will be the "you can check it out" version.
Since the DuckDB people are here, just want to say that what you're doing is going to be a complete game-changer in the next few years, much like SQLite changed the game. Thanks for making it open source!
That 750KB PNG can probably be a 50KB PNG. Even without resizing it compresses to less than half its size.<p><a href="https://duckdb.org/images/blog/duck_chewbacca.png" rel="nofollow">https://duckdb.org/images/blog/duck_chewbacca.png</a>
Came across this a few times but never got to try it out, because the only Go binding is unofficial and I can't get CGO to work as expected...<p>It would be really neat to have an official one. This article makes me want to try it even more.
I'm enjoying experimenting with DuckDB from Python; it's a promising product and it can read a large list of data formats, including in-memory pandas dataframes with zero-copy. However, it's still quite a moving target, with a number of things not at maturity yet, e.g. the TimestampZ column type isn't implemented yet [1], although it is in the documentation.<p>Edit: I came across it via this podcast: <a href="https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/" rel="nofollow">https://www.dataengineeringpodcast.com/duckdb-in-process-ola...</a><p>Latest release notes:
<a href="https://github.com/duckdb/duckdb/releases/tag/v0.3.3" rel="nofollow">https://github.com/duckdb/duckdb/releases/tag/v0.3.3</a><p>[1] Error message: Not implemented Error: DataType TIMESTAMPZ not supported yet...
This is fantastic. Column aliases are super helpful in reducing verbose messiness.<p>DuckDB has all but replaced Pandas for my use cases. It’s much faster than Pandas even when working with Pandas data frames. I “import duckdb as db” more than I “import pandas as pd” these days.<p>The only thing I need now is a parallelized APPLY syntax in DuckDB.
<p><pre><code> SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars
</code></pre>
Error: columns not found<p>On further investigation, it seems that someone had maliciously injected lots of bogus data into the production database. We tried to clean up by truncating tables and dropping columns, but in the end it was easier to just restore from a backup prior to 1999.<p>There still seems to be some residual corruption, most prominently around the mos_eisley and jabbas_palace data, and we had to truncate the end of Return of the Jedi, but not much was lost there.
What are some potential long-term liabilities we might incur in choosing to adopt DuckDB today?<p>Obviously there will be a desire to monetize this project, if only to subsidize the cost of its development and maintenance. I love everything I hear and see about this project, but it makes me nervous to recommend it internally, not only because it's at such an early stage, but also because of any unforeseen costs and liabilities it might introduce in the future.
I find the examples very confusing because they use names that sound like rows or tables (jar_jar_binks, planets) as fields.
These are great features! I wish I had them in every database.
Hmm, I wonder if Babelfish could support that...<p>PS those examples were so good! really good writing :)
This is awesome, and I would love to chat about building an integration with the low-code platform Budibase:
<a href="https://github.com/Budibase/budibase" rel="nofollow">https://github.com/Budibase/budibase</a>
I would go even further and say that "GROUP BY ALL" and "ORDER BY ALL" should be implied if not provided in the query.<p>EDIT: Typo
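For reference, a sketch of the current behavior (table and columns hypothetical):<p><pre><code>  SELECT systems, planets, sum(cantinas) AS total_cantinas
  FROM star_wars_locations
  GROUP BY ALL
  -- equivalent to: GROUP BY systems, planets
</code></pre>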
I've been experimenting with DuckDB using a modified Mondrian OLAP engine, and it looks very promising so far, performance-wise.<p>A question I have for the author, or anyone using it: Is there an easy way to transfer a whole Postgres DB into DuckDB so I can run some tests with actual client data? I could export each table by hand and reimport it, but that is kind of painful.
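One hedged route, assuming the postgres_scanner extension is available in your build (the connection string and table names here are made up):<p><pre><code>  INSTALL postgres_scanner;
  LOAD postgres_scanner;
  -- expose the Postgres tables to DuckDB
  CALL postgres_attach('host=localhost dbname=clientdb');
  -- materialize one locally for testing
  CREATE TABLE orders_copy AS SELECT * FROM orders;
</code></pre>Failing that, a per-table export to CSV or Parquet plus read_csv_auto/read_parquet works, just more manually.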
On the topic of friendlier SQL, there was a feature, LINQ to SQL, added to (and I believe since removed from) .NET.<p>It was basically syntactic sugar for a persistence API.<p>Instead of "select bar from foo", it used a "from foo select bar" type of syntax.<p>This was rather nice from a code-completion perspective, since naming the table first lets the editor suggest column names.
A friendlier bit of SQL is MySQL's "INSERT ... SET".<p>The normal insert is hard to read:<p><pre><code>  INSERT INTO table1 (field1, field2, field3, ...)
  VALUES ('value1', 'value2', 'value3', ...);
</code></pre>vs. the easier-to-read:<p><pre><code>  INSERT INTO table1
  SET field1 = 'value1', field2 = 'value2', field3 = 'value3', ...;
</code></pre>
This looks a little odd:<p><pre><code> SELECT age, sum(civility) as total_civility
FROM star_wars_universe
ORDER BY ALL
-- ORDER BY age, total_civility
</code></pre>
there's no GROUP BY?<p>edit: (removed edit, I blew it, sorry)
Does it support a syntax for recursive queries? In T-SQL we use recursive CTEs, which are ugly as hell.<p>This is very cool, though. There are a lot of features here that would make my life easier. GROUP BY ALL is noice.
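For what it's worth, DuckDB appears to support the standard WITH RECURSIVE form; a minimal sketch:<p><pre><code>  WITH RECURSIVE fib(n, prev, cur) AS (
      SELECT 1, 0, 1
      UNION ALL
      SELECT n + 1, cur, prev + cur FROM fib WHERE n < 10
  )
  SELECT n, prev AS fib_n FROM fib;
</code></pre>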
Interesting additions! On using column aliases in predicates: what if my alias also exists in the source table? Which takes precedence? I feel like this can become a bit confusing either way.
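Concretely, the case I mean (table hypothetical):<p><pre><code>  -- does WHERE total see the source column "total" or the new alias?
  SELECT price * quantity AS total
  FROM orders  -- suppose orders also has a column named total
  WHERE total > 100
</code></pre>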