Lots of great additions. I will just highlight two:<p><i>Column selection</i>:<p>When you have tons of columns, these become useful. ClickHouse takes it to the next level and supports APPLY and COLUMNS in addition to the EXCEPT and REPLACE that DuckDB supports (see the sketch after the list):<p><pre><code> - APPLY: apply a function to a set of columns
 - COLUMNS: select columns whose names match a regular expression (!)
</code></pre>
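For a taste, a rough sketch of how these read in ClickHouse (the table and column names here are hypothetical):<p><pre><code>  -- apply an aggregate to every selected column
  SELECT * APPLY(sum) FROM metrics;

  -- select only the columns whose names match a regex
  SELECT COLUMNS('^sensor_') FROM readings;
</code></pre>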
Details here: <a href="https://clickhouse.com/docs/en/sql-reference/statements/select/#select-modifiers" rel="nofollow">https://clickhouse.com/docs/en/sql-reference/statements/sele...</a><p><i>Allow trailing commas</i>:<p>I can't count how many times I've run into a problem with a trailing comma. A whole convention developed to overcome this: the prefix-comma convention, where you'd write:<p><pre><code> SELECT
first_column
,second_column
,third_column
</code></pre>
which lets you comment out any column (except the first) without worrying about dangling-comma errors. That's no longer necessary in DuckDB. Allowing trailing commas should be included in the SQL spec.
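For example, commenting out the last column leaves a trailing comma, and DuckDB now accepts it as-is (table name hypothetical):<p><pre><code>  SELECT
      first_column,
      second_column,
      -- third_column,
  FROM some_table
</code></pre>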
Wow, this was definitely a pessimistic click for me; I was thinking "trying to replace SQL? How stupid!" But it just looks like SQL with all the stuff you wish SQL had, and some more stuff you didn't even know you wanted.
I was just yesterday exploring DuckDB and it looked very promising, but I was very surprised to find out that indexes are not persisted (which I assume means they must fit in RAM).<p>> Unique and primary key indexes are rebuilt upon startup, while user-defined indexes are discarded.<p>The second part, just discarding previously defined indexes, is super surprising.<p><a href="https://duckdb.org/docs/sql/indexes" rel="nofollow">https://duckdb.org/docs/sql/indexes</a><p>This was an instant showstopper for me, and I assume for most people whose databases grow to the size at which an OLAP DB becomes interesting in the first place.<p>Also, the numerous issues on GitHub regarding crashes make me hesitant.<p>But I really like the core idea of DuckDB being a very simple codebase with no dependencies that still provides very good performance. I guess I'd just like to see more SQLite-esque stability/robustness in the future, and I'll surely revisit it at some point.
Often, efforts and articles like this feel like minor affordances that don't immediately jump out as a big deal, even if they eventually turn out to be really useful down the road. Seeing the article title, that's what I expected. I did not expect to read through the whole thing with my inner voice saying, louder and more enthusiastically, "yes! this!" Very cool.
How does DuckDB compare to SQLite (e.g. which workloads are a good fit for what? Would it be a good idea to use both?)<p>I found <a href="https://duckdb.org/why_duckdb" rel="nofollow">https://duckdb.org/why_duckdb</a> but I'm sure someone here can share some real world lessons learned?
`EXCLUDE`<p>Extremely useful. Is there a reason this wasn't implemented in SQL in the first place? I often find myself writing very long queries just to select basically all columns except two or three of them.
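For example, instead of enumerating every column you still want (columns here are hypothetical):<p><pre><code>  -- the old way: list all the keepers by hand
  SELECT col_a, col_b, col_c, col_d FROM wide_table;

  -- with DuckDB's EXCLUDE
  SELECT * EXCLUDE (col_e, col_f) FROM wide_table;
</code></pre>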
If anyone is interested in improvements to SQL, check out PRQL <a href="https://github.com/prql/prql" rel="nofollow">https://github.com/prql/prql</a>, a pipelined relational query language.<p>It supports:<p>- functions,<p>- using an alias in the same `select` that defined it,<p>- trailing commas,<p>- date literals, f-strings, and other small improvements for things we found unpleasant in SQL.<p><a href="https://lang.prql.builders/introduction.html" rel="nofollow">https://lang.prql.builders/introduction.html</a><p>The best part: it compiles into SQL. It's under development, though we will soon be releasing version 0.2, which will be the "you can check it out" version.
Since the DuckDB people are here, just want to say that what you're doing is going to be a complete game-changer in the next few years, much like SQLite changed the game. Thanks for making it open source!
That 750KB PNG can probably be a 50KB PNG. Even without resizing it compresses to less than half its size.<p><a href="https://duckdb.org/images/blog/duck_chewbacca.png" rel="nofollow">https://duckdb.org/images/blog/duck_chewbacca.png</a>
Came across this a few times but never got to try it out, because the only Go binding is unofficial and I can't get CGO to work as expected...<p>It would be really neat to have an official one. This article makes me want to try it even more.
I'm enjoying experimenting with DuckDB from Python; it's a promising product and it can read a large list of data formats, including in-memory pandas dataframes with zero-copy. However, it's still quite a moving target, with a number of things not at maturity yet, e.g. the TimestampZ column type isn't implemented yet [1], although it is in the documentation.<p>Edit: I came across it via this podcast: <a href="https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/" rel="nofollow">https://www.dataengineeringpodcast.com/duckdb-in-process-ola...</a><p>Latest release notes:
<a href="https://github.com/duckdb/duckdb/releases/tag/v0.3.3" rel="nofollow">https://github.com/duckdb/duckdb/releases/tag/v0.3.3</a><p>[1] Error message: Not implemented Error: DataType TIMESTAMPZ not supported yet...
This is fantastic. Column aliases are super helpful in reducing verbose messiness.<p>DuckDB has all but replaced Pandas for my use cases. It’s much faster than Pandas even when working with Pandas data frames. I “import duckdb as db” more than I “import pandas as pd” these days.<p>The only thing I need now is a parallelized APPLY syntax in DuckDB.
<p><pre><code> SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars
</code></pre>
Error: columns not found<p>On further investigation, it seems that someone had maliciously injected lots of bogus data into the production database. We tried to clean up by truncating tables and dropping columns, but in the end it was easier to just restore from a backup prior to 1999.<p>There still seems to be some residual corruption, most prominently around the mos_eisley and jabbas_palace data, and we had to truncate the end of Return of the Jedi, but not much was lost there.
What are some potential long-term liabilities we might incur in choosing to adopt DuckDB today?<p>Obviously there will be a desire to monetize this project, if only to subsidize the cost of its development and maintenance. I love everything I hear and see about this project, but it makes me nervous to recommend it internally, not only because it's at such an early stage, but also because of any unforeseen costs and liabilities it might introduce in the future.
I find the examples very confusing because they use names that sound like rows or tables (jar_jar_binks, planets) as fields.
These are great features! I wish I had them in every database.
Hmm, I wonder if Babelfish could support that...<p>PS those examples were so good! really good writing :)
This is awesome, and I would love to chat about building an integration with the low-code platform Budibase:
<a href="https://github.com/Budibase/budibase" rel="nofollow">https://github.com/Budibase/budibase</a>
I would go even further and say that "GROUP BY ALL" and "ORDER BY ALL" should be implied if not provided in the query.<p>EDIT: Typo
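For reference, a sketch of the current behavior (table and columns hypothetical):<p><pre><code>  SELECT systems, planets, sum(cantinas) AS total_cantinas
  FROM star_wars_locations
  GROUP BY ALL
  -- equivalent to: GROUP BY systems, planets
</code></pre>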
I've been experimenting with DuckDB using a modified Mondrian OLAP engine, and it looks very promising so far, performance-wise.<p>A question I have for the author, or anyone using it: Is there an easy way to transfer a whole Postgres DB into DuckDB so I can run some tests with actual client data? I could export each table by hand and reimport it, but that is kind of painful.
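One hedged route, assuming the postgres_scanner extension is available in your build (the connection string and table names here are made up):<p><pre><code>  INSTALL postgres_scanner;
  LOAD postgres_scanner;
  -- expose the Postgres tables to DuckDB
  CALL postgres_attach('host=localhost dbname=clientdb');
  -- materialize one locally for testing
  CREATE TABLE orders_copy AS SELECT * FROM orders;
</code></pre>Failing that, a per-table export to CSV or Parquet plus read_csv_auto/read_parquet works, just more manually.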
On the topic of friendlier SQL, there was a feature, LINQ to SQL, added to (and I believe since removed from) .NET.<p>It was basically syntactic sugar for a persistence API.<p>Instead of "select bar from foo", it used a "from foo select bar" type of syntax.<p>This was rather nice from a code-completion perspective, since naming the table first lets the editor suggest column names.
A friendlier bit of SQL is MySQL's "INSERT ... SET".<p>The normal insert is hard to read:<p><pre><code>  INSERT INTO table1 (field1, field2, field3, ...)
  VALUES ('value1', 'value2', 'value3', ...);
</code></pre>vs. the easier-to-read:<p><pre><code>  INSERT INTO table1
  SET field1 = 'value1', field2 = 'value2', field3 = 'value3', ...;
</code></pre>
This looks a little odd:<p><pre><code> SELECT age, sum(civility) as total_civility
FROM star_wars_universe
ORDER BY ALL
-- ORDER BY age, total_civility
</code></pre>
there's no GROUP BY?<p>edit: (removed edit, I blew it, sorry)
Does it support a syntax for recursive queries? In T-SQL we use recursive CTEs, which are ugly as hell.<p>This is very cool, though. There are a lot of features here that would make my life easier. GROUP BY ALL is noice.
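For what it's worth, DuckDB appears to support the standard WITH RECURSIVE form; a minimal sketch:<p><pre><code>  WITH RECURSIVE fib(n, prev, cur) AS (
      SELECT 1, 0, 1
      UNION ALL
      SELECT n + 1, cur, prev + cur FROM fib WHERE n < 10
  )
  SELECT n, prev AS fib_n FROM fib;
</code></pre>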
Interesting additions! On using column aliases in predicates: what if my alias also exists in the source table? Which takes precedence? I feel like this can become a bit confusing either way.
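Concretely, the case I mean (table hypothetical):<p><pre><code>  -- does WHERE total see the source column "total" or the new alias?
  SELECT price * quantity AS total
  FROM orders  -- suppose orders also has a column named total
  WHERE total > 100
</code></pre>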