
Querying Postgres Tables Directly from DuckDB

131 points by hfmuehleisen over 2 years ago

14 comments

xnx over 2 years ago
SQLite and DuckDB are excellent demonstrations that the computers we have on our desks (and even in our pockets) are ridiculously fast and capable for almost everything but the largest workloads. They're a stark contrast to framework bloat that has stolen our CPU cycles and given us perceptible lag when typing and scrolling.
eatonphil over 2 years ago
The way they do this read is interesting, and especially why they don't just read the files on disk directly.

> The Postgres Scanner uses the standard libpq library, which it statically links in. Ironically, this makes the Postgres Scanner easier to install than the other Postgres clients. However, Postgres' normal client-server protocol is quite slow, so we spent quite some time optimizing this. As a note, DuckDB's SQLite Scanner does not face this issue, as SQLite is also an in-process database.

> We actually implemented a prototype direct reader for Postgres' database files, but while performance was great, there is the issue that committed but not yet checkpointed data would not be stored in the heap files yet. In addition, if a checkpoint was currently running, our reader would frequently overtake the checkpointer, causing additional inconsistencies. We abandoned that approach since we want to be able to query an actively used Postgres database and believe that consistency is important. Another architectural option would have been to implement a DuckDB Foreign Data Wrapper (FDW) for Postgres similar to duckdb_fdw, but while this could improve the protocol situation, deployment of a Postgres extension is quite risky on production servers, so we expect few people will be able to do so.

> Instead, we use the rarely-used binary transfer mode of the Postgres client-server protocol. This format is quite similar to the on-disk representation of Postgres data files and avoids some of the otherwise expensive to-string and from-string conversions. For example, to read a normal int32 from the protocol message, all we need to do is to swap byte order (ntohl).
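That last point is easy to see in a few lines. Below is a minimal Python illustration (not DuckDB's actual C++ code) of decoding a network-byte-order int32 from a binary-mode protocol message, which is exactly what the ntohl swap does:

```python
import struct

def read_int32_be(buf: bytes, offset: int = 0) -> int:
    # Postgres's binary transfer mode sends integers in network
    # (big-endian) byte order; the "!i" format decodes a signed
    # 32-bit value the same way ntohl + a cast would in C.
    (value,) = struct.unpack_from("!i", buf, offset)
    return value

# Four bytes as they might appear in a binary-mode DataRow field:
assert read_int32_be(b"\x00\x00\x00\x83") == 131
assert read_int32_be(b"\xff\xff\xff\xff") == -1
```

No string parsing is involved, which is the whole point of the binary mode over the default text format.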
m_ke over 2 years ago
This might end up being the best way to ETL Postgres tables to Parquet. Of everything else I tried, doing a copy to CSV and then converting to Parquet was the fastest, but it can be a pain when dealing with type conversions.
revision17 over 2 years ago
DuckDB is awesome! I find it the easiest way to ingest data from various sources, then query it into a form I can do analytics on.

The datasets I work on are a bit too big for pandas, but Spark is way overkill for them. DuckDB lets me efficiently work on them using only a single computer.
samwillis over 2 years ago
DuckDB is brilliant. Something I'm working on uses SQLite; there is an analytical search users can perform that scans most of the DB with multiple joins. It would take about 45 seconds to run the query. I exported the dataset and imported it into DuckDB. The literal same SQL query now takes only 1.5 seconds.

It really shows how much better a column store is for certain queries.
xwowsersx over 2 years ago

```
D CALL postgres_attach('host=blabla.us-east-2.rds.amazonaws.com port=5432 dbname=my_db connect_timeout=10 password=mypass user=my_user');
Error: Invalid Error: IO Error: Unsupported Postgres type interval
```
xwowsersx over 2 years ago
Can I expect pretty significant boosts in query speed when using DuckDB on my existing PG database? I'm very intrigued now that it's possible to directly query PG. We don't yet have an analytics DB and instead run all analytics stuff on a read replica. Some of the queries are pretty slow. If I can just slap DuckDB on top and speed things up, I'd be really thrilled.
galaxyLogic over 2 years ago
I wasn't aware of DuckDB, but now that I've read about it I wonder: should I replace my SQLite drivers with it, since it seems more versatile?
isoprophlex over 2 years ago
Nice job, this will be useful to many, I think. The DuckDB team sure seems to have a knack for figuring out what their audience needs.
menaerus over 2 years ago
This is exciting, but it left me wondering whether the approach will scale to larger TPC-H scale factors. A scale factor of 1 is honestly very small.

Also, I didn't quite understand whether, in order to achieve this, DuckDB must:

1. Read the Postgres row-formatted data
2. Transform the row-formatted data into its internal columnar representation
3. Keep the representation in memory
latenightcoding over 2 years ago
Wonder if there is anything like this for ClickHouse.

edit: to query PG from ClickHouse, not ClickHouse from DuckDB
ClassicOrgin over 2 years ago
I was looking into exporting a database from DuckDB to Postgres for use with Google Data Studio, but whenever I exported, DuckDB was attaching decimals to integers, which threw off the entire import into Postgres. Think I'll just try this out.
xwowsersx over 2 years ago
Crazy idea, but... use the Python client in a Django app and, for some of the larger/worst-performing queries, use DuckDB for those. You can even use the ORM to construct the queries, then get the raw SQL and execute it with DuckDB. Crazy?
klysm over 2 years ago
Does the inverse of this also exist?