TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

A PostgreSQL planner semi-join gotcha with CTE, LIMIT, and RETURNING

57 点作者 namanyayg7 天前

10 条评论

i_have_to_speak6 天前
This really has nothing to do with CTEs.<p>The SQL standard does not specify how many times a subquery must or can be run. In this case, Postgres is free to run the uncorrelated subquery many times. Unfortunately, the subquery result is not idempotent.<p>The real solution is not to rely on the optimizer for the correctness of your query. The query should be logically correct, irrespective of the query plan, right? The correct solution would be to ensure that the subquery runs only once, using a materialized cte or temp table. Do not base your &quot;solution&quot; on the discovery that for this particular version of Postgres, the planner runs this type of subquery once for the &#x27;=&#x27; operator and multiple times for &#x27;IN&#x27;.
评论 #43886652 未加载
impulsivepuppet6 天前
Since I don&#x27;t often write raw SQL, I can only assume the author named their CTE `deleted_tasks` to elucidate that the query might delete multiple items. Otherwise, it makes little sense, for they intended to &quot;pop&quot; a single row, and yet their aptly named `deleted_tasks` ended up removing more than one!<p>The query reads to me like a conceptual mish-mash. Without understanding what the innermost `SELECT` was meant to accomplish, I&#x27;d naturally interpret the `WHERE id IN (...)` as operating on a set. But the most sacrilegious aspect is the inclusion of `FOR UPDATE SKIP LOCKED`. It assumes a very specific execution order that the query syntax doesn&#x27;t actually enforce.<p>Am I right to think that not avoiding lock contention, i.e. omitting `SKIP LOCKED` would have actually produced the intended result?
评论 #43886527 未加载
swid6 天前
This is interesting - I don’t expect undefined behavior like this. Should the query delete more than one row or not?<p>The blog post doesn’t really give answers - it isn’t obvious to me that the second query can’t be executed in the exact same way. The even cop to this fact:<p>&gt; This structure appears less ambiguous to the planner and typically encourages it to evaluate the subquery first.<p>So then - their bug still exists maybe?<p>I have thoughts - probably we do expect it never should delete multiple rows, but after considering the query plan, I can see it being ambiguous. But I would have expected Postgres to use only a single interpretation.
评论 #43884313 未加载
eknkc6 天前
I don’t know about “gotcha”. This sounds like a full blown planner bug to me.
评论 #43884617 未加载
wordofx6 天前
Why was the author not just doing returning *<p>No need for the CTE to select everything…
评论 #43884199 未加载
simonw6 天前
<p><pre><code> DELETE FROM task_queue WHERE id = ( -- Use &#x27;=&#x27; for a single expected ID SELECT id FROM task_queue WHERE queue_group_id = 15 LIMIT 1 FOR UPDATE SKIP LOCKED ) RETURNING item_id; </code></pre> I don&#x27;t understand where that item_id value comes from, since that&#x27;s not a column that&#x27;s mentioned anywhere else in the query.<p>I guess it must be an unmentioned column on that task_queue table?
评论 #43884546 未加载
clhodapp6 天前
Hmm, it seems like the subquery is getting re-run on every single row of the task queue table, which seems like a performance issue in addition to correctness.<p>Personally, if I were going to put any part of that query in a CTE, it would have been (just) the select, as I generally prefer CTE&#x27;s to subqueries.<p>I&#x27;m really not sure what motivated the author to create the CTE in the first place, as the final select seems to be essentially the identity function.
评论 #43885217 未加载
评论 #43885490 未加载
tucnak6 天前
DELETE in SKIP LOCKED scenario is recipe for failure. God&#x2F;Postgres gave you transactions for atomicity... only UPDATE w&#x2F; SKIP LOCKED, holy brother in Christ, and only DELETE completed normally.
评论 #43885342 未加载
abhisek6 天前
Interesting to think about how to guard against these cases where query optimisation leads to unexpected results.<p>I mean this could lead to serious bugs. What can be a way to detect these using linters or in CI before they hit the production.
评论 #43884235 未加载
评论 #43886019 未加载
Horffupolde6 天前
IN is not recommended. Use = ANY instead.
评论 #43884393 未加载