TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

CTEs as lookup tables

177 点作者 preetamjinka超过 2 年前

27 条评论

drx超过 2 年前
CTEs are low-key one of the best features of SQL. Great for debugging big queries, such as:<p><pre><code> with source as ( select * from wherever ), transformed as ( ... ), joined as ( ... ), final as ( ... ) select * from final </code></pre> You can switch `final` to `transformed` to see what the query is doing internally. Almost like having good control flow. Almost.
评论 #34347584 未加载
评论 #34347712 未加载
cfeduke超过 2 年前
CTEs (common table expressions) are wonderful, they make SQL grokkable and maintainable. But, before some dev goes crazy refactoring all of the SQL in their codebase into maintainable CTEs: always benchmark. CTEs can cause your query planner to optimize incorrectly. In some cases, CTEs can force the query optimizer to choose a plan it otherwise is not choosing and be more performant - perhaps up to a certain point. (That point being the size of the tables in question&#x2F;index&#x2F;etc.)
评论 #34347248 未加载
评论 #34347853 未加载
评论 #34346971 未加载
评论 #34349408 未加载
评论 #34351824 未加载
srcreigh超过 2 年前
This is a great idea if you disable materialization (which the author of this post does not mention).<p>Yes, materialization is fine for small hardcoded values in the post, but for most other lookup tables, it&#x27;s definitely not fine.<p>For example, you read this article and think great, I&#x27;ll make a lookup table to map various ids to be linked across tables.<p><pre><code> with ids as ( select u.user_id, u.token, s.customer_id from users u join stripe_customer s on s.user_id = u.user_id ), user_impressions as ( select i.*, ids.id from impressions i join ids using token ), payments as ( select p.*, ids.id from payments p join ids using customer_id ), select sum(i.views), sum(i.clicks), sum(p.amount) from payments p join user_impressions i using user_id where user_id = $1 group by date; </code></pre> You think, great, this gets some stats for one user since its filtered at the end, and with indexes on stripe_customer(user_id), user_impressions(token), and payments(customer_id), this will be really fast and efficient!<p>Nope.. since ids is referenced more than once in this query, in Postgres this causes the ids CTE to be materialized onto disk with no indexes. So not only does it take up lots of extra space on disk to store all the users in your system, to join user_impressions and payments you have to O(N) search across the ids dataset. No indexes on a materialized table.<p>It would be a lot faster to join to users or stripe_customer or both in a loop with indexes than O(N) search through all the users in your system.<p>This is particularly dangerous because if you remove the payments part, ids only has one reference, so Postgres doesn&#x27;t materialize the table, and everything is fast.<p>This can be fixed by adding WITH foobar AS NOT MATERIALIZED to the CTE syntax. IMO it should be a syntax error to not specify AS MATERIALIZED or AS NOT MATERIALIZED.. The default has too many potential performance problems and folks should have to think about whether they want materialization or not.
评论 #34348630 未加载
评论 #34349414 未加载
评论 #34350474 未加载
charlie0超过 2 年前
Seems like an interesting idea, but could use a better example, at least for those who aren&#x27;t yet intermediate level in SQL.<p>In what world is this<p><pre><code> WITH countries (code, name) AS ( ...&gt; SELECT \* FROM (VALUES ...&gt; (&#x27;us&#x27;, &#x27;United States&#x27;), (&#x27;fr&#x27;, &#x27;France&#x27;), (&#x27;in&#x27;, &#x27;India&#x27;) ...&gt; ) AS codes ...&gt; ) ...&gt; SELECT data.code, name FROM data LEFT JOIN countries ON countries.code = data.code; </code></pre> easier to read than this<p><pre><code> SELECT code, ...&gt; CASE code ...&gt; WHEN &#x27;us&#x27; THEN &#x27;United States&#x27; ...&gt; WHEN &#x27;fr&#x27; THEN &#x27;France&#x27; ...&gt; WHEN &#x27;in&#x27; THEN &#x27;India&#x27; ...&gt; END AS country ...&gt; FROM data;</code></pre>
评论 #34347529 未加载
评论 #34347335 未加载
评论 #34351766 未加载
评论 #34349618 未加载
acjohnson55超过 2 年前
Occasionally, SQL surprises with bits of composability, such as the fact that using VALUES to specify literal tuples can be used with both INSERT and the FROM clause of a SELECT.<p>Is there any reason syntactically the SELECT needs to be required? If you use a VALUES table literal as a subselect, you have to give it column names with an AS clause (see <a href="https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;current&#x2F;sql-values.html" rel="nofollow">https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;current&#x2F;sql-values.html</a>). I can imagine a simpler syntax where you&#x27;d do WITH VALUES ... AS table_name (column_1_name, ...).<p>Is there any reason to alias the lookup table as `codes`?<p>There are apparently lots of other clever uses of the WITH clause, such as <a href="https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;current&#x2F;queries-with.html#QUERIES-WITH-MODIFYING" rel="nofollow">https:&#x2F;&#x2F;www.postgresql.org&#x2F;docs&#x2F;current&#x2F;queries-with.html#QU...</a>
评论 #34347161 未加载
Izkata超过 2 年前
sqlite doesn&#x27;t have linear regression functions, and doing the math manually is a bit awkward because &quot;b&quot; relies on &quot;m&quot;. Instead of duplicating the math to calculate &quot;b&quot;, here&#x27;s how to do it with CTEs:<p><pre><code> CREATE TABLE vals (x, y); INSERT INTO vals VALUES (1, 1), (2, 0.5), (3, 0.4), (4, 0.1), (5, 0); WITH m(v) AS ( SELECT ((COUNT(*) * (SUM(x * y))) - (SUM(x) * SUM(y))) &#x2F; ((COUNT(*) * SUM(POW(x, 2))) - (POW(SUM(x), 2))) FROM vals ), b(v) AS ( SELECT (SUM(y) - (m.v * SUM(x))) &#x2F; COUNT(*) FROM vals JOIN m ) SELECT x AS real_x, y AS real_y, m.v AS m, b.v AS b, x * m.v + b.v AS interp_y FROM vals JOIN m JOIN b; real_x real_y m b interp_y ------ ------ ----- ---- ------------------- 1 1 -0.24 1.12 0.88 2 0.5 -0.24 1.12 0.64 3 0.4 -0.24 1.12 0.4 4 0.1 -0.24 1.12 0.16 5 0 -0.24 1.12 -0.0800000000000001 </code></pre> If all you want is &quot;m&quot; and &quot;b&quot; the final query can just be &quot;FROM m JOIN b&quot; and the result will be 1 row.
评论 #34348827 未加载
pcblues超过 2 年前
Removing CTEs from your codebase by replacing them with the creation of a temp table and then using separate queries with as few joins as possible to populate it will give you low-locking performance hundreds of times better nearly every time. No problem with readability.
评论 #34347252 未加载
评论 #34347191 未加载
评论 #34354828 未加载
layer8超过 2 年前
In cases like in the example given, I would also consider using a generated column (based on the CASE expression, in the example). That way the CTE doesn’t need to be repeated on each affected query — any query on the table can directly reference the derived value. Of course, this approach is only applicable for row-based values of a single table, and requires you to be able to modify the schema to add the generated column.
vlasky超过 2 年前
This also works in MySQL 8.0.19 and above but the syntax of the CTE example needs to be tweaked to use the ROW() constructor:<p><pre><code> WITH countries (code, name) AS ( SELECT * FROM (VALUES ROW(&#x27;us&#x27;, &#x27;United States&#x27;), ROW(&#x27;fr&#x27;, &#x27;France&#x27;), ROW(&#x27;in&#x27;, &#x27;India&#x27;) ) AS codes ) SELECT data.code, name FROM data LEFT JOIN countries.code = data.code;</code></pre>
zX41ZdbW超过 2 年前
ClickHouse has the `transform` function for this purpose:<p><pre><code> SELECT transform(id, [1, 2, 11], [&#x27;ClickHouse&#x27;, &#x27;Postgres&#x27;, &#x27;MongoDB&#x27;], &#x27;&#x27;) </code></pre> It creates a lookup table. When you are using the CASE operator, in simple cases it will use the `transform` function under the hood.<p>There are more advanced capabilities:<p>- The `Join` table engine. It is a pre-warmed state for joining, a hash table kept in memory.<p>- Dictionaries. Pre-warmed, automatically updated lookup data structures from various sources. For example, you can connect a dictionary of company names from your operational database while keeping only the ids in ClickHouse.<p>And, just in case, the same CTE works as in SQLite:<p><pre><code> WITH countries AS ( SELECT c1 AS code, c2 AS name FROM VALUES( (&#x27;us&#x27;, &#x27;United States&#x27;), (&#x27;fr&#x27;, &#x27;France&#x27;), (&#x27;in&#x27;, &#x27;India&#x27;))) SELECT data.code, name FROM data LEFT JOIN countries ON countries.code = data.code; </code></pre> Example:<p><pre><code> milovidov-desktop :) WITH countries AS ( SELECT c1 AS code, c2 AS name FROM VALUES( (&#x27;us&#x27;, &#x27;United States&#x27;), (&#x27;fr&#x27;, &#x27;France&#x27;), (&#x27;in&#x27;, &#x27;India&#x27;))) SELECT data.code, name FROM (SELECT &#x27;us&#x27; AS code) AS data LEFT JOIN countries ON countries.code = data.code; ┌─code─┬─name──────────┐ │ us │ United States │ └──────┴───────────────┘ </code></pre> Disclaimer: I&#x27;m working on ClickHouse.
msla超过 2 年前
What does CTE stand for in this context?
评论 #34346768 未加载
评论 #34346771 未加载
评论 #34346776 未加载
评论 #34346783 未加载
评论 #34346765 未加载
评论 #34346878 未加载
fphhotchips超过 2 年前
Fun fact, this can also be more performant, depending on the engine.
评论 #34346940 未加载
评论 #34354639 未加载
shybear超过 2 年前
Isn&#x27;t it more readable to make the code snippet with the case when statements into a CTE? I feel like I am misunderstanding the proposed benefits here.
VincentEvans超过 2 年前
For the sake of completeness, you can accomplish the same using UNION instead of VALUES and subquery instead of CTE.<p>‘’’ WITH CTE AS ( SELECT ‘NY’ AS code, ‘New York’ AS state UNION SELECT ‘NJ’, ‘New Jersey’ ) SELECT * FROM CTE ‘’’<p>… likewise you can also do that in a subquery that you can immediately join:<p>‘’’ SELECT * FROM ( SELECT ‘NY’ AS code, ‘New York’ AS state UNION SELECT ‘NJ’, ‘New Jersey’ ) SUB INNER JOIN AnotherTable AT ON SUB.code = AT.code ‘’’
评论 #34347799 未加载
iblaine超过 2 年前
CTEs still give me anxiety due to their inability to be potentially not be optimized. I know many database engines today optimize for this type of thing, I&#x27;m thankful for it, but there was as time 10+ years ago when CTE expressions could take down databases. I also blame DBT for convincing people that CTEs are good. CTEs are only good if used correctly.
评论 #34349020 未加载
JacKTrocinskI超过 2 年前
CTEs are just a nicer, more readable way of writing a subquery. Also, CTEs can be recursive. In Postgres, CTEs can be useful when used with the RETURNING clause. If you&#x27;re using the same CTE in multiple different queries, then consider creating a view instead. I wouldn’t use a CTE for lookup values, create a lookup table instead.
评论 #34352717 未加载
simonw超过 2 年前
This is a neat trick.<p>Worth noting that &quot;Row values were added to SQLite version 3.15.0 (2016-10-14)&quot; - my experience is that older versions of SQLite are a lot more common than I would hope they would be. <a href="https:&#x2F;&#x2F;www.sqlite.org&#x2F;rowvalue.html" rel="nofollow">https:&#x2F;&#x2F;www.sqlite.org&#x2F;rowvalue.html</a>
ergonaught超过 2 年前
Why is this better than just joining an actual table that expands the codes?
评论 #34347272 未加载
评论 #34347082 未加载
jpalomaki超过 2 年前
I like CTEs, but for complicated queries I find it lot easier to make them run fast by splitting into individual queries and creating temporary tables with appropriate indexes.
pophenat超过 2 年前
Well, if you have multiple queries, you&#x27;d still have to write the cte multiple times. You could also use views for the same thing, and those are much more reusable.
评论 #34347355 未加载
galkk超过 2 年前
CTEs are great, but they often cause issues for optimizers.
chaps超过 2 年前
That&#x27;s a really neat trick. Gosh, CTEs are great.
emodendroket超过 2 年前
I&#x27;ve been doing this for ages and am a big evangelist of CTEs. Don&#x27;t just copy and paste the same hairy sub-query over and over!
nbevans超过 2 年前
CTEs are great for breaking down a large query into discrete steps. They&#x27;re basically SQL&#x27;s equivalent of a pipe function.
gunshai超过 2 年前
I loathe using sub queries.<p>I use ctes for everything and do my joins at the end.<p>This seems far more readable.
nurettin超过 2 年前
I do this with lateral joins instead.
rerdavies超过 2 年前
Sucks for localization though.