
Redshift Research Project: Amazon Redshift Serverless [pdf]

4 points, by Max-Ganz-II, over 1 year ago
Redshift Serverless is *not* serverless. A workgroup is a normal, ordinary Redshift cluster. All workgroups are initially created as a 16-node cluster with 8 slices per node, which is the default 128 RPU workgroup, and then elastic resized to the size specified by the user. This is why the original RPU range is 32 to 512 in units of 8 and the default is 128 RPU: the default is the mid-point of a 4x elastic resize range, and a single node, the smallest possible change in cluster size, is 8 RPU/slices. 1 RPU is 1 slice.

With elastic resize the number of nodes changes but the number of data slices never changes; rather, the data slices are redistributed over the new nodes. If the cluster becomes larger, the slice capacity of each node is filled up with compute slices, which are much less capable than data slices; if the cluster becomes smaller, the original set of 128 data slices for 16 nodes is crammed into the remaining nodes. Both outcomes are inefficient for compute and storage: a 512 RPU workgroup has 128 data slices and 384 compute slices, rather than 512 data slices, and a 32 RPU workgroup, in the worst case for disk use overhead (small tables, ~150k rows), consumes 256 MB per column, compared to the 64 MB of a provisioned 32-slice cluster.

The more recently introduced smaller workgroups, 8 to 24 RPU (inclusive at both ends), use a 4-slice node and have two nodes for every 8 RPU. In this case, the 8 RPU workgroup is initially a 16-node cluster with 8 slices per node, which is resized to a 2-node cluster with 4 slices per node - a staggering 16x elastic resize; the largest resize permitted to normal users is 4x. An 8 RPU workgroup, with small tables, uses 256 MB per column rather than 16 MB per column.

Workgroups have a fixed number of RPUs and require a resize to change this; workgroups do not dynamically auto-scale RPUs.
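The slice arithmetic above can be sketched as a few lines of Python. This is a sketch of the post's claims, not AWS-documented behaviour; in particular, the 2 MB-per-data-slice-per-column minimum is an assumption inferred from the 256 MB / 64 MB / 16 MB figures in the text.

```python
# Sketch of the slice arithmetic described above. Assumptions: every
# workgroup starts life as a 16-node, 8-slices-per-node cluster (128 data
# slices), elastic resize never changes the data slice count, and small
# tables consume a fixed minimum per data slice per column -- the 2 MB
# figure below is inferred from the 256 MB / 64 MB / 16 MB numbers in the
# text, not stated by AWS.

INITIAL_DATA_SLICES = 16 * 8        # 128, fixed at workgroup creation
MIN_MB_PER_SLICE_PER_COLUMN = 2     # inferred assumption

def workgroup_slices(rpu):
    """Data vs. compute slices after elastic resize to `rpu` RPUs (1 RPU = 1 slice)."""
    data = INITIAL_DATA_SLICES                   # never changes under elastic resize
    compute = max(0, rpu - INITIAL_DATA_SLICES)  # filler slices when the cluster grows
    return data, compute

def small_table_mb_per_column(data_slices):
    """Worst-case disk overhead for a small (~150k row) table."""
    return data_slices * MIN_MB_PER_SLICE_PER_COLUMN

print(workgroup_slices(512))           # (128, 384): compute slices, not 512 data slices
print(small_table_mb_per_column(128))  # 256 MB per column on any workgroup
print(small_table_mb_per_column(32))   # 64 MB per column on a provisioned 32-slice cluster
```

Under these assumptions the disk overhead of a workgroup is driven by its fixed 128 data slices, not by its nominal RPU count, which is why shrinking a workgroup never reduces per-column storage.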
I was unable to prove it, because Serverless is too much of a black box, but I am categorically of the view that the claims made for Serverless dynamic auto-scaling rest on the well-known and long-established mechanisms of AutoWLM and Concurrency Scaling Clusters.

Finally, it is possible to confidently extrapolate from the ra3.4xlarge and ra3.16xlarge node types the price the 8-slice node type would carry in a provisioned cluster: 6.52 USD per hour. Provisioned clusters charge per node-second; Serverless workgroups charge per node-query-second and so go to zero cost with zero use. On the default Serverless workgroup of 128 RPU/16 nodes (avoiding the need to account for the inefficiencies introduced by elastic resize), 10 queries run constantly for one hour (avoiding the need to account for the Serverless minimum query charge of 60 seconds of run-time) cost 460.80 USD. A provisioned cluster composed of the same nodes costs 104.32 USD. The break-even point is 2.26 queries for one hour.

Serverless introduces zero usage-zero cost billing, which allows for novel use cases, but this could perfectly well have been obtained by introducing a zero-zero billing model for provisioned Redshift, without the duplicity, considerable added complexity, end-user confusion, cost in developer time and induced cluster inefficiency involved in the pretence that Serverless is serverless.

https://www.redshiftresearchproject.org/white_papers/downloads/serverless.pdf

https://www.redshiftresearchproject.org/white_papers/downloads/serverless.html
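The break-even figure follows directly from the two hourly totals quoted in the post, under its original per-node-query-second reading of Serverless billing (which the author corrects in the comments). A minimal sketch, using only numbers from the text:

```python
# Break-even sketch using only figures from the post: a 128 RPU / 16-node
# workgroup where 10 constantly-running queries cost 460.80 USD per hour,
# vs. an equivalent provisioned cluster of 16 nodes at the extrapolated
# 6.52 USD per node-hour. Assumes the post's per-node-query-second reading
# of Serverless billing, so hourly cost scales linearly with concurrency.

NODES = 16
PROVISIONED_NODE_HOUR_USD = 6.52
provisioned_per_hour = NODES * PROVISIONED_NODE_HOUR_USD   # 104.32 USD/hour

serverless_per_query_hour = 460.80 / 10                    # 46.08 USD per query-hour

def serverless_per_hour(concurrent_queries):
    """Hourly Serverless cost under per-node-query-second billing."""
    return concurrent_queries * serverless_per_query_hour

break_even_queries = provisioned_per_hour / serverless_per_query_hour
print(round(break_even_queries, 2))  # 2.26: above this, provisioned is cheaper
```

Anything beyond roughly 2.26 constantly-running queries makes the provisioned cluster the cheaper option under this model; below it, Serverless wins because idle time costs nothing.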

2 comments

Max-Ganz-II, over 1 year ago
*NOTE AND CORRECTION 2023-10-04*

I misunderstood how Serverless billing works. Having read the documentation, I understood - incorrectly - that pricing was *per-query*. The docs talk about pricing being per RPU-hour, but billed on a per-second basis, where a query that runs for less than 60 seconds is billed for 60 seconds, and this is for a serverless product. From that, pricing would be per-query - as it is with Athena, and with Lambda.

In fact, it is not.

Pricing is per workgroup-second. I still have not found this stated in the docs; I was pointed to a re:Invent talk where an AWS developer presented a slide which made this clear.

When I was working on the investigation, I was always running a single query at a time, so billing looked right.

This change fundamentally alters the pricing proposition offered by Serverless; the original pricing conclusion was incorrect by an order of magnitude.

I have now rewritten the content regarding billing and republished. The abstract below is the new abstract. The revision history explains what happened. The credits will name the reader who pointed the issue out, once they let me know whether they want a credit.
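The size of the correction can be illustrated with the figures from the original post. A sketch, assuming the implied 46.08 USD workgroup-hour rate (derived from the post's 460.80 USD for 10 query-hours, not a published AWS price):

```python
# Why the correction matters by an order of magnitude. Under per-query-second
# billing, 10 concurrent queries each accrue charges; under per-workgroup-second
# billing, only elapsed workgroup time is charged, regardless of concurrency.
# The 46.08 USD workgroup-hour rate is derived from the post's own figures
# (460.80 USD for 10 query-hours), not a published AWS price.

WORKGROUP_HOUR_USD = 460.80 / 10  # 46.08: one query-hour == one workgroup-hour

def per_query_billing(concurrent_queries, hours):
    """The original (incorrect) reading: every running query is billed."""
    return concurrent_queries * hours * WORKGROUP_HOUR_USD

def per_workgroup_billing(concurrent_queries, hours):
    """The corrected model: only elapsed workgroup time is billed."""
    return hours * WORKGROUP_HOUR_USD if concurrent_queries > 0 else 0.0

print(round(per_query_billing(10, 1), 2))      # 460.80
print(round(per_workgroup_billing(10, 1), 2))  # 46.08: ten times lower
```

A single query at a time makes the two models indistinguishable, which is why the error survived the original investigation.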
Max-Ganz-II, over 1 year ago
An aside: posting about this PDF to r/AmazonRedshift, a sub I founded about two years ago, caused what looks like an automated system to ban the sub.

No other information is given, other than that a ban has occurred - no links, no route to appeal, no way to find out what happened or why.

https://www.redshiftresearchproject.org/slblog/2023-09.html#reddit_ban

(I see now the sub has disappeared from my profile, too. Two years of posts, gone - instantly, no warning, no reason, no information, no notification and no appeal process of any kind, so far as I can see. Reddit appears to be a risky platform to invest time into.)