科技回声

7 comments

jerrygenser, about 2 months ago

When I started using BigQuery almost 10 years ago, it was obvious from the documentation and from running queries in the console that LIMIT does not change the price of a query, since it doesn't change the data that needs to be scanned. The user even highlights numerous places in the documentation and hints in the UI that communicate BQ's pricing model... so it's interesting that they decided to post that it's hidden or a dark pattern. BigQuery has a feature in the pre-release stage that will be useful for a LIMIT-like experience while incurring much lower cost. <a href="https://cloud.google.com/bigquery/docs/table-sampling" rel="nofollow">https://cloud.google.com/bigquery/docs/table-sampling</a>
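
For readers who want to see what the table-sampling feature linked above looks like in a query, here is a minimal sketch that builds one. The `TABLESAMPLE SYSTEM (n PERCENT)` clause comes from the linked BigQuery docs; the helper name `sampled_query` and the table `dataset.my_table` are made up for illustration.

```python
from typing import Optional


def sampled_query(table: str, percent: float, limit: Optional[int] = None) -> str:
    """Build a query that reads only a random sample of the table's data blocks.

    `table` and this helper are hypothetical; only the TABLESAMPLE clause
    is taken from the BigQuery documentation.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    sql = f"SELECT * FROM `{table}` TABLESAMPLE SYSTEM ({percent:g} PERCENT)"
    if limit is not None:
        # LIMIT only trims the rows returned here; the sampling clause is
        # the part intended to reduce the data read.
        sql += f" LIMIT {limit}"
    return sql


print(sampled_query("dataset.my_table", 10, limit=100))
```

The sample is taken per data block, so very small tables may come back whole or empty; treat this as a way to peek at data cheaply, not as an exact row-level sample.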

willvarfar, about 2 months ago

I have a lot of sympathy for GCP users. There are so many stories of surprise and unpredictable billing. All the cloud providers have foot-guns for the unwary or not-yet-bitten. Although I use GCP all the time, were I to set something up for a friend I would not turn to GCP because of the fear of expensive oopsies. E.g. I wish there were project and per-query options to limit max slot hours and max bytes scanned per query, etc. I regularly run really big queries in BQ that can take 10x the slots on some runs just because of 'BQ weather' and slot contention.
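
A per-query guard of the kind wished for here can be approximated on the client side. The sketch below assumes you already have a byte estimate for the query, for example from a BigQuery dry run; `check_byte_budget` and `QueryTooExpensive` are hypothetical names, not part of any Google library, and the sketch does nothing about slot hours.

```python
class QueryTooExpensive(Exception):
    """Raised when a query's estimated scan exceeds the caller's budget."""


def check_byte_budget(estimated_bytes: int, max_bytes: int) -> int:
    """Return the estimate if it fits the budget, otherwise refuse to proceed.

    `estimated_bytes` is assumed to come from a dry run or similar estimate;
    this function never talks to BigQuery itself.
    """
    if estimated_bytes < 0 or max_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    if estimated_bytes > max_bytes:
        raise QueryTooExpensive(
            f"estimated scan of {estimated_bytes} bytes exceeds cap of {max_bytes} bytes"
        )
    return estimated_bytes


GIB = 2**30
try:
    check_byte_budget(estimated_bytes=500 * GIB, max_bytes=100 * GIB)
except QueryTooExpensive as exc:
    print(f"refusing to run: {exc}")
```

BigQuery's Python client also exposes `maximum_bytes_billed` on `QueryJobConfig`, which makes the service itself reject a query that would bill more than the cap, so part of this wish is already covered on the bytes side.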

bobchadwick, about 2 months ago

As others have pointed out, it's pretty easy to learn that typically using LIMIT doesn't affect query cost. That said, one surprising side effect of adding clustering is that LIMIT actually works as you'd expect it to. See this post by Felipe Hoffa, formerly of Google: <a href="https://hoffa.medium.com/bigquery-optimized-cluster-your-tables-65e2f684594b" rel="nofollow">https://hoffa.medium.com/bigquery-optimized-cluster-your-tab...</a>.

jpau, about 2 months ago

I am grateful for GCP's quotas that help us prevent similar own-goals. While this specific error is something we know to avoid, I'm sure quotas have helped us avoid the pain of other errors. So I'm somewhat sympathetic. I think it's important to read the language and judgements in the post in the context of someone who just got a large unexpected bill (an expensive lesson).

Rockslide, about 2 months ago

I don't have a lot of sympathy for people using their tools wrong. Using partitioning surely would have prevented this.

wodenokoto, about 2 months ago

BigQuery has an interesting pricing model and I do kinda like it. You pay for data in, while processing is basically free. I don't know the query they used, but LIMIT can limit the data scanned. It's been a long time since I used BQ, but I remember their query optimizer not being particularly advanced, so you had to be really careful where you put the LIMIT.

scottlamb, about 2 months ago

That is a sad story. And I think the referenced doc fragment is even worse than they described. They wrote:

> BigQuery charges based on referenced data, not processed data!

(emphasis theirs) and linked to a doc that says:

> When you run a query, you're charged according to the data processed in the columns you select, even if you set an explicit `LIMIT` on the results.

I would have interpreted the latter to mean something else, namely that you get charged for scanning all rows when you do something like the following: <pre><code> select a, sum(b) as b_sum from table group by a order by b_sum limit 10; </code></pre> ...because my post-group LIMIT clause doesn't actually prevent it from needing all rows. But their query should genuinely not need all rows. It does need all partitions. I suppose if they have way too many partitions (such that each is <= the minimum fetch size; see the edit below) then GCP genuinely needs to fetch all the data. Otherwise I am surprised they were charged so much.

Edit: a caveat on "such that each is <= the minimum fetch size": I suppose their "select *" together with the columnar format might mean that I should word this as something like "each (partition, column) is <= the minimum fetch size".
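
The doc sentence quoted above can be read as a simple billing model: the bytes billed depend on which columns the query references, and LIMIT does not enter into it. The toy sketch below encodes only that reading. The column sizes and the price per TiB are invented inputs, and it deliberately ignores partition pruning, clustering, and any minimum billed size, so it is not a reproduction of BigQuery's actual billing.

```python
from typing import Dict, List, Optional, Union

TIB = 2**40


def on_demand_cost_usd(
    column_bytes: Dict[str, int],
    selected: Union[str, List[str]],
    price_per_tib: float,
    limit: Optional[int] = None,
) -> float:
    """Estimate cost as (bytes in the referenced columns) x (price per TiB).

    `limit` is accepted but intentionally unused, mirroring the doc's
    statement that an explicit LIMIT does not reduce the charge.
    """
    if selected == "*":
        names = list(column_bytes)  # SELECT * references every column
    else:
        names = list(selected)
    scanned = sum(column_bytes[name] for name in names)
    return scanned / TIB * price_per_tib


# Hypothetical table: three columns of very different sizes.
sizes = {"a": 1 * TIB, "b": 2 * TIB, "c": 5 * TIB}
price = 5.0  # assumed price in USD per TiB, for illustration only

print(on_demand_cost_usd(sizes, ["a", "b"], price))            # 15.0
print(on_demand_cost_usd(sizes, ["a", "b"], price, limit=10))  # 15.0
print(on_demand_cost_usd(sizes, "*", price, limit=10))         # 40.0
```

Under this model the only levers are selecting fewer columns or reading less of the table in the first place, which is where partitioning and clustering come into the discussion.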

BigQuery pricing model cost us $10k in 22 seconds
