That is a sad story. And I think the referenced doc fragment is even worse than they described; they wrote:<p>> BigQuery <i>charges based on referenced data,</i> not processed data!<p>(emphasis theirs) and linked to a doc that says:<p>> When you run a query, you're charged according to the data processed in the columns you select, even if you set an explicit `LIMIT` on the results.<p>I would have interpreted the latter to mean something different: that you get charged for scanning all rows when you run something like the following:<p><pre><code> select a, sum(b) as b_sum from table group by a order by b_sum limit 10;
</code></pre>
...because my post-GROUP BY LIMIT clause doesn't actually prevent the query from needing all rows. But their query should genuinely not need all rows. It does need all partitions. I suppose if they have far too many partitions (such that each is <= the minimum fetch size; see edit below), then GCP genuinely needs to fetch all the data. Otherwise I am surprised they were charged so much.<p>edit: a caveat on "such that each is <= the minimum fetch size": given their "select *" and the columnar storage format, this should probably read "each (partition, column) pair is <= the minimum fetch size".
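<p>To make the point concrete, here is a sketch (the dataset, table, and column names are made up) of how the same LIMIT can be billed very differently depending on which columns are referenced and whether the partitioning column is filtered:<p><pre><code> -- hypothetical table mydataset.events, partitioned on event_date;
 -- this bills for every referenced column across every scanned partition,
 -- regardless of the LIMIT:
 select * from mydataset.events limit 10;

 -- cheaper: reference only the columns you need and prune partitions
 -- by filtering on the partitioning column:
 select user_id, event_type
 from mydataset.events
 where event_date = DATE '2024-01-01'
 limit 10;
</code></pre>If in doubt, a dry run (e.g. `bq query --dry_run`) reports the bytes that would be billed before you actually run anything.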