> If we build plans from operators that work well across large ranges of input distributions and that don't have catastrophic failure modes, then we won't be as reliant on accurate statistics.

(TL;DR - there are never enough statistics, except what you collect while running that query)

The problem with a lot of these approaches is that they suck at the small scale - the sort-merge join is very safe across different data sizes, but a hash join will blow past it so fast that it wouldn't make a lot of sense to always opt for the safe one.

When I started working on Hive, it was entirely full of "the query won't fail, so what if it's slow" joins - or conditional tasks, which encode "try to build a hash table; if that fails, fail over to a sort-merge" directly into the plan (sketched below). That's great when it rarely fails, and the config option still bears that scar (hive.auto.convert.join.noconditionaltask=true). There was another pass at this with the Grace hash join[1] (plus a bloom filter while spilling) to make the hash join safe against overfilling.

The problem was that most situations demanded the opposite: customers picked their tech based on a small prototype and how fast it ran, assuming larger scales could simply be handled by throwing more hardware at it. So people who tried the product concluded it was slow and that latency would keep growing linearly with data size (not that it would stay roughly the same from 1Gb all the way to 100Gb), so it wouldn't "sell" in the market.

Eventually we did the least optimal thing: record statistics on the query's operators through runtime counters, and if the query fails, feed the row statistics collected at runtime back into the planner and re-execute with a plan built from the updated statistics (also sketched below).

This worked better for filter selectivity, particularly where we got NOT IN ('FAILED', 'ABANDONED') sorts of estimates very wrong (a histogram would help there, but an HLL++ is useless).

Even though this puts a huge penalty on a bad plan, bad plans are not common enough to need the always-robust mechanism - by sacrificing the P99 at scale, you get your P95 latencies way down.

And overall, being optimistic has worked out better for whole-cluster utilization, because there's only a small fraction of giant queries, and everyone else with Tableau is happy that everything returns fast.

[1] - https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
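To make the conditional task concrete, here's a minimal sketch of "try the hash join, fail over to sort-merge". All of it is illustrative Java, not Hive's actual code; the row-count budget is a crude stand-in for a real memory check:

```java
import java.util.*;

// Hypothetical "conditional task": attempt an in-memory hash join, and if the
// build side exceeds its budget, fall back to the slower-but-safe sort-merge.
public class ConditionalJoin {
    record Row(int key, String payload) {}

    static List<String> join(List<Row> build, List<Row> probe, int maxBuildRows) {
        try {
            return hashJoin(build, probe, maxBuildRows);
        } catch (IllegalStateException overflow) {
            return sortMergeJoin(build, probe);   // the failover path
        }
    }

    static List<String> hashJoin(List<Row> build, List<Row> probe, int maxBuildRows) {
        Map<Integer, List<Row>> table = new HashMap<>();
        int rows = 0;
        for (Row b : build) {
            table.computeIfAbsent(b.key(), k -> new ArrayList<>()).add(b);
            if (++rows > maxBuildRows)            // stand-in for a memory check
                throw new IllegalStateException("build side too big");
        }
        List<String> out = new ArrayList<>();
        for (Row p : probe)
            for (Row b : table.getOrDefault(p.key(), List.of()))
                out.add(b.payload() + "|" + p.payload());
        return out;
    }

    static List<String> sortMergeJoin(List<Row> left, List<Row> right) {
        List<Row> l = left.stream().sorted(Comparator.comparingInt(Row::key)).toList();
        List<Row> r = right.stream().sorted(Comparator.comparingInt(Row::key)).toList();
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            if (l.get(i).key() < r.get(j).key()) i++;
            else if (l.get(i).key() > r.get(j).key()) j++;
            else {
                int k = l.get(i).key(), j0 = j;
                for (; i < l.size() && l.get(i).key() == k; i++)
                    for (j = j0; j < r.size() && r.get(j).key() == k; j++)
                        out.add(l.get(i).payload() + "|" + r.get(j).payload());
            }
        }
        return out;
    }
}
```

The shape of the trade-off is visible right in the structure: the fast path can throw, the slow path can't.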
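The Grace variant replaces the single big hash table with per-partition tables, and the bloom filter built while spilling the build side lets you drop probe rows that can't possibly match before they're spilled. Again a sketch under made-up names, with a toy single-hash BitSet standing in for a real bloom filter:

```java
import java.util.*;

// Grace-style hash join sketch: partition both sides by key hash so no single
// hash table must hold the whole build side, then join partition by partition.
public class GraceHashJoin {
    record Row(int key, String payload) {}

    static List<String> join(List<Row> build, List<Row> probe, int nParts) {
        List<List<Row>> buildParts = emptyParts(nParts);
        List<List<Row>> probeParts = emptyParts(nParts);
        BitSet bloom = new BitSet(1 << 16);     // toy one-hash "bloom filter"

        for (Row b : build) {                   // "spill" the build side, noting keys
            bloom.set(bloomBit(b.key()));
            buildParts.get(part(b.key(), nParts)).add(b);
        }
        for (Row p : probe)                     // drop definite non-matches pre-spill
            if (bloom.get(bloomBit(p.key())))
                probeParts.get(part(p.key(), nParts)).add(p);

        List<String> out = new ArrayList<>();
        for (int i = 0; i < nParts; i++) {      // one small hash table per partition
            Map<Integer, List<Row>> table = new HashMap<>();
            for (Row b : buildParts.get(i))
                table.computeIfAbsent(b.key(), k -> new ArrayList<>()).add(b);
            for (Row p : probeParts.get(i))
                for (Row b : table.getOrDefault(p.key(), List.of()))
                    out.add(b.payload() + "|" + p.payload());
        }
        return out;
    }

    static int part(int key, int n) { return Math.floorMod(Integer.hashCode(key), n); }
    static int bloomBit(int key) { return Math.floorMod(key * 0x9E3779B9, 1 << 16); }
    static List<List<Row>> emptyParts(int n) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        return parts;
    }
}
```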
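The re-execution loop itself is tiny; Planner and Plan here are hypothetical stand-ins, and the only real point is that the second attempt plans from observed row counts instead of compile-time estimates:

```java
import java.util.*;

// "Optimistic, then re-plan": run the cheap plan, record per-operator row
// counts as it executes, and on failure rebuild the plan from what we saw.
public class AdaptiveExecutor {
    interface Plan { void run(Map<String, Long> rowCounters) throws Exception; }
    interface Planner { Plan plan(Map<String, Long> statsOverride); }

    static void execute(Planner planner) throws Exception {
        Map<String, Long> observed = new HashMap<>();
        try {
            planner.plan(Map.of()).run(observed);  // counters fill in as operators run
        } catch (Exception failed) {
            // Take another go: e.g. the hash join that overflowed gets planned
            // as a sort-merge join now that the real build-side size is known.
            planner.plan(observed).run(new HashMap<>());
        }
    }
}
```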
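As for why the NOT IN estimate goes so wrong: an HLL sketch only gives you a distinct-value count, so the best the planner can assume is that every value is equally common, while a frequency histogram knows that the excluded values might cover most of the table. Made-up numbers:

```java
import java.util.*;

// NOT IN ('FAILED', 'ABANDONED') selectivity, estimated two ways.
public class NotInSelectivity {
    // NDV-only (all an HLL can offer): assume each value covers 1/ndv of rows.
    static double fromNdv(long ndv, int excludedValues) {
        return 1.0 - (double) excludedValues / ndv;
    }

    // Histogram (value -> row count): subtract the excluded values' actual mass.
    static double fromHistogram(Map<String, Long> hist, Set<String> excluded, long totalRows) {
        long excludedRows = excluded.stream().mapToLong(v -> hist.getOrDefault(v, 0L)).sum();
        return 1.0 - (double) excludedRows / totalRows;
    }

    public static void main(String[] args) {
        Map<String, Long> hist = Map.of(
                "FAILED", 450_000L, "ABANDONED", 450_000L,
                "RUNNING", 50_000L, "DONE", 50_000L);
        System.out.println(fromNdv(4, 2));                                      // 0.5
        System.out.println(fromHistogram(hist,
                Set.of("FAILED", "ABANDONED"), 1_000_000L));                    // 0.1
    }
}
```

If 90% of the rows are terminal states, the NDV-based guess of 50% selectivity is off by 5x, which is exactly the kind of error that picks the wrong join.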