Underneath this is the hard truth that streaming is hard and costly. I've run into many people with optimistic dreams of taking SQL or another query language and making its results update on data changes rather than by periodic polling.<p>These systems are always expensive and yield marginal improvements compared to a dumb publish/subscribe system. I've got notes on publish/subscribe systems, and I'm publishing a paper next month about the end-result system that I built over five years. For a clue about the midpoint, my SRECON talk addresses this: <a href="https://www.usenix.org/conference/srecon17americas/program/presentation/erlich" rel="nofollow">https://www.usenix.org/conference/srecon17americas/program/p...</a><p>A key point is that publish/subscribe over-commits the problem space (i.e. the problem space can potentially be quadratic depending on use), but if you give up on durability then your costs can go way, way, way down.<p>The lack of durability is a problem, BUT it can be leveraged to discover non-marginal use-cases that justify further investment. Unfortunately, once you invest in the imperfect system to the point that it operates at five-plus nines during normal operation, very few query-based systems are competitive.
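To make the durability trade-off concrete, here's a minimal sketch (my own illustration, not from the paper or the talk; all names are hypothetical) of a best-effort in-memory pub/sub broker: fanout is publishers-times-subscribers in the worst case, and with bounded queues and no write-ahead log the per-message cost stays tiny, at the price of silently dropping messages.

  # Hypothetical sketch: non-durable pub/sub with bounded per-subscriber queues.
  from collections import defaultdict, deque

  class BestEffortBroker:
      def __init__(self, max_queue=1000):
          self.subscribers = defaultdict(list)   # topic -> list of subscriber queues
          self.max_queue = max_queue

      def subscribe(self, topic):
          q = deque(maxlen=self.max_queue)       # oldest messages silently dropped
          self.subscribers[topic].append(q)
          return q

      def publish(self, topic, message):
          # No durability: if nobody is subscribed, or a queue overflows,
          # the message is simply gone.
          for q in self.subscribers[topic]:
              q.append(message)

  broker = BestEffortBroker()
  inbox = broker.subscribe("prices")
  broker.publish("prices", {"sym": "X", "px": 101.5})
  print(list(inbox))   # [{'sym': 'X', 'px': 101.5}]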
You should check out the event-reduce algorithm[1]. It scales differently because it calculates the new results from the old results plus the event, rather than re-running the query. So it can have some benefits over materialized views depending on the data size and how many clients subscribe to a query.<p>[1] <a href="https://github.com/pubkey/event-reduce" rel="nofollow">https://github.com/pubkey/event-reduce</a>
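For anyone unfamiliar with the idea, here's a toy illustration in Python (not the event-reduce API; the real library is JavaScript and derives these transitions far more systematically): keep a query's result set current by applying each write event to the previous results instead of re-running the query.

  # Toy version of "new results = f(old results, event)" for one query:
  # "all users with age >= 18, sorted by age".
  def matches(doc):
      return doc["age"] >= 18

  def apply_event(old_results, event):
      op, doc = event["op"], event["doc"]
      # Remove any stale copy of the document, then re-add it if it still matches.
      results = [d for d in old_results if d["id"] != doc["id"]]
      if op in ("insert", "update") and matches(doc):
          results.append(doc)
      results.sort(key=lambda d: d["age"])
      return results

  results = []
  results = apply_event(results, {"op": "insert", "doc": {"id": 1, "age": 20}})
  results = apply_event(results, {"op": "insert", "doc": {"id": 2, "age": 15}})
  results = apply_event(results, {"op": "update", "doc": {"id": 2, "age": 18}})
  print(results)   # both users, sorted by age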
Robust operators in particular seem like an important area of research to me. I've never had time to sit down and think about it in detail, but I can't count the number of escalations I've had that would have been instantly solved if only we had a gracefully degrading broadcast join.
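For what it's worth, here's a rough sketch of one reading of "gracefully degrading broadcast join" (hypothetical code, the budget is made up): attempt the broadcast/hash build, and if the build side blows past a budget, degrade to a sort-merge join instead of failing the query.

  # Hypothetical degrading join: hash when the build side fits the budget,
  # otherwise fall back to a sort-merge join.
  BUILD_BUDGET = 100_000  # rows; stand-in for a real memory limit

  def degrading_join(left, right, key):
      build, n = {}, 0
      for row in right:
          build.setdefault(row[key], []).append(row)
          n += 1
          if n > BUILD_BUDGET:
              return sort_merge_join(left, right, key)   # degrade, don't die
      # Broadcast/hash path: probe the in-memory table.
      return [l | r for l in left for r in build.get(l[key], [])]

  def sort_merge_join(left, right, key):
      left = sorted(left, key=lambda r: r[key])
      right = sorted(right, key=lambda r: r[key])
      out, j = [], 0
      for l in left:
          while j < len(right) and right[j][key] < l[key]:
              j += 1
          k = j
          while k < len(right) and right[k][key] == l[key]:
              out.append(l | right[k])
              k += 1
      return out

  print(degrading_join([{"k": 1, "a": "x"}], [{"k": 1, "b": "y"}], "k"))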
A while ago, when designing an MPP system for AWS Lambda, I came to the conclusion that in the absence of statistics, merge operators for joins and aggregates have better worst-case behavior than the hash-based alternatives. (See samplesort for a distributed sort algorithm that works well for sorting the merge inputs.)
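A sketch of the samplesort part (hypothetical code, not from that system): sample the input to pick splitters, partition rows by splitter range, sort each partition locally, then a streaming k-way merge yields globally sorted input for the merge joins and aggregates.

  # Hypothetical single-process samplesort; in an MPP setting each partition
  # would go to a separate worker/Lambda before the final merge.
  import bisect, heapq, random

  def samplesort(rows, key, workers=4, oversample=8):
      sample = sorted(key(r) for r in random.sample(rows, min(len(rows), workers * oversample)))
      splitters = [sample[i * len(sample) // workers] for i in range(1, workers)]
      parts = [[] for _ in range(workers)]
      for r in rows:
          parts[bisect.bisect_right(splitters, key(r))].append(r)   # "shuffle" step
      runs = [sorted(p, key=key) for p in parts]                    # local sorts
      return list(heapq.merge(*runs, key=key))                      # streaming merge

  data = [{"k": random.randint(0, 99)} for _ in range(1000)]
  assert samplesort(data, key=lambda r: r["k"]) == sorted(data, key=lambda r: r["k"])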
> If we build plans from operators that work well across large ranges of input distributions and that don't have catastrophic failure modes, then we won't be as reliant on accurate statistics.<p>(TL;DR - there are never enough statistics, except what you collect while running that query)<p>The problem with a lot of these approaches is that they suck at small scale - the sorted merge-join is very safe across different data sizes, but a hash-join will blow past it so fast that it wouldn't make a lot of sense to always opt for the merge.<p>When I started working on Hive, it was entirely full of "the query won't fail, so what if it's slow" joins (or conditional tasks, which encode "try to build a hash; if that fails, fail over to a sort-merge" into the plan - that's great when it rarely fails - the config option still bears that scar in hive.auto.convert.join.noconditionaltask=true). There was another pass at this with the Grace hash-join[1] (plus a bloom filter while spilling) to make the hash join safe against overfilling.<p>The problem was that most situations demanded things the other way around: the choice of tech depends on how fast a smaller prototype goes, with the assumption that larger scales can simply be handled by throwing more hardware at it. So people who tried the product thought it was slow and assumed latency would scale up linearly with data size (not that it would stay roughly the same from 1 GB all the way to 100), so it wouldn't "sell" in the market.<p>Eventually, we did the least optimal thing and just recorded statistics on a query's operators through runtime counters; if the query fails, the row statistics collected at runtime get fed back into the plan and the query is re-executed with the updated statistics - take another go at it.<p>This worked better for filter selectivity, particularly where we got NOT IN ('FAILED', 'ABANDONDED')-style estimates very wrong (a histogram would help, but an HLL+ is useless).<p>Even though this puts a huge penalty on a bad plan, bad plans are not common enough to need the always-robust mechanism - so by sacrificing the P99 at scale, you get your P95 latencies way down.<p>And overall, being optimistic has worked out better for whole-cluster utilization, because there's a small fraction of giant queries & everyone else with Tableau is happy that everything returns fast.<p>[1] - <a href="https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join" rel="nofollow">https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join</a>
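The optimistic loop is simple enough to sketch (hypothetical code, not Hive's actual implementation): plan with whatever estimates you have, and if the query blows up, re-plan from the counters the failed run already collected.

  # Hypothetical "fail, then re-plan with runtime stats" loop.
  HASH_JOIN_ROW_LIMIT = 1_000_000

  def plan(stats):
      # Be optimistic: pick the hash join whenever the build side looks small.
      return "hash_join" if stats.get("build_rows", 0) <= HASH_JOIN_ROW_LIMIT else "sort_merge_join"

  def execute(chosen, counters):
      observed_build_rows = 5_000_000           # what the scan actually produced
      counters["build_rows"] = observed_build_rows
      if chosen == "hash_join" and observed_build_rows > HASH_JOIN_ROW_LIMIT:
          raise MemoryError("hash build overflowed")
      return "ok via " + chosen

  estimates = {"build_rows": 10_000}            # bad NOT IN-style selectivity estimate
  counters = {}
  try:
      result = execute(plan(estimates), counters)
  except MemoryError:
      # Second attempt: runtime counters replace the bad estimates.
      result = execute(plan(counters), counters)
  print(result)   # ok via sort_merge_join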