Did I understand this correctly? You have a single set of items which you query by evaluating a predicate on each of them and then sorting the matching ones. After the initial query you update the query result by looking at all the data update events: you remove deleted items from the result, you insert matching new items in the correct position according to the sort order, and you insert, remove, or move updated items as they start or stop matching the predicate and change their position according to the sort order.
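If that's right, the update step I have in mind looks roughly like this in TypeScript (a minimal sketch with hypothetical names, not EventReduce's actual API):

    interface Item { id: string }

    type ChangeEvent<T extends Item> =
      | { op: 'insert'; doc: T }
      | { op: 'update'; doc: T }
      | { op: 'delete'; id: string };

    function applyEvent<T extends Item>(
      result: T[],                      // current query result, kept sorted
      event: ChangeEvent<T>,
      matches: (doc: T) => boolean,     // the query's predicate (WHERE)
      compare: (a: T, b: T) => number,  // the query's sort order (ORDER BY)
    ): T[] {
      const id = event.op === 'delete' ? event.id : event.doc.id;

      // Drop any previous version of the document from the result.
      const next = result.filter(d => d.id !== id);

      // Deleted documents, and updated ones that stop matching, stay out.
      if (event.op === 'delete' || !matches(event.doc)) return next;

      // Insert the new/updated document at its sorted position.
      const idx = next.findIndex(d => compare(event.doc, d) < 0);
      if (idx === -1) next.push(event.doc); else next.splice(idx, 0, event.doc);
      return next;
    }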
How does this work for complex queries with sub-queries, LEFT OUTER JOINs, LATERAL JOINs, aggregation (DISTINCT/GROUP BY), window functions, CTEs, and RECURSIVE CTEs?

I have a radically different approach: define the queries in question as VIEWs, materialize them, use triggers to update the materializations where you can write those triggers easily and the updates are quick, or schedule an update where they're not.

If your RDBMS is very good about pushing WHERE constraints into VIEWs, and depending on how complex a VIEW query is, you might be able to make the update automatic by just querying the materialization's underlying VIEW with appropriate WHERE constraints from the rows being INSERTed/UPDATEd/DELETEd. You can tell which VIEWs might be suitable for this by checking that the table whose row the trigger is running for is a "top-level" table source for the VIEW's query: meaning a table source that's either the left side of a top-level LEFT JOIN, or either side of an INNER JOIN. If you can run a query on the VIEW with a timeout, then you can just do that in the trigger and mark the materialization as needing an update if the query is too slow. Lastly, a scheduled or NOTIFYed job can run to perform any slower updates to a materialization.
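To make that last step concrete, here is a rough sketch of the NOTIFY-driven refresh job in TypeScript with node-postgres. The table, view, column, and channel names (orders_summary, orders_summary_view, order_id, matview_dirty) are made up for illustration, and the trigger that issues the NOTIFY is assumed to exist:

    import { Client } from 'pg';

    // Hypothetical setup: a trigger on the base table runs
    //   NOTIFY matview_dirty, '<row id>'
    // whenever its in-trigger refresh hits the timeout, and this job
    // re-queries the underlying VIEW for just those rows.
    async function runRefreshWorker(): Promise<void> {
      const client = new Client(); // connection settings taken from PG* env vars
      await client.connect();

      client.on('notification', async msg => {
        const rowId = msg.payload;
        if (!rowId) return;
        // Re-run the VIEW with a WHERE constraint on the affected row and
        // upsert the result into the materialization table.
        // (Deletes would additionally need a DELETE against orders_summary;
        // omitted here to keep the sketch short.)
        await client.query(
          `INSERT INTO orders_summary
             SELECT * FROM orders_summary_view WHERE order_id = $1
           ON CONFLICT (order_id) DO UPDATE
             SET total = EXCLUDED.total, item_count = EXCLUDED.item_count`,
          [rowId],
        );
      });

      await client.query('LISTEN matview_dirty');
    }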
Forgive me if I missed stuff, and please point me in the right direction if you've covered it, but some questions if I may (after I've said well done!). I've considered this problem before and it seems very difficult. So:

1. Do you have a paper on this with a rigorous justification of the algorithm?

2. This surely has to rely on the isolation level being pretty high, or EventReduce might be reading while n other processes are updating. I don't see that mentioned.

3. Surely you need logical clocks for this? If not, could you point me to a high-level description of the algorithm to show why they aren't necessary?

4. Why does sort order matter? A timestamp, yes (see 3 above), but I don't understand why the order matters.

Thanks (and trying to understand this might be the thing that gets me looking at BDDs again; I never understood their value).
This seems conceptually similar to differential dataflow.

https://github.com/timelydataflow/differential-dataflow/blob/master/differentialdataflow.pdf
"EventReduce can be used with relational databases but not on relational queries that run over multiple tables/collections."<p>Forgive my ignorance, but that is the whole point of working with a relational database. If cannot use JOINS then this solves only a very limited use case.
This sounds like (a simpler version of?) Lambda Architecture [1, 2].

[1] https://en.wikipedia.org/wiki/Lambda_architecture
[2] https://www.manning.com/books/big-data
The goal of this is to reduce DB queries? Why not just queue up / batch writes? What benefits does this provide over application-side batching of the events?

EventReduce assumes there are no other systems interacting with the DB state (by using the old state that the current system saw). If there are no other systems, simple batching would work fine.
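For comparison, application-side batching along these lines is not much code; a minimal TypeScript sketch (the flush callback and thresholds are hypothetical, not tied to any particular driver):

    // Minimal write batcher: collect writes and flush them in one round trip.
    type Write = { id: string; doc: unknown };

    class WriteBatcher {
      private pending: Write[] = [];
      private timer?: ReturnType<typeof setTimeout>;

      constructor(
        private flush: (batch: Write[]) => Promise<void>, // e.g. one bulk INSERT
        private maxDelayMs = 50,
        private maxBatchSize = 100,
      ) {}

      add(write: Write): void {
        this.pending.push(write);
        if (this.pending.length >= this.maxBatchSize) {
          void this.drain();
        } else if (!this.timer) {
          this.timer = setTimeout(() => void this.drain(), this.maxDelayMs);
        }
      }

      private async drain(): Promise<void> {
        if (this.timer) { clearTimeout(this.timer); this.timer = undefined; }
        const batch = this.pending;
        this.pending = [];
        if (batch.length) await this.flush(batch);
      }
    }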
I'm going to look at the code, but how are transactions in the database handled in EventReduce?

Specifically, I'm wondering about isolation levels, which determine whether uncommitted changes are queryable before commit/rollback.
IMO, an open Change Stream cursor with an aggregation pipeline in MongoDB is a more flexible solution for this use case.

In addition, it also tracks the history of changes, so the cursor can go back if needed via the "resumeToken".

https://docs.mongodb.com/manual/changeStreams/
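Something like this with the Node MongoDB driver (a sketch; the database, collection, and field names are made up for illustration):

    import { MongoClient } from 'mongodb';

    // Watch a collection through an aggregation pipeline and resume from a
    // previously saved token if one is provided.
    async function watchCompletedTodos(lastToken?: unknown): Promise<void> {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const todos = client.db('app').collection('todos');

      const changeStream = todos.watch(
        // Only surface changes whose resulting document matches the query.
        [{ $match: { 'fullDocument.completed': true } }],
        {
          fullDocument: 'updateLookup',
          ...(lastToken ? { resumeAfter: lastToken } : {}),
        },
      );

      for await (const change of changeStream) {
        // change._id is the resume token; persist it to pick up where you left off.
        const resumeToken = change._id;
        console.log(change.operationType, resumeToken);
      }
    }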
Very cool! This reminds me of some research I did a few years ago on program consolidation: https://dl.acm.org/doi/10.1145/2594291.2594305
Databases like PostgreSQL don't give clients much insight into which tables and rows a query depends on. Does EventReduce parse the SQL statements to determine which tables and rows will be affected by a query, and then run the appropriate caching or cache-invalidation logic?
Many applications solve this by using memory caching (e.g. Redis, memcached, etc.) of performance-sensitive datasets. There are a lot of drawbacks to that approach, to the point that I would avoid it altogether.
Have you compared with an in-memory data store like Redis? Due to the lack of support for joins, that seems like a more natural comparison than a relational database.
Interesting, this seems similar to Firebase's Firestore NoSQL database, in that you can create a complex query and receive real-time updates on it.
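For reference, the Firestore pattern being compared to looks like this with the modular Web SDK (a sketch; the collection, field names, and project config are made up):

    import { initializeApp } from 'firebase/app';
    import {
      getFirestore, collection, query, where, orderBy, limit, onSnapshot,
    } from 'firebase/firestore';

    const app = initializeApp({ projectId: 'demo-project' });
    const db = getFirestore(app);

    const q = query(
      collection(db, 'todos'),
      where('completed', '==', false),
      orderBy('dueDate'),
      limit(20),
    );

    // The callback fires with the re-evaluated result set on every matching
    // change; docChanges() lists what was added, modified, or removed.
    const unsubscribe = onSnapshot(q, snapshot => {
      snapshot.docChanges().forEach(change => {
        console.log(change.type, change.doc.id, change.doc.data());
      });
    });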