Did I understand this correctly? You have a single set of items which you query by evaluating a predicate on each of them and then sorting the matching ones. After the initial query you update the query result by looking at all the data update events: you remove deleted items from the result, you insert matching new items in the correct position according to the sort order, and you insert, remove, or move updated items as they start or stop matching the predicate and change their position according to the sort order.
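If that's right, the update step I have in mind looks roughly like this in TypeScript (a minimal sketch with hypothetical names, not EventReduce's actual API):

    interface Item { id: string }

    type ChangeEvent<T extends Item> =
      | { op: 'insert'; doc: T }
      | { op: 'update'; doc: T }
      | { op: 'delete'; id: string };

    function applyEvent<T extends Item>(
      result: T[],                      // current query result, kept sorted
      event: ChangeEvent<T>,
      matches: (doc: T) => boolean,     // the query's predicate (WHERE)
      compare: (a: T, b: T) => number,  // the query's sort order (ORDER BY)
    ): T[] {
      const id = event.op === 'delete' ? event.id : event.doc.id;

      // Drop any previous version of the document from the result.
      const next = result.filter(d => d.id !== id);

      // Deleted documents, and updated ones that stop matching, stay out.
      if (event.op === 'delete' || !matches(event.doc)) return next;

      // Insert the new/updated document at its sorted position.
      const idx = next.findIndex(d => compare(event.doc, d) < 0);
      if (idx === -1) next.push(event.doc); else next.splice(idx, 0, event.doc);
      return next;
    }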
How does this work for complex queries with sub-queries, LEFT OUTER JOINs, LATERAL JOINs, aggregation (DISTINCT/GROUP BY), window functions, CTEs, and RECURSIVE CTEs?

I have a radically different approach: define the queries in question as VIEWs, materialize them, use triggers to update the materializations where you can write those triggers easily and the updates are quick, or schedule an update where they're not.

If your RDBMS is very good about pushing WHERE constraints into VIEWs, and depending on how complex a VIEW query is, you might be able to make the update automatic by just querying the materialization's underlying VIEW with appropriate WHERE constraints from the rows being INSERTed/UPDATEd/DELETEd. You can tell which VIEWs might be suitable for this by checking that the table whose row the trigger is running for is a "top-level" table source for the VIEW's query: meaning a table source that's either the left side of a top-level LEFT JOIN, or either side of an INNER JOIN. If you can run a query on the VIEW with a timeout, then you can just do that in the trigger and mark the materialization as needing an update if the query is too slow. Lastly, a scheduled or NOTIFYed job can run to perform any slower updates to a materialization.
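To make that last step concrete, here is a rough sketch of the NOTIFY-driven refresh job in TypeScript with node-postgres. The table, view, column, and channel names (orders_summary, orders_summary_view, order_id, matview_dirty) are made up for illustration, and the trigger that issues the NOTIFY is assumed to exist:

    import { Client } from 'pg';

    // Hypothetical setup: a trigger on the base table runs
    //   NOTIFY matview_dirty, '<row id>'
    // whenever its in-trigger refresh hits the timeout, and this job
    // re-queries the underlying VIEW for just those rows.
    async function runRefreshWorker(): Promise<void> {
      const client = new Client(); // connection settings taken from PG* env vars
      await client.connect();

      client.on('notification', async msg => {
        const rowId = msg.payload;
        if (!rowId) return;
        // Re-run the VIEW with a WHERE constraint on the affected row and
        // upsert the result into the materialization table.
        // (Deletes would additionally need a DELETE against orders_summary;
        // omitted here to keep the sketch short.)
        await client.query(
          `INSERT INTO orders_summary
             SELECT * FROM orders_summary_view WHERE order_id = $1
           ON CONFLICT (order_id) DO UPDATE
             SET total = EXCLUDED.total, item_count = EXCLUDED.item_count`,
          [rowId],
        );
      });

      await client.query('LISTEN matview_dirty');
    }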
Forgive me if I missed stuff, and please point me in the right direction if you've covered it, but some questions if I may (after I've said well done!). I've considered this problem before and it seems very difficult. So:

1. Do you have a paper on this with a rigorous justification of the algorithm?

2. This surely has to rely on the isolation level being pretty high, or EventReduce might be reading while n other processes are updating. I don't see that mentioned.

3. Surely you need logical clocks for this? If not, could you point me to a high-level description of the algorithm to show why they aren't necessary?

4. Why does sort order matter? A timestamp, yes (see 3 above), but I don't understand why the order matters.

Thanks (and trying to understand this might be the thing that gets me looking at BDDs again; I never understood their value).
This seems conceptually similar to differential dataflow.

https://github.com/timelydataflow/differential-dataflow/blob/master/differentialdataflow.pdf
"EventReduce can be used with relational databases but not on relational queries that run over multiple tables/collections."<p>Forgive my ignorance, but that is the whole point of working with a relational database. If cannot use JOINS then this solves only a very limited use case.
This sounds like (a simpler version of?) Lambda Architecture [1, 2].

[1] https://en.wikipedia.org/wiki/Lambda_architecture
[2] https://www.manning.com/books/big-data
The goal of this is to reduce DB queries? Why not just queue up / batch writes? What benefits does this provide over application-side batching of the events?

EventReduce assumes there are no other systems interacting with the DB state (by using the old state that the current system saw). If there are no other systems, simple batching would work fine.
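For comparison, application-side batching along these lines is not much code; a minimal TypeScript sketch (the flush callback and thresholds are hypothetical, not tied to any particular driver):

    // Minimal write batcher: collect writes and flush them in one round trip.
    type Write = { id: string; doc: unknown };

    class WriteBatcher {
      private pending: Write[] = [];
      private timer?: ReturnType<typeof setTimeout>;

      constructor(
        private flush: (batch: Write[]) => Promise<void>, // e.g. one bulk INSERT
        private maxDelayMs = 50,
        private maxBatchSize = 100,
      ) {}

      add(write: Write): void {
        this.pending.push(write);
        if (this.pending.length >= this.maxBatchSize) {
          void this.drain();
        } else if (!this.timer) {
          this.timer = setTimeout(() => void this.drain(), this.maxDelayMs);
        }
      }

      private async drain(): Promise<void> {
        if (this.timer) { clearTimeout(this.timer); this.timer = undefined; }
        const batch = this.pending;
        this.pending = [];
        if (batch.length) await this.flush(batch);
      }
    }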
I'm going to look at the code, but how are transactions in the database handled in EventReduce?

Specifically, I'm wondering about isolation levels, which determine whether uncommitted changes are queryable before commit/rollback.
IMO, an open Change Stream cursor with an aggregation pipeline in MongoDB is a more flexible solution for this use case.

In addition, it also tracks the history of changes, so the cursor can go back if needed via the "resumeToken".

https://docs.mongodb.com/manual/changeStreams/
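Something like this with the Node MongoDB driver (a sketch; the database, collection, and field names are made up for illustration):

    import { MongoClient } from 'mongodb';

    // Watch a collection through an aggregation pipeline and resume from a
    // previously saved token if one is provided.
    async function watchCompletedTodos(lastToken?: unknown): Promise<void> {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const todos = client.db('app').collection('todos');

      const changeStream = todos.watch(
        // Only surface changes whose resulting document matches the query.
        [{ $match: { 'fullDocument.completed': true } }],
        {
          fullDocument: 'updateLookup',
          ...(lastToken ? { resumeAfter: lastToken } : {}),
        },
      );

      for await (const change of changeStream) {
        // change._id is the resume token; persist it to pick up where you left off.
        const resumeToken = change._id;
        console.log(change.operationType, resumeToken);
      }
    }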
Very cool! This reminds me of some research I did a few years ago on program consolidation: https://dl.acm.org/doi/10.1145/2594291.2594305
Databases like PostgreSQL don't give clients much insight into which tables and rows a query depends on. Does EventReduce parse the SQL statements to determine which tables and rows will be affected by a query, and then run the appropriate caching or cache-invalidation logic?
Many applications solve this by using memory caching (e.g. Redis, memcached, etc.) of performance-sensitive datasets. There are a lot of drawbacks to that approach, to the point that I would avoid it altogether.
Have you compared with an in-memory data store like Redis? Due to the lack of support for joins, that seems like a more natural comparison than a relational database.
Interesting, this seems similar to Firebase's Firestore NoSQL database, in that you can create a complex query and receive real-time updates on it.
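For reference, the Firestore pattern being compared to looks like this with the modular Web SDK (a sketch; the collection, field names, and project config are made up):

    import { initializeApp } from 'firebase/app';
    import {
      getFirestore, collection, query, where, orderBy, limit, onSnapshot,
    } from 'firebase/firestore';

    const app = initializeApp({ projectId: 'demo-project' });
    const db = getFirestore(app);

    const q = query(
      collection(db, 'todos'),
      where('completed', '==', false),
      orderBy('dueDate'),
      limit(20),
    );

    // The callback fires with the re-evaluated result set on every matching
    // change; docChanges() lists what was added, modified, or removed.
    const unsubscribe = onSnapshot(q, snapshot => {
      snapshot.docChanges().forEach(change => {
        console.log(change.type, change.doc.id, change.doc.data());
      });
    });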