> If I’d written out the query, I might have seen that I didn’t need the JOIN (or never written it in the first place). Whereas the view hides that complexity. So they can make things easier, but that can lead to performance pitfalls if we’re not careful.

You can avoid this particular pitfall by using LEFT JOINs for views like this (views that join things in for convenience that you might not select). Postgres will eliminate a LEFT JOIN but not an INNER JOIN, since the inner join could filter rows (e.g. symbols in stocks_real_time that don't exist in company). Commercial engines will use the presence of FK constraints to know that such filtering is impossible and go ahead and eliminate inner joins as well.
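A minimal sketch of the idea, reusing the table names from the thread with made-up columns; note that Postgres only removes the LEFT JOIN when a unique constraint on the inner side's join key guarantees the join can't duplicate rows:

```sql
-- View that LEFT JOINs company metadata in "for convenience".
CREATE VIEW stock_prices AS
SELECT s.time, s.symbol, s.price, c.name AS company_name
FROM stocks_real_time s
LEFT JOIN company c ON c.symbol = s.symbol;  -- company.symbol assumed UNIQUE

-- No company columns are referenced and the join can't duplicate rows,
-- so the planner can drop the LEFT JOIN entirely:
EXPLAIN SELECT time, symbol, price FROM stock_prices;

-- Had the view used an INNER JOIN, the join would be kept, because it
-- could filter out symbols that have no matching row in company.
```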
Postgres is an amazing database. Its only significant weakness now is in materialized views, with their lack of incremental refresh. It was disappointing to see no progress towards this in v15.
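For context, a sketch of the limitation being described (view and table names are made up): core Postgres can only recompute a materialized view in full.

```sql
-- A plain materialized view over a hypothetical metrics table.
CREATE MATERIALIZED VIEW daily_totals AS
SELECT date_trunc('day', time) AS day, sum(value) AS total
FROM metrics
GROUP BY 1;

-- Both forms rescan and re-aggregate the whole source; there is no
-- "refresh only the days that changed" option in core Postgres.
REFRESH MATERIALIZED VIEW daily_totals;
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_totals;  -- needs a unique index on the view
```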
Enjoyed this post! Djk447, would love to ask a question as well.

We've started working with Timescale to process historical time series data. However, there is so much of it that we chose which parts to process.

It's possible that in the future we may need to go back and reprocess the dataset to add something we decided we want after all.

In your post it seems like this is handled automatically on a smaller time scale, meaning you could insert into the past without doing anything special.

What happens if you need to insert data at numerous points across two years' worth of data instead? Do you have to use backfilling as described in your documentation, or is it better to rebuild the entire hypertable?
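For concreteness, a hedged sketch of what a manual refresh over a wide historical window might look like after such a backfill, assuming a continuous aggregate named daily_agg (the refresh-policy handling covered in Timescale's backfill docs is omitted here):

```sql
-- After bulk-inserting rows scattered across 2020-2021 into the hypertable,
-- re-materialize the affected window of the continuous aggregate explicitly.
CALL refresh_continuous_aggregate(
  'daily_agg',
  '2020-01-01 00:00:00+00',
  '2022-01-01 00:00:00+00'
);
```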
Did you consider a type of continuous aggregate that reads from pre-aggregated partitions for time buckets with no modifications, and swaps those buckets in the continuous aggregate view with a live view when data in them is modified? I guess it would mean that performance would sort of continually degrade as inserts/updates/deletes touched more and more time buckets, but as soon as the aggregation event fired, performance would recover. It seems like one could provide a more strongly consistent view of aggregates with this sort of approach.
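A rough sketch of the approach being proposed (not how TimescaleDB implements it), assuming a hypothetical materialized table of clean buckets and a dirty_buckets table tracking modified ranges:

```sql
CREATE VIEW candlestick_consistent AS
-- 1. Buckets with no modifications come straight from the pre-aggregated table.
SELECT m.bucket, m.symbol, m.avg_price
FROM candlestick_materialized m
WHERE NOT EXISTS (SELECT 1 FROM dirty_buckets d WHERE d.bucket = m.bucket)
UNION ALL
-- 2. Modified ("dirty") buckets are aggregated live from the raw data,
--    which is where the gradual performance degradation would come from.
SELECT time_bucket('1 day', r.time) AS bucket, r.symbol, avg(r.price) AS avg_price
FROM stocks_real_time r
WHERE time_bucket('1 day', r.time) IN (SELECT bucket FROM dirty_buckets)
GROUP BY 1, 2;
```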
Great approach and good write-up! I’ve implemented a similar technique before on PostgreSQL, but with the materialisation in the application backend. Still works like a charm.

> So instead, we created a special kind of trigger that tracks the minimum and maximum times modified across all the rows in a statement and writes out the range of times that were modified to a log table. We call that an invalidation log.

Does this invalidation log also take into account cases where the view has an aggregate that is based on data from a bucket other than itself? For example, a lag() or lead() might be used to calculate a delta compared to the previous bucket. Then, if a data point inside bucket 1 is added to the realtime table and bucket 1 is invalidated and re-materialised, wouldn't bucket 2 also need to be re-materialised for integrity reasons?
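To illustrate the cross-bucket dependency being asked about, a made-up query over a daily aggregate where each bucket's value depends on the previous bucket's:

```sql
-- Each day's delta depends on the previous day's total, so re-materialising
-- bucket 1 alone would leave bucket 2's delta stale.
SELECT bucket,
       total,
       total - lag(total) OVER (ORDER BY bucket) AS delta_vs_previous_day
FROM daily_agg
ORDER BY bucket;
```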
This is an excellent article. I like the way the author builds up in steps to eventual consistency between source tables and their materialized views. It was fun to guess the next step.

I do have one question: how does the algorithm described in the article work when the source table is spread across multiple servers, say in multiple shards? Can TimescaleDB maintain materialized views on each shard and then run a query that reconciles them?

Edited: clarification
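Not an answer from the article, but one common pattern for this kind of setup is two-phase aggregation: each shard keeps partial aggregates that merge correctly, and a coordinating query combines them. A hedged sketch with made-up names:

```sql
-- Per-shard partial aggregate: store components that can be merged later
-- (count and sum, rather than a finished average).
CREATE MATERIALIZED VIEW shard_daily_partials AS
SELECT time_bucket('1 day', time) AS bucket,
       symbol,
       count(*) AS n,
       sum(price) AS price_sum
FROM stocks_real_time
GROUP BY 1, 2;

-- Coordinator-side reconciliation across all shards' partials:
SELECT bucket,
       symbol,
       sum(price_sum) / sum(n) AS avg_price
FROM all_shard_partials   -- e.g. a UNION ALL / foreign-table view over the shards
GROUP BY bucket, symbol;
```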