I didn't really understand what the product actually did after reading this blog post or the products page. I found the docs much more edifying:

> Materialize lets you ask questions about your data, and then get the answers in real time.

> Why not just use your database’s built-in functionality to perform these same computations? Because your database often acts as if it’s never been asked that question before, which means it can take a long time to come up with an answer, each and every time you pose the query.

> Materialize instead keeps the results of the queries and incrementally updates them as new data comes in. So, rather than recalculating the answer each time it’s asked, Materialize continually updates the answer and gives you the answer’s current state from memory.

> Importantly, Materialize supports incrementally updating a much broader set of views than is common in traditional databases (e.g. views over multi-way joins with complex aggregations), and can do incremental updates in the presence of arbitrary inserts, updates, and deletes in the input streams.

https://materialize.io/docs/
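To make that concrete, the basic workflow looks roughly like this (a sketch pieced together from the docs; the broker, topic, and schema registry addresses are invented):

    -- ingest a Kafka topic as a source (hypothetical broker/topic)
    CREATE SOURCE orders
    FROM KAFKA BROKER 'localhost:9092' TOPIC 'orders'
    FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081';

    -- the view is computed once, then incrementally maintained
    -- as inserts/updates/deletes arrive on the topic
    CREATE MATERIALIZED VIEW revenue_by_region AS
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region;

    -- reads return the view's current state from memory,
    -- rather than recomputing the query from scratch
    SELECT * FROM revenue_by_region;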
> We believe that streaming architectures are the only ones that can produce this ideal data infrastructure.

I just want to say this is a very dangerous assumption to make.

I run a company that helps our customers consolidate and transform data from virtually anywhere in their data warehouses. When we first started, the engineer in me made the same declaration, and I worked to get data into warehouses seconds after an event or record was generated in an origin system (website, app, database, Salesforce, etc).

What I quickly learned was that analysts and data scientists simply didn't want or need this. Refreshing the data every five minutes in batches was more than sufficient.

Secondly, almost all data is useless in its raw form. The analysts had to perform ELT jobs on their data in the warehouse to clean, dedupe, aggregate, and project their business rules onto that data. These functions often require the database to scan over historical data to produce the new materializations of that data. So even if we could get the data into the warehouse with sub-minute latency, the jobs to transform that data ran every 5 minutes.

To be clear, I don't discount the need for telemetry and _some_ data to be actionable in a smaller time frame; I'm just wary of a data warehouse fulfilling that obligation.

In any event, I do think this direction is the future (an overwhelming number of data sources allow change data capture almost immediately after an event occurs), I just don't think it's the only architecture that can satisfy most analysts'/data scientists' needs today.

I would love to hear the use cases that your customers have that made Materialize a good fit!
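For context, the sort of batch transform I mean looks something like this, run on a 5-minute schedule (purely illustrative; the table and column names are made up):

    -- rebuild a clean, deduped aggregate from the raw landing table
    CREATE TABLE daily_revenue_new AS
    SELECT order_date, region, SUM(amount) AS revenue
    FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY order_id
                                ORDER BY updated_at DESC) AS rn
      FROM raw_orders
    ) deduped
    WHERE rn = 1          -- keep only the latest version of each order
    GROUP BY order_date, region;

    -- swap the fresh result in for the old one
    DROP TABLE IF EXISTS daily_revenue;
    ALTER TABLE daily_revenue_new RENAME TO daily_revenue;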
Would it be fair to say this is a more OLAP-oriented approach to what ksqlDB (not KSQL, but https://ksqldb.io/) does?

It seems to perhaps lack the richness of how ksqlDB uses Kafka Connectors (sinks and sources), but I don't see any reason you couldn't use Materialize in conjunction with ksqlDB. E.g.:

    KC-source --> ksql --> materialize --> kafka --> KC-sink

Questions for Materialize...

What connectors (sinks and sources) do you have or plan to develop? It seems like it's mostly Kafka in and out at the moment.

Why would I use this over ksqlDB?

Can I snapshot and resume from the stream? Or do I need to rehydrate to re-establish state?
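For what it's worth, the Kafka-in/Kafka-out legs of that diagram seem to look like this in Materialize (a sketch from skimming the docs; broker, topic, and registry addresses are placeholders):

    -- ingest a topic, possibly one that ksqlDB produced
    CREATE SOURCE ksql_output
    FROM KAFKA BROKER 'broker:9092' TOPIC 'ksql-output'
    FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://registry:8081';

    -- an incrementally maintained view over that stream
    CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, COUNT(*) AS order_count
    FROM ksql_output
    GROUP BY customer_id;

    -- emit the view's changes back out to Kafka for a KC-sink
    CREATE SINK order_totals_sink
    FROM order_totals
    INTO KAFKA BROKER 'broker:9092' TOPIC 'order-totals'
    FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://registry:8081';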
I really like the pg protocol (like e.g. Cockroach); it lets me use my usual tools. There are a few things I noticed:

1. It has fairly rich support for types (a quick smoke test is sketched after this list) - these new-ish SQL engines often lack quite a lot of things, but this seems pretty decent.
2. I don't see any comparisons to KSQL, which seems to be the primary competitor.
3. Read the license. Read it carefully. It has a weird "will become open source in four years" clause, so keep that in mind. It also disallows hosting it for clients to use (essentially as a DBaaS).
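On the types point (1. above), this is the kind of trivial smoke test I mean, run over psql (I'm assuming jsonb support here based on the docs; not exhaustive by any means):

    -- jsonb literal and operator, straight over the pg wire protocol
    SELECT ('{"a": [1, 2, 3]}'::jsonb -> 'a') AS json_field;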
(Linked in the post, but) GitHub repo: https://github.com/MaterializeInc/materialize
For anyone interested in the details behind all of this, you should check out Frank's blog:

https://github.com/frankmcsherry/blog
For anyone who might be considering trying something similar with their own Postgres database (PG10+), we recently open-sourced this: https://github.com/supabase/realtime

It's an Elixir (Phoenix) server that listens to PostgreSQL's native replication, transforms it into JSON, then blasts it over websockets.

I see that Materialize is using Debezium, which will give you a similar result, just with connectors to Kafka etc.
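If you want to poke at the underlying mechanism yourself, the replication stream it listens to is Postgres logical decoding, roughly like this (using the built-in test_decoding plugin for illustration; the server manages its slot differently, so treat this as a sketch):

    -- requires wal_level = logical in postgresql.conf (restart needed)
    ALTER SYSTEM SET wal_level = logical;

    -- create a logical replication slot to stream changes from
    SELECT * FROM pg_create_logical_replication_slot(
      'realtime_slot', 'test_decoding');

    -- peek at the decoded inserts/updates/deletes on the slot
    SELECT * FROM pg_logical_slot_peek_changes('realtime_slot', NULL, NULL);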
I am curious about the physical storage. Is it purely in-memory, or is disk persistence possible? Is any kind of data compression applied, and what are the memory requirements? Is the data layout row-based or column-based?
The "you may not cluster any server instances of the Licensed Work together for one use" in the license is a fairly tricky clause. Under this clause, how would one run a fault-tolerant instance of Materialize?
How does materialize compare in performance (especially ingress/egress latency) to other OLAP systems like Druid or ClickHouse? Would love to see some benchmarks.
> Blazing fast results

I highly doubt this, given that the query engine is interpreted and non-vectorized. Without compilation or vectorization, simple queries are 10x to 100x slower, and queries with large aggregations and joins are 100x to 1000x slower.

> Full SQL Exploration

Except for window functions, it seems. These actually matter to data analysts.
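To illustrate what's missing: this is the kind of query analysts reach for constantly (a made-up example using ROW_NUMBER, one of the standard window functions):

    -- rank each customer's orders by recency
    SELECT customer_id, order_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS recency_rank
    FROM orders;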
Pretty cool tech, although I feel they may have missed the moment, as AWS, Azure and GCP are becoming hypercompetitive in solving all things related to data/storage. Azure has been churning out major updates to its services and clearly taking inspiration from companies like Snowflake. AWS, I think, hesitated to compete with Snowflake as they were running on AWS anyway - win/win for them.

Snowflake had incredible timing, as they hit the market just before CFOs and non-tech business leaders realized the cost and talent needed to pull off a data lake successfully was more than they'd like. Those who were sick of the management jumped to Snowflake fast, and AWS/Azure never really responded until recently.

Awesome to see all the innovative takes on solving these extremely technical problems! I love it!
Congrats on the launch, always nice to see new products.

This is an interesting mix between the (now obsolete) PipelineDB, TimescaleDB with continuous aggregates, Kafka and other message systems with KSQL/ksqlDB/KarelDB, stream processing engines like Spark, and typical RDBMSs like SQL Server with materialized views.

The amount of research to support complex and layered queries definitely sets this apart.
Not sure how the feature sets compare, but AWS is releasing materialized views for Redshift sometime soon, and one of the things it will support is incremental refresh (assuming your view meets some criteria).

I'm sure Materialize is better at this since it's purpose-built, but if you're on Redshift you can get at least some of the benefits of incremental materialization.
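On the Redshift side, that would presumably look something like this (sketched from the preview docs; whether a given view refreshes incrementally or gets fully recomputed depends on the criteria AWS publishes):

    -- define the view once
    CREATE MATERIALIZED VIEW daily_sales AS
    SELECT order_date, SUM(amount) AS total
    FROM sales
    GROUP BY order_date;

    -- refresh on your own schedule; eligible views refresh
    -- incrementally, others are recomputed from scratch
    REFRESH MATERIALIZED VIEW daily_sales;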
Materialize connects directly to event stream processors (like Kafka) --- how about Pulsar? (Googling doesn't yield anything useful; Materialize and Pulsar are both names shared by other brands.)
I'm wondering how this technology could work for OLAP cubes.

An OLAP cube that is automatically & incrementally kept in sync with the changes in the source data sounds promising.

Is that a potential use case?
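A sketch of what I have in mind, if GROUP BY views can stand in for cube rollups (invented schema, and I'm assuming each dimension combination would just be its own view):

    -- one incrementally maintained rollup per dimension combination
    CREATE MATERIALIZED VIEW sales_by_region_product AS
    SELECT region, product, SUM(amount) AS total, COUNT(*) AS n
    FROM sales
    GROUP BY region, product;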
Looking back at the project, knowing what you know now, if you were to start again (but without the Rust skills you've since acquired), would you go with Rust again or pick another toolbox?
See also https://news.ycombinator.com/item?id=22346915