Really interesting!
Just to confirm I understood the basics correctly:<p>A write means appending to a timeseries that is identified by some name, with data points keyed by timestamp?
If a write succeeds only partially, different servers have different data. And a read might return any of these versions.
After some time the anti-entropy repair will kick in, and merge the diverging timeseries. Merging means taking the union of all data points.<p>Where do the timestamps come from, the client? So if a client retries a partially successful write, it'll have the same timestamp and will be merged during repair.
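To make my mental model of the union merge concrete, here is a minimal Python sketch. It assumes each replica holds a timeseries as a dict from timestamp to value and that the client supplies the timestamps; `merge_replicas` and the data layout are my own illustration, not InfluxDB's actual implementation.

```python
def merge_replicas(*replicas):
    """Return the union of data points across replicas, keyed by timestamp."""
    merged = {}
    for replica in replicas:
        for ts, value in replica.items():
            # Assumption: points sharing a timestamp carry the same value
            # (e.g. a client retry), so keeping the first one seen is safe.
            merged.setdefault(ts, value)
    # Sort by timestamp so the merged series reads in time order.
    return dict(sorted(merged.items()))


# A write of three points succeeded fully on server A but only partially on B.
server_a = {1000: 0.5, 1001: 0.7, 1002: 0.9}
server_b = {1000: 0.5}

# The client retries the same write with the same timestamps; only B receives it.
server_b.update({1000: 0.5, 1001: 0.7, 1002: 0.9})

# Anti-entropy repair: the retry adds no duplicates because the keys collide.
repaired = merge_replicas(server_a, server_b)
```

If that is roughly right, a retry is idempotent as long as the client reuses the original timestamps.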
Are timestamps within a timeseries monotonically increasing?<p>The hinted handoff sounds like it is motivated by a problem similar to the one that Kafka's in-sync replica (ISR) set tackles. Do you have any views on the pros/cons of your approach versus ISR sets? I think Kafka uses ZK for ISR management, which means it wouldn't work with the availability requirements of InfluxDB, but could a modified version work?<p>So overall InfluxDB is sacrificing lots of consistency for availability.
Since the CP part of the system is actually cached, the entire system is really AP?
If not, which parts are not AP? Modification of the CP part, like creation of new timeseries?<p>From a user's perspective I could see it being useful to have a historical part of the timeseries that's guaranteed to be stable, and an in-flux part where the system hasn't settled yet. Then one could run expensive analytics on the historical part without having to recalculate everything on the next read because the data might have changed in the meantime. You're already hashing your data and building a Merkle tree, so maybe that would make it possible to implement something like that.
What is the difference between this and Cassandra? A more powerful query language?<p>Cassandra already has consistency levels and replication strategies. I feel like the only way to get powerful querying out of a system like this would be to have a MapReduce layer on top of your DB, which is what many do to get powerful querying from Cassandra.
> Being able to write and query the data is more important than having a strongly consistent view<p>I can imagine some use cases where this is a very reasonable assumption (statsd-style analysis & monitoring systems) but other cases where it's not so great (financial systems).