TechEcho

7 comments

ah- almost 10 years ago

Really interesting! Just to confirm I understood the basics correctly:

A write means an append to a timeseries, which is keyed by timestamp and identified by some name? If a write succeeds only partially, different servers have different data, and a read might return any of these versions. After some time the anti-entropy repair will kick in and merge the diverging timeseries. Merging means taking the union of all data points.

Where do the timestamps come from, the client? So if a client retries a partially successful write, it'll have the same timestamp and will be merged during repair. Are timestamps within a timeseries monotonically increasing?

The hinted handoff sounds like it is motivated by a problem similar to the one the Kafka in-sync replica (ISR) set tackles. Do you have any views on the pros/cons of your approach versus ISR sets? I think Kafka uses ZK for the ISR management, which means it wouldn't work with the availability requirements of InfluxDB, but could a modified version work?

So overall InfluxDB is sacrificing lots of consistency for availability. Since the CP part of the system is actually cached, the entire system is really AP? If not, what parts are not AP? Modification of the CP part, like creation of new timeseries?

From a user's perspective I could see it being useful to have a historical part of the timeseries that's guaranteed to be stable, and an in-flux part where the system hasn't settled yet. Then one could run expensive analytics on the historical part without having to recalculate everything on the next read because the data might have changed in the meantime. You're already hashing your data and building a Merkle tree, so maybe that would make it possible to implement something like that.
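To make the merge question concrete, here is a minimal sketch of "repair as a union of data points". It reflects only the reading described in this comment, not confirmed InfluxDB behavior. The `Series` type, the `merge_replicas` function, and the rule that a later replica wins on a timestamp collision are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical model: one series on one replica is a map of timestamp -> value.
Series = Dict[int, float]


def merge_replicas(*replicas: Series) -> Series:
    """Return the union of all data points across replicas (assumed repair semantics)."""
    merged: Series = {}
    for replica in replicas:
        for ts, value in replica.items():
            # Assumption: if two replicas disagree on a timestamp,
            # the replica visited later overwrites the earlier one.
            merged[ts] = value
    return merged


# Two partially successful writes: ts=3 reached only server_a, ts=4 only server_b.
server_a: Series = {1: 0.5, 2: 0.7, 3: 0.9}
server_b: Series = {1: 0.5, 2: 0.7, 4: 1.1}

# The client retries the ts=3 write with the same timestamp and it lands on server_b.
server_b[3] = 0.9

repaired = merge_replicas(server_a, server_b)
```

Under these assumptions the retried point does not create a duplicate, because the merged series is keyed by timestamp, and both divergent points survive the repair.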

penprogg almost 10 years ago

What is the difference between this and Cassandra? A more powerful querying language?

Cassandra already has consistency levels with replication strategies. I feel like the only way to get powerful querying out of a system like this would be to have a MapReduce layer on top of your DB, which is what many do to get powerful querying from Cassandra.

pauldix almost 10 years ago

InfluxDB CEO and post author here. I'd love to hear feedback and answer any questions.

lucian1900 almost 10 years ago

Looks quite nice and straightforward, but this is very clearly an AP system. Most such systems use a CP component for cluster management.

seaworthy-tonya almost 10 years ago

> Being able to write and query the data is more important than having a strongly consistent view

I can imagine some use cases where this is a very reasonable assumption (statsd-style analysis & monitoring systems) but other cases where it's not so great (financial systems).

chaotic-good almost 10 years ago

What is the throughput of the system per node?

hendzen almost 10 years ago

How is this different from Hadoop?

InfluxDB Clustering Design – Neither CP or AP
