InfluxDB vs. Cassandra for timeseries data

61 points by rar_ram, more than 8 years ago

7 comments

paulasmuth, more than 8 years ago
The linked article is an obviously bullshit benchmark that makes InfluxDB look good and Cassandra look bad (by, surprise, the InfluxDB folks).

I'm far from a Cassandra fanboy, but this really is just dishonest marketing. Not sure if that will work if your product is open source and the target audience is developers.

Some thoughts:

- The reason Cassandra uses so much more space to store the same data is that they've set up the Cassandra table schema in such a way that Cassandra needs to write the series ID string for each sample (while InfluxDB only needs to write the values). You easily get a 10-100x blowup just from that. There is no superior "compression" technology here, just an apples-to-oranges comparison.

- Comparing the queries is even worse, because they are testing a kind of query (aggregation) that Cassandra does not support. To still get a benchmark where they're much faster, they just wrote some code that retrieves all the data from Cassandra into a process and then executes the query within their own process. If anything, they're benchmarking one query tool they've written against another one of their own tools.

- Also, if I didn't miss anything, the article doesn't say what kind of cluster they actually ran this on, or even whether they ran both tests on the same hardware. There definitely are Cassandra clusters handling more than 100k writes/sec in production right now. So I guess they picked a peculiar configuration in which they outperform Cassandra in terms of write ops (given a good distribution of keys, Cassandra is more or less linearly scalable in this dimension).

- A better target to benchmark against would probably be http://opentsdb.net/ or http://prometheus.io/ - both seem to have somewhat similar semantics to InfluxDB (which Cassandra and Elasticsearch do not).

DISC: I also work on a distributed database product (https://eventql.io) but it's neither a direct competitor to Cassandra nor InfluxDB nor any of the other products I've mentioned. I hope the comment doesn't come across as too harsh. The article made some very big (and harsh) claims, so I think it's fair to respond in the same tone.
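
To illustrate the first point, here is a minimal back-of-the-envelope sketch of what repeating the series ID per sample costs in raw bytes. Everything in it is an assumption made for illustration: the function name, the 8-byte timestamp and 8-byte value, and the absence of any compression. It does not model either database's actual on-disk format, and it only shows the direction of the effect, not the 10-100x figure.

```python
def bytes_per_series(series_id: str, n_samples: int, repeat_id: bool) -> int:
    """Raw, uncompressed size of one series under a simplified layout.

    Hypothetical model: each sample is an 8-byte timestamp plus an
    8-byte float value. With repeat_id=True the series ID string is
    written next to every sample; otherwise it is written once.
    """
    id_bytes = len(series_id.encode("utf-8"))
    sample_bytes = 8 + 8  # assumed timestamp + value widths
    if repeat_id:
        return n_samples * (id_bytes + sample_bytes)
    return id_bytes + n_samples * sample_bytes


def blowup(series_id: str, n_samples: int) -> float:
    """Size ratio of the repeat-the-ID layout to the write-once layout."""
    return (bytes_per_series(series_id, n_samples, True)
            / bytes_per_series(series_id, n_samples, False))
```

The ratio grows with the length of the ID relative to the sample payload, so once the values themselves are compressed down to a byte or two, a long ID string written per row dominates the total.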
daenney, more than 8 years ago
The conclusion isn't entirely surprising, "we from X say that engine X is better than engine Y", but there are many companies that have monitoring stacks built on top of Cassandra, like SignalFX. They have a presentation or two on the topic too that might be interesting: http://www.slideshare.net/planetcassandra/signalfx-making-cassandra-perform-as-a-time-series-database

Ultimately this benchmark will be heavily influenced by the code written to "emulate" the InfluxDB parts on top of Cassandra and how much of that code puts Cassandra at a disadvantage. I'd like to hear from some people that have built such solutions on top of Cassandra what they think about the benchmark and see how that benchmark would evolve.
soundoflight, more than 8 years ago
From using InfluxDB (up to v0.10 I think it was), it's a great database but performance REALLY depends on the cardinality of your data.

I can't stress it enough, calculate your cardinality before switching over to it. If your cardinality looks good, InfluxDB is a perfect, logical choice. I really enjoyed it and it is dirt simple to figure out. We had a junior dev just out of college with little experience set it up and get a high level of proficiency in a matter of hours.

Edit: I should point out, I was doing about 10 million records on my db (hosted on a Mac Mini in development!) a day with a 2 week sliding window. I was pushing the data from InfluxDB into custom D3 visualizations. I would cache certain queries in Redis, so I wasn't always hitting InfluxDB with each read request.
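
One rough way to "calculate your cardinality" before switching is to count the distinct combinations of measurement and tag set in a sample of your data, which is how InfluxDB identifies a series. The sketch below is a minimal illustration under that assumption. The `Point` shape and the function name are hypothetical and not part of any InfluxDB client library.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical point shape: (measurement name, tag key/value pairs).
Point = Tuple[str, Mapping[str, str]]


def series_cardinality(points: Iterable[Point]) -> int:
    """Count distinct (measurement, tag set) combinations in a sample."""
    seen = set()
    for measurement, tags in points:
        # Sort tag pairs so the same tag set always produces the same key.
        seen.add((measurement, tuple(sorted(tags.items()))))
    return len(seen)


sample = [
    ("cpu", {"host": "a", "region": "eu"}),
    ("cpu", {"region": "eu", "host": "a"}),  # same series, different tag order
    ("cpu", {"host": "b", "region": "eu"}),
    ("mem", {"host": "a", "region": "eu"}),
]
print(series_cardinality(sample))
```

Tags with unbounded values (request IDs, user IDs, timestamps) are the usual cause of a count that keeps growing as the sample gets larger.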
tychuz, more than 8 years ago
Just looking at the domain, it's easy to guess which one will win...
klucar, more than 8 years ago
Has anyone successfully compiled their benchmark code? https://github.com/influxdata/influxdb-comparisons

I added code to the data generator to work with Timely (https://nationalsecurityagency.github.io/timely/) but can't get it compiled.

Also, it seemed that ingest and query were separate stages. Queries should be run while ingest is running to get real-world performance, but I understand it is more difficult to test this way.
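
A minimal sketch of what running queries while ingest is running could look like in a test harness. This is not the influxdb-comparisons code: `run_mixed_load`, `write_fn`, and `query_fn` are hypothetical names, and the callables stand in for whatever client calls the database under test needs.

```python
import threading
import time
from typing import Callable, List


def run_mixed_load(write_fn: Callable[[int], None],
                   query_fn: Callable[[], object],
                   n_writes: int) -> List[float]:
    """Issue n_writes writes while a second thread queries in a loop.

    Returns the latency in seconds of each query that ran during ingest.
    """
    done = threading.Event()
    latencies: List[float] = []

    def querier() -> None:
        # Always run at least one query, then keep going until ingest ends.
        while True:
            start = time.perf_counter()
            query_fn()
            latencies.append(time.perf_counter() - start)
            if done.is_set():
                break

    thread = threading.Thread(target=querier)
    thread.start()
    try:
        for i in range(n_writes):
            write_fn(i)
    finally:
        done.set()
        thread.join()
    return latencies
```

With an in-memory list as a stand-in store, `run_mixed_load(store.append, lambda: len(store), 10_000)` returns the query latencies observed while the writes were in flight. A real harness would likely use several writer and reader threads and report percentiles rather than raw samples.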
dz0ny, more than 8 years ago
It would be interesting to compare memory requirements; I chose InfluxDB because it had 10 times lower memory usage. The dataset was small (a couple of million datapoints)... but still.
LogicX, more than 8 years ago
Not sure why this blog post from July made it to the front page now.

Though 1.0 GA is being released today.