> I'm only interested in time series databases for use by developers and operations to store and retrieve data that pertains to the health and performance of the services they build and operate. Everything in this blog will judge the entries based on their suitability for that task.

That is a very particular problem, and the data storage is only a small [yet important] part of the full system.

You're probably going the wrong route if you're trying to design your own, and you'll only realize it far too late, when you have to build your own metrics collection, your own graphing, your own alerting, your own...

The standard, proven open-source stack: collectd/statsd (metrics collection) + whisper/graphite (storage) + grafana (graphs and dashboards).

The latest fad is to replace graphite with prometheus, which is better in some respects but has its own faults. (Quick sketches of pushing metrics to statsd and exposing them to prometheus are at the end of this comment.)

Either of these open-source stacks will satisfy your purpose.

HARDCORE LIMITATIONS: both stacks are entirely single node. There is no sharding, no high availability, no horizontal scaling.

(Rule of thumb: they should be fine up to roughly 100 hosts and applications. Beyond that, get ready to throw big hardware at them and tune retention aggressively.)

---

Some quick maths:

8 bytes per data point * one point every 5 seconds = ~967 kB per metric per week

967 kB per metric * 100 metrics per host * 100 hosts = ~10 GB per week at full resolution

Any of these parameters can grow tenfold (depending on the setup, retention, number of hosts, metrics per app...). That puts you straight into the TB range, with scaling problems where a single node is simply out of the question. (The arithmetic is spelled out in a short snippet at the end of this comment.)

---

It's pretty clear that the open-source solutions don't scale well and are hard to maintain... so what's next when you outgrow them?

Switch to the latest generation of monitoring tools. The two best SaaS options are datadog and signalfx. They both accept custom metrics from your app.

And... oh wait, I just noticed that dataloop.io is a new SaaS solution trying to compete with them. Oops :D
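To make the statsd part of the stack concrete: statsd accepts plain-text datagrams of the form `name:value|type` over UDP, by default on port 8125. Here is a minimal Python sketch of pushing a gauge and a counter from an app; the host, port, and metric names are illustrative assumptions, not anything from the article.

```python
import socket
import time

# Assumes a statsd daemon is listening on its default UDP port 8125 on localhost.
STATSD_ADDR = ("127.0.0.1", 8125)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_metric(name: str, value: float, metric_type: str) -> None:
    """Send one metric in the plain-text statsd format: <name>:<value>|<type>."""
    payload = f"{name}:{value}|{metric_type}".encode("ascii")
    sock.sendto(payload, STATSD_ADDR)

# Emit a gauge (current value) and a counter (increment) every 5 seconds.
while True:
    send_metric("myapp.queue_depth", 42, "g")  # gauge: hypothetical reading
    send_metric("myapp.requests", 1, "c")      # counter: one more request served
    time.sleep(5)
```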
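If you go the prometheus route instead, the model is pull rather than push: your app exposes an HTTP `/metrics` endpoint that prometheus scrapes. A minimal sketch using the official `prometheus_client` Python package (`pip install prometheus-client`); the port, metric name, and fake reading are assumptions for illustration.

```python
import random
import time

# Prometheus itself would be configured separately to scrape
# http://<this-host>:8000/metrics on its own schedule.
from prometheus_client import Gauge, start_http_server

queue_depth = Gauge("myapp_queue_depth", "Current depth of the work queue")

start_http_server(8000)  # expose /metrics on port 8000
while True:
    queue_depth.set(random.randint(0, 100))  # stand-in for a real measurement
    time.sleep(5)
```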
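And the "quick maths" from above, re-checked as a small Python snippet. The 8-byte points, 5-second interval, 100 metrics per host, and 100 hosts are the same rough assumptions as in the comment, not measured values.

```python
# Back-of-the-envelope storage estimate for raw, full-resolution retention.
BYTES_PER_POINT = 8          # one 64-bit value per data point
INTERVAL_SECONDS = 5         # one point every 5 seconds
SECONDS_PER_WEEK = 7 * 24 * 3600

points_per_week = SECONDS_PER_WEEK / INTERVAL_SECONDS           # 120,960 points
bytes_per_metric_per_week = points_per_week * BYTES_PER_POINT   # ~967 kB

metrics_per_host = 100
hosts = 100
total_bytes_per_week = bytes_per_metric_per_week * metrics_per_host * hosts

print(f"per metric per week: {bytes_per_metric_per_week / 1e3:.1f} kB")  # ~967.7 kB
print(f"total per week:      {total_bytes_per_week / 1e9:.1f} GB")       # ~9.7 GB
```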