Comparison of opensource time series db's

26 points by dataloopio, over 8 years ago

3 comments

avifreedman, over 8 years ago
Great comparison, and I hope the state of the world keeps getting better for TSDBs so we don't need to build our own at some point - but I disagree re:

> Performing queries across billions of metrics looking for labels that only match a few of them (a common scenario with time series data at scale) is really slow in Cassandra. This is because of the way it stores data in columns. This extends to any columnar database including Google's BigQuery which all have a natural disadvantage with time series data.

There's nothing inherently limiting in columnar databases that makes it slow to match only a few elements out of billions or trillions of records.

... but a classic columnar store might not be as efficient for storage, or might take 5-10x the nodes to return with the same speed with that kind of filtering, depending on the storage and clustering mechanisms used.
user5994461, over 8 years ago
> I'm only interested in time series databases for use by developers and operations to store and retrieve data that pertains to the health and performance of the services they build and operate. Everything in this blog will judge the entries based on their suitability for that task.

That is a very particular problem, in which the data storage is a minimal [yet important] aspect of the full system.

You're probably going the wrong route if you're trying to redesign your own, and you'll only realize that way too late, when you have to design your own metrics collection, own graphing, own alerting, own...

The standard proven open-source stack:

collectd/statsd (metrics collection) + whisper/graphite (storage) + grafana (cute graphs and dashboards).

The latest fad is to replace graphite with prometheus (which is better in some aspects but has its own faults).

Both these open source tools will satisfy your purpose.

HARDCORE LIMITATIONS: Both these open source tools are entirely single node. There is no form of sharding nor high availability nor horizontal scaling.

(Rule of thumb: Should be fine up to 100 hosts and applications. Then get ready to throw big hardware and tune retention aggressively.)

---

Some quick maths:

8 bytes per metric * one sample every 5 seconds = 967 kB per metric over the week

967 kB per metric * 100 metrics per host * 100 hosts = ~10 GB per week for high precision

Any of the parameters can spiral by tenfold (depending on the setup, retention, hosts, metrics per app...). That means going straight into TB range and scaling issues where one node is simply out of the question.

---

It's pretty clear that the open source solutions don't scale and are hard to maintain... so what's next when we outgrow them?

Switch to the latest generation of monitoring tools. The two best solutions are datadog and signalfx. They both accept custom metrics from your app.

And... oh wait, I just noticed that dataloop.io is a new SaaS solution trying to compete with them. Oops :D
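
The quick maths in the comment above can be reproduced with a short sketch. It assumes raw, uncompressed samples of 8 bytes each and ignores any timestamp, index, or file-format overhead; the function and parameter names are made up for illustration and do not come from any of the tools mentioned.

```python
# Back-of-the-envelope storage estimate using the figures from the comment.
# Assumption: raw 8-byte samples, no compression and no storage overhead.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def weekly_bytes_per_metric(bytes_per_sample: int = 8,
                            interval_seconds: int = 5) -> int:
    """Bytes one metric produces in a week at a fixed sampling interval."""
    samples = SECONDS_PER_WEEK // interval_seconds
    return samples * bytes_per_sample


def weekly_bytes_total(metrics_per_host: int = 100,
                       hosts: int = 100,
                       bytes_per_sample: int = 8,
                       interval_seconds: int = 5) -> int:
    """Bytes per week for the whole fleet (hypothetical helper)."""
    per_metric = weekly_bytes_per_metric(bytes_per_sample, interval_seconds)
    return per_metric * metrics_per_host * hosts


if __name__ == "__main__":
    print(f"per metric: {weekly_bytes_per_metric():,} bytes per week")
    print(f"fleet:      {weekly_bytes_total() / 1e9:.1f} GB per week")
    # Tenfold growth in a single parameter, as the comment warns about.
    print(f"10x hosts:  {weekly_bytes_total(hosts=1000) / 1e9:.1f} GB per week")
```

With the defaults this gives 967,680 bytes per metric per week and roughly 9.7 GB for 100 hosts, in line with the comment's ~10 GB. Multiplying any one parameter by ten brings the total close to 100 GB per week, so two parameters growing together lands in the TB range.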
Dowwie, over 8 years ago
Is there a publication date for this? That's a really important attribute to include with comparisons like this.