Elasticsearch as a Time Series Data Store

126 points by trampi over 9 years ago

8 comments

discordianfish over 9 years ago
That's great for unstructured data, like data with high cardinality on the dimensions. But for most real-world metrics outside analytics, this isn't necessary and a data model like Prometheus makes more sense. If I did the math right, even after compression Elasticsearch uses 22 bytes per data point (508 megabytes / 23m points), where Prometheus uses about 2.5-3.5 bytes per data point.

Disclosure: Prometheus contributor here
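The per-point figure in this comment can be checked with a few lines of Python. This is only a rough sketch: it assumes "508 megabyte" means 508 × 10^6 bytes and "23m" means 23 million points, and the helper name `bytes_per_point` is made up for illustration. The Prometheus range is the one quoted in the comment, not a measured value.

```python
def bytes_per_point(total_bytes: float, points: int) -> float:
    """Average storage cost per data point."""
    return total_bytes / points


# Figures quoted in the comment (decimal megabytes are an assumption).
ES_INDEX_BYTES = 508 * 10**6
ES_POINTS = 23 * 10**6

es_cost = bytes_per_point(ES_INDEX_BYTES, ES_POINTS)
print(f"Elasticsearch: {es_cost:.1f} bytes per point")

# Prometheus range quoted in the comment: 2.5-3.5 bytes per point.
for prom_cost in (2.5, 3.5):
    ratio = es_cost / prom_cost
    print(f"vs. {prom_cost} bytes per point: {ratio:.1f}x larger")
```

Under these assumptions the result is about 22 bytes per point, or roughly six to nine times the quoted Prometheus range. Reading "megabyte" as 2^20 bytes instead gives about 23 bytes per point, so the conclusion does not change.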
kiyoto over 9 years ago
While I'm happy to hear a success story about a great piece of open source software, Elasticsearch has done a great disservice by making application developers lazy about learning the ins and outs of various analytical/transactional/storage backend systems.

Echoing other commenters, Elasticsearch is hardly the best tool for many kinds of analytics. In fact, it is strictly not a good tool for several use cases. For starters:

1. It's not good at joining two or more data sources.

2. It's not good at complex analytical processing like window functions (for example, calculating session length based on the deltas of consecutive timestamps partitioned by user_id and ordered by time).

Of course, it's also good at many things, like simple filtering and aggregation against "real-time" data. Being in-memory really helps with performance, and with the right tools, it's horizontally scalable. Elastic's commercial support is also not to be discounted.

However, as an old OLAP fart who spent years optimizing KDB+ queries, I am deeply concerned about the willful ignorance of data processing systems that I see among Elasticsearch fans. Just take my word for it and study Postgres (with c_store extension) and other *real* databases, in-memory or otherwise, open-source or proprietary, so that you won't be shooting yourself (or future co-workers) in the foot, trying to shoehorn Elasticsearch and its ilk into suboptimal workloads. (To be fair, I see a similar tendency among Splunk zealots.)
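For readers unfamiliar with the window-function example in point 2 of this comment, here is a minimal sketch of the same computation in plain Python: events are partitioned by `user_id`, ordered by time, and the delta to the previous event decides whether a session continues. The sample events, the 30-minute `SESSION_GAP` cutoff, and the function name `session_lengths` are all assumptions for illustration; the comment specifies none of them.

```python
from collections import defaultdict

# Hypothetical events: (user_id, unix_timestamp_seconds), in arbitrary order.
EVENTS = [
    ("alice", 100), ("bob", 105), ("alice", 160),
    ("alice", 4000), ("bob", 130), ("alice", 4030),
]

SESSION_GAP = 1800  # assumed cutoff: 30 idle minutes start a new session


def session_lengths(events, gap=SESSION_GAP):
    """Return {user_id: [session length in seconds, ...]}."""
    by_user = defaultdict(list)
    for user_id, ts in events:           # PARTITION BY user_id
        by_user[user_id].append(ts)

    result = {}
    for user_id, stamps in by_user.items():
        stamps.sort()                    # ORDER BY time
        lengths = []
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:          # delta to the previous row, like LAG()
                lengths.append(prev - start)
                start = ts
            prev = ts
        lengths.append(prev - start)     # close the last open session
        result[user_id] = lengths
    return result


print(session_lengths(EVENTS))
# {'alice': [60, 30], 'bob': [25]}
```

In a SQL database with window functions, the inner loop corresponds to a `LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)` expression followed by a grouping step, which is the kind of query the comment argues is awkward to express in Elasticsearch.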
yeukhon over 9 years ago
Worth mentioning that Elastic.co has a commercial product called Watcher [1], which I think is a really nice way to build an automated alert system. The downside is that, because it is a commercial product, I can't use Watcher and would have to implement one myself.

I am still deciding between ES, a relational database, and Cassandra for time series data. We use Graphite now and are happy with it, but I think having a single database handle logs, events, and metrics data would be much closer to ideal. Having logs already in ES does make ES a better choice.

[1]: https://www.elastic.co/guide/en/watcher/current/index.html
sciurus over 9 years ago
The 2.5 release of the time-series-focused dashboard Grafana added support for Elasticsearch. In a way they've come full circle, since Grafana started several years ago as a fork of the Elasticsearch dashboard Kibana.

http://grafana.org/blog/2015/10/28/Grafana-2-5-Released.html
badlogic over 9 years ago
So many ways you can abuse Lucene :) Many years ago, we used it as a graph data store as well.
mbajkowski over 9 years ago
We use Elasticsearch in a very similar manner to the one described in the article, storing high-frequency data for our instance and multi-cloud profiling / benchmarking tool:

https://profiler.bitfusionlabs.com

Since we are collecting data at sub-second granularity and did not want to introduce noise on the profiled instances themselves, whether for CPU, memory, or disk, we had to play a few tricks with how we collect the data and precisely when we send it to Elasticsearch, but in general it has been working out very well for us.
pnachbaur over 9 years ago
I tend to think of time series data as being several orders of magnitude larger than 23 million data points per week (38 per second), but now I can't seem to find a good definition of time series data. Anyone have thoughts on the rough threshold between event data and time series data? I think of arrays of hundreds/thousands of individual sensors that take 10 measurements a second as "different" than user-generated data that is time-ordered.
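The rates mentioned in this comment are easy to reproduce. The sketch below converts the article's 23 million points per week into a per-second rate and compares it with the sensor-array scenario; the sensor counts of 100 and 1,000 are picked here as stand-ins for "hundreds/thousands", and the helper names are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

ARTICLE_POINTS_PER_WEEK = 23_000_000  # figure quoted in the comment


def per_second(points_per_week: float) -> float:
    """Average ingest rate in points per second."""
    return points_per_week / SECONDS_PER_WEEK


def per_week(sensors: int, samples_per_second: int) -> int:
    """Weekly point count for a sensor array sampling at a fixed rate."""
    return sensors * samples_per_second * SECONDS_PER_WEEK


article_rate = per_second(ARTICLE_POINTS_PER_WEEK)
print(f"Article workload: {article_rate:.1f} points/s")

# Assumed array sizes standing in for "hundreds/thousands" of sensors.
for sensors in (100, 1000):
    weekly = per_week(sensors, samples_per_second=10)
    ratio = weekly / ARTICLE_POINTS_PER_WEEK
    print(f"{sensors} sensors at 10 Hz: {weekly:,} points/week ({ratio:.0f}x)")
```

With these assumed sizes, the sensor arrays produce roughly 26 to 263 times the article's volume, that is, one to two orders of magnitude more.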
lafar6502 over 9 years ago
??? Elasticsearch is good for everything... Looks like computers got cheap and fast enough to do almost anything. Why not put the data in a SQL database? I suppose that would be much better.

But nothing seems strange when, in order to monitor one server, you have to run a 10-machine cluster with an Elasticsearch log collector.