That's great for unstructured data, e.g. data with high cardinality on its dimensions. But for most real-world metrics outside analytics this isn't necessary, and a data model like Prometheus's makes more sense.
If I did the math right, even after compression Elasticsearch uses about 22 bytes per data point (508 megabytes / 23m points), where Prometheus uses about 2.5-3.5 bytes per data point.<p>Disclosure: Prometheus contributor here
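Sketching the back-of-the-envelope math from the quoted figures (the numbers come from the comment above; the 3-byte Prometheus figure is just the midpoint of the stated 2.5-3.5 range):

```python
points = 23_000_000               # data points stored, per the comment
es_bytes = 508 * 1_000_000        # 508 MB on disk in Elasticsearch

es_per_point = es_bytes / points
print(round(es_per_point, 1))     # → 22.1 bytes per point

prom_per_point = 3.0              # midpoint of the quoted 2.5-3.5 byte range
print(round(es_per_point / prom_per_point, 1))  # → 7.4x more space in ES
```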
While I'm happy to hear a great success story about a great piece of open source software, Elasticsearch has done application developers a disservice by making them lazy about learning the ins and outs of the various analytical/transactional/storage backends.<p>Echoing other commenters, Elasticsearch is hardly the best tool for many kinds of analytics. In fact, it is strictly not a good tool for several use cases. For starters:<p>1. It's not good at joining two or more data sources.<p>2. It's not good at complex analytical processing like window functions (for example, calculating session length from the deltas of consecutive timestamps, partitioned by user_id and ordered by time).<p>Of course, it's good at many things, like simple filtering and aggregation over "real-time" data. Being in-memory really helps with performance, and with the right tools it's horizontally scalable. Elastic's commercial support is also not to be discounted.<p>However, as an old OLAP fart who spent years optimizing KDB+ queries, I am deeply concerned about the willful ignorance of data processing systems that I see among Elasticsearch fans. Take my word for it and study Postgres (with the cstore_fdw extension) and other <i>real</i> databases, in-memory or otherwise, open source or proprietary, so that you won't shoot yourself (or your future co-workers) in the foot trying to shoehorn Elasticsearch and its ilk into suboptimal workloads. (To be fair, I see a similar tendency among Splunk zealots.)
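To make the window-function point concrete, here is a rough sketch in plain Python of that sessionization task (the sample events, gap threshold, and `session_lengths` helper are all hypothetical; the point is that a real database gives you this LAG / PARTITION BY pattern for free):

```python
from itertools import groupby

# Hypothetical sample: (user_id, unix_ts) events, sorted per user below.
events = [
    ("alice", 100), ("alice", 130), ("alice", 2000), ("alice", 2050),
    ("bob", 500), ("bob", 520),
]

SESSION_GAP = 30 * 60  # assume a new session starts after 30 min of silence

def session_lengths(events, gap=SESSION_GAP):
    """Split each user's ordered timestamps into sessions and return
    {user: [session_length_in_seconds, ...]}."""
    out = {}
    for user, grp in groupby(sorted(events), key=lambda e: e[0]):
        ts = [t for _, t in grp]
        lengths, start = [], ts[0]
        for prev, cur in zip(ts, ts[1:]):
            if cur - prev > gap:           # delta exceeds gap -> close session
                lengths.append(prev - start)
                start = cur
        lengths.append(ts[-1] - start)     # close the final session
        out[user] = lengths
    return out

print(session_lengths(events))  # → {'alice': [30, 50], 'bob': [20]}
```

In SQL this is a one-liner with LAG(...) OVER (PARTITION BY user_id ORDER BY ts); in Elasticsearch you end up reimplementing it client-side like this.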
Worth mentioning that Elastic.co has a commercial product called Watcher [1], which I think is a really nice way to build an automated alerting system. The downside is that, being a commercial product, I can't use Watcher and would have to implement one myself.<p>I am still deciding between ES, a relational database, and Cassandra for time series data. We use Graphite now and are happy with it, but I think having a single database handle logs, events, and metrics data would be ideal. Having logs already in ES does make ES a better choice.<p>[1]: <a href="https://www.elastic.co/guide/en/watcher/current/index.html" rel="nofollow">https://www.elastic.co/guide/en/watcher/current/index.html</a>
The 2.5 release of the time series focused dashboard Grafana added support for Elasticsearch. In a way they've come full-circle, since Grafana started several years ago as a fork of the Elasticsearch dashboard Kibana.<p><a href="http://grafana.org/blog/2015/10/28/Grafana-2-5-Released.html" rel="nofollow">http://grafana.org/blog/2015/10/28/Grafana-2-5-Released.html</a>
We use Elasticsearch in a very similar manner to the article, storing high-frequency data for our instance and multi-cloud profiling / benchmarking tool:<p><a href="https://profiler.bitfusionlabs.com" rel="nofollow">https://profiler.bitfusionlabs.com</a><p>Since we are collecting data at sub-second granularity and did not want to introduce noise on the profiled instances themselves (whether for CPU, memory, or disk), we had to play a few tricks with how we collect data and precisely when we send it to Elasticsearch, but in general it has been working out very well for us.
I tend to think of time series data as being several orders of magnitude larger than 23 million data points per week (about 38 per second), but now I can't seem to find a good definition of it. Anyone have thoughts on the rough threshold between event data and time series data? I think of arrays of hundreds or thousands of individual sensors taking 10 measurements a second as "different" from user-generated data that happens to be time-ordered.
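For a sense of the scale gap, a quick back-of-the-envelope comparison (the 1,000-sensor array at 10 Hz is a made-up example, not a figure from the article):

```python
SECONDS_PER_WEEK = 7 * 24 * 3600              # 604,800

# The article's workload: 23 million points per week
print(round(23_000_000 / SECONDS_PER_WEEK))   # → 38 points/second

# A hypothetical array of 1,000 sensors sampling at 10 Hz
sensor_points = 1_000 * 10 * SECONDS_PER_WEEK
print(sensor_points)                          # → 6,048,000,000 points/week
```

That's roughly 250x the article's volume from a fairly modest sensor deployment, which is where the "different" feeling comes from.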
??? Elasticsearch is good for everything now... looks like computers got cheap and fast enough to do almost anything.
Why not put the data in a SQL database? I suspect that would work much better.<p>But apparently nothing seems strange about running a 10-machine Elasticsearch cluster with a log collector just to monitor 1 server.