That's great for unstructured data, e.g. data with high cardinality on its dimensions. But for most real-world metrics outside analytics this isn't necessary, and a data model like Prometheus's makes more sense.
If I did the math right, even after compression Elasticsearch uses about 22 bytes per data point (508 megabytes / 23m points), where Prometheus uses about 2.5-3.5 bytes per data point.<p>Disclosure: Prometheus contributor here
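Sketching the back-of-the-envelope math from the quoted figures (the numbers come from the comment above; the 3-byte Prometheus figure is just the midpoint of the stated 2.5-3.5 range):

```python
points = 23_000_000               # data points stored, per the comment
es_bytes = 508 * 1_000_000        # 508 MB on disk in Elasticsearch

es_per_point = es_bytes / points
print(round(es_per_point, 1))     # → 22.1 bytes per point

prom_per_point = 3.0              # midpoint of the quoted 2.5-3.5 byte range
print(round(es_per_point / prom_per_point, 1))  # → 7.4x more space in ES
```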
While I'm happy to hear a great success story about a great piece of open source software, Elasticsearch has done application developers a disservice by making them lazy about learning the ins and outs of the various analytical/transactional/storage backends.<p>Echoing other commenters, Elasticsearch is hardly the best tool for many kinds of analytics. In fact, it is strictly not a good tool for several use cases. For starters:<p>1. It's not good at joining two or more data sources.<p>2. It's not good at complex analytical processing like window functions (for example, calculating session length from the deltas of consecutive timestamps, partitioned by user_id and ordered by time).<p>Of course, it's good at many things, like simple filtering and aggregation over "real-time" data. Being in-memory really helps with performance, and with the right tools it's horizontally scalable. Elastic's commercial support is also not to be discounted.<p>However, as an old OLAP fart who spent years optimizing KDB+ queries, I am deeply concerned about the willful ignorance of data processing systems that I see among Elasticsearch fans. Take my word for it and study Postgres (with the cstore_fdw extension) and other <i>real</i> databases, in-memory or otherwise, open source or proprietary, so that you won't shoot yourself (or your future co-workers) in the foot trying to shoehorn Elasticsearch and its ilk into suboptimal workloads. (To be fair, I see a similar tendency among Splunk zealots.)
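To make the window-function point concrete, here is a rough sketch in plain Python of that sessionization task (the sample events, gap threshold, and `session_lengths` helper are all hypothetical; the point is that a real database gives you this LAG / PARTITION BY pattern for free):

```python
from itertools import groupby

# Hypothetical sample: (user_id, unix_ts) events, sorted per user below.
events = [
    ("alice", 100), ("alice", 130), ("alice", 2000), ("alice", 2050),
    ("bob", 500), ("bob", 520),
]

SESSION_GAP = 30 * 60  # assume a new session starts after 30 min of silence

def session_lengths(events, gap=SESSION_GAP):
    """Split each user's ordered timestamps into sessions and return
    {user: [session_length_in_seconds, ...]}."""
    out = {}
    for user, grp in groupby(sorted(events), key=lambda e: e[0]):
        ts = [t for _, t in grp]
        lengths, start = [], ts[0]
        for prev, cur in zip(ts, ts[1:]):
            if cur - prev > gap:           # delta exceeds gap -> close session
                lengths.append(prev - start)
                start = cur
        lengths.append(ts[-1] - start)     # close the final session
        out[user] = lengths
    return out

print(session_lengths(events))  # → {'alice': [30, 50], 'bob': [20]}
```

In SQL this is a one-liner with LAG(...) OVER (PARTITION BY user_id ORDER BY ts); in Elasticsearch you end up reimplementing it client-side like this.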
Worth mentioning that Elastic.co has a commercial product called Watcher [1], which I think is a really nice way to build an automated alerting system. The downside is that, being a commercial product, I can't use Watcher and would have to implement one myself.<p>I am still deciding between ES, a relational database, and Cassandra for time series data. We use Graphite now and are happy with it, but I think having a single database handle logs, events, and metrics data would be ideal. Having logs already in ES does make ES a better choice.<p>[1]: <a href="https://www.elastic.co/guide/en/watcher/current/index.html" rel="nofollow">https://www.elastic.co/guide/en/watcher/current/index.html</a>
The 2.5 release of the time series focused dashboard Grafana added support for Elasticsearch. In a way they've come full-circle, since Grafana started several years ago as a fork of the Elasticsearch dashboard Kibana.<p><a href="http://grafana.org/blog/2015/10/28/Grafana-2-5-Released.html" rel="nofollow">http://grafana.org/blog/2015/10/28/Grafana-2-5-Released.html</a>
We use Elasticsearch in a very similar manner to the article, storing high-frequency data for our instance and multi-cloud profiling / benchmarking tool:<p><a href="https://profiler.bitfusionlabs.com" rel="nofollow">https://profiler.bitfusionlabs.com</a><p>Since we are collecting data at sub-second granularity and did not want to introduce noise on the profiled instances themselves (whether for CPU, memory, or disk), we had to play a few tricks with how we collect data and precisely when we send it to Elasticsearch, but in general it has been working out very well for us.
I tend to think of time series data as being several orders of magnitude larger than 23 million data points per week (about 38 per second), but now I can't seem to find a good definition of it. Anyone have thoughts on the rough threshold between event data and time series data? I think of arrays of hundreds or thousands of individual sensors taking 10 measurements a second as "different" from user-generated data that happens to be time-ordered.
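For a sense of the scale gap, a quick back-of-the-envelope comparison (the 1,000-sensor array at 10 Hz is a made-up example, not a figure from the article):

```python
SECONDS_PER_WEEK = 7 * 24 * 3600              # 604,800

# The article's workload: 23 million points per week
print(round(23_000_000 / SECONDS_PER_WEEK))   # → 38 points/second

# A hypothetical array of 1,000 sensors sampling at 10 Hz
sensor_points = 1_000 * 10 * SECONDS_PER_WEEK
print(sensor_points)                          # → 6,048,000,000 points/week
```

That's roughly 250x the article's volume from a fairly modest sensor deployment, which is where the "different" feeling comes from.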
??? Elasticsearch is good for everything now... looks like computers got cheap and fast enough to do almost anything.
Why not put the data in a SQL database? I suspect that would work much better.<p>But apparently nothing seems strange about running a 10-machine Elasticsearch cluster with a log collector just to monitor 1 server.