Druid is quickly becoming the leading open source solution for building highly scalable analytics. We evaluated it for getstream.io. Unfortunately, the setup and maintenance are still very labour-intensive. For startups that's a concern. Many larger companies we spoke to were extremely happy about running Druid in production, though.
The realtime ingestion is interesting, especially if I can still batch import. When processing machine data, I've found that many sources come in chunks (logfiles written out every 24 hours, for example), but the eventual aim is to migrate to realtime streaming (i.e. a data point every n seconds/minutes/etc. that you consume instantly).<p>If this transition is easy without reworking infrastructure, the solution is far more attractive.
Every open source column database I've seen is very poor: text-based, no decent array-oriented ability (give me the previous row), slow, JSON output, etc. When will somebody get it right?
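For anyone unsure what "give me the previous row" means here: it is usually called a lag (or shift) over a column. A minimal sketch in plain Python, assuming a column is just an in-memory list; the `lag` helper and the sample values are made up for illustration and are not taken from Druid or any other column database:

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def lag(column: List[T], n: int = 1) -> List[Optional[T]]:
    """Shift a column down by n rows (hypothetical helper, for illustration only).

    Row i of the result holds the value from row i - n of the input.
    The first n rows have no predecessor, so they are filled with None.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    padding: List[Optional[T]] = [None] * min(n, len(column))
    return padding + column[: max(len(column) - n, 0)]


# Made-up sample column: one reading per row.
readings = [10, 12, 15, 11]
previous = lag(readings)

# Row-over-row change; the first row has nothing to compare against.
deltas = [
    cur - prev if prev is not None else None
    for cur, prev in zip(readings, previous)
]
```

An array-oriented store could do this without copying data around, which is roughly the ability the comment above is asking for.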
A friend of mine who interned with me at eBay used Druid and Angular to great success to build a tool for analysts to look at trends in our data. Druid is some seriously cool stuff.
Our 2-man team set up Druid... it took 5+ months and was excruciating to configure and get running smoothly (things were slightly more complicated because we decided to use Docker). It also took ~30 servers to make a truly fault-tolerant setup.<p>With that said, it works very well, but it definitely came at the cost of a good dose of sanity.
Has anyone done a meaningful private benchmark comparison with <a href="http://www.scylladb.com/" rel="nofollow">http://www.scylladb.com/</a>? I didn't find one online.
Yeah, more database solutions, that's what we need.<p><a href="https://xkcd.com/927/" rel="nofollow">https://xkcd.com/927/</a>