FWIW jut.io shut down the day after this was posted. <a href="https://twitter.com/PurpleQuark/status/661274501728964608" rel="nofollow">https://twitter.com/PurpleQuark/status/661274501728964608</a>
I'm working on something similar. So far I like Apache NiFi for ingestion and Apache Flink for processing. Storage choices are plentiful and, IMHO, determined by the use case and the available expertise.
Let's get this out of the way: I love it when companies are open and transparent about their architecture. Sharing intimate details like this is fantastic.<p>Where I'm struggling is that there are a number of questionable choices here with little justification. For example, why an HTTP front-end? That's fine for webhooks, but I'm not going to let my website's backend open an HTTP connection for every event I want to send out. The decision to store the data in Elasticsearch and Cassandra is equally dubious. In my experience, Elasticsearch has been a maintenance nightmare and has not been a performant or robust reporting solution at scale.
I saw these guys at Velocity in NY this year. Pretty impressive product. I felt the query language they built was easier to work with than setting up queries and filters through Elasticsearch's API.<p>Really interesting to hear about the innards.<p>Thanks for the post.
Do you support only transitive aggregation operations? If so, why not push the entire aggregation down to Elasticsearch/Cassandra?<p>How do you plan to scale CPU-wise? ES and streaming engines (I don't know about Cassandra) are CPU hogs compared to MapReduce.
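A minimal sketch of why such aggregations could be pushed down, assuming "transitive" here means aggregations whose partial results can be merged in any order (associative and commutative). Each shard or node computes a small partial aggregate, and the partials are combined afterwards. The `Partial` class and its `of`/`merge`/`mean` methods are hypothetical names for illustration, not anything from the post or from Elasticsearch/Cassandra.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Hypothetical partial aggregate for one shard or time window."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    @staticmethod
    def of(values):
        # Build a partial aggregate from the raw values held by one node.
        vals = [float(v) for v in values]
        if not vals:
            return Partial()  # identity element: merging with it changes nothing
        return Partial(len(vals), sum(vals), min(vals), max(vals))

    def merge(self, other):
        # Associative and commutative, so partials from different shards
        # can be combined in any order and any grouping.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    def mean(self):
        # The mean itself is not mergeable, but it can be derived
        # from the mergeable (total, count) pair at the end.
        return self.total / self.count if self.count else None


# Three hypothetical shards, each aggregated locally first.
shards = [[3, 5], [10], [1, 2, 4]]
partials = [Partial.of(s) for s in shards]
combined = reduce(Partial.merge, partials, Partial())
print(combined.count, combined.total, combined.minimum, combined.maximum)
print(combined.mean())
```

Aggregations that do not decompose this way (an exact median or exact distinct count, for example) need either the raw data in one place or an approximate mergeable sketch, which is roughly where the "push it all down" approach stops working.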
I heard at devopsdays Tel Aviv that BigPanda decided to offer different SLAs to paying and non-paying customers to balance costs.