Cool! If I get a chance, I will take a look and see how easy it would be to plug this as a storage/analytics option into the Snowplow Kinesis flow (<a href="https://github.com/snowplow/snowplow/tree/master/3-enrich/scala-kinesis-enrich" rel="nofollow">https://github.com/snowplow/snowplow/tree/master/3-enrich/sc...</a>).
This is a good example of how to answer 'big data' questions without big, expensive distributed systems, via the magic of lots of RAM and probabilistic data structures.
If you are interested, there is also a separate blog post, <a href="http://www.codecademy.com/blog/143-eventhub-open-sourced-funnel-analysis-cohort-analysis-and-a-b-testing-tool" rel="nofollow">http://www.codecademy.com/blog/143-eventhub-open-sourced-fun...</a>, in which we talk about some high-level architecture considerations.
Are you using this in production now? I took a stab at something similar in Ruby (<a href="https://github.com/doomspork/Orwell" rel="nofollow">https://github.com/doomspork/Orwell</a>) a while ago; seeing this makes me want to dust it off and add new features. Thanks for sharing!