What shortcomings of Redis set operations does the in-memory data store address, and how?<p>Unrelated rant: regardless of its merits, "Lambda" Architecture is probably the most annoyingly overloaded term in use today, second only to "Isomorphic" JavaScript. Just because something has a passing resemblance to the functional style doesn't grant license to re-appropriate a well-understood term of art.
Out of curiosity, why weren't products like Druid <a href="http://druid.io/" rel="nofollow">http://druid.io/</a>, InfluxDB <a href="https://influxdb.com/" rel="nofollow">https://influxdb.com/</a>, or possibly OpenTSDB taken into consideration?
"Finally, at query time, we bring together the real-time views from the set database and the batch views from S3 to compute the result"<p>So how in the heck does this work? At query time you decide which file to get out of S3 (how do you decide this?), parse it, filter it, and merge it with the results from the custom-made Redis-like real-time database?
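A plausible answer, sketched under assumptions I'm making from the quoted sentence alone: the batch layer writes one set of user IDs per (event, day) to S3 under a deterministic key derived from the query parameters (so there's no lookup step, which answers "how do you decide which file"), and the real-time store holds only the days the batch job hasn't covered yet. All names and storage shapes here are illustrative, not the article's actual API.

```python
# Minimal sketch of a lambda-style query-time merge. Dicts stand in for
# S3 objects and the Redis-like realtime store; in the real system the
# batch half would fetch and parse a file whose key is computed from
# (event, day), e.g. f"{event}/{day}.set". All names are hypothetical.
from datetime import date, timedelta

BATCH_VIEWS = {            # (event, ISO day) -> set of user IDs, from S3
    ("signup", "2015-07-01"): {1, 2, 3},
    ("signup", "2015-07-02"): {2, 4},
}
REALTIME_VIEWS = {         # only days not yet folded into the batch views
    ("signup", "2015-07-03"): {4, 5},
}

def users_for(event, start, end):
    """Union the per-day sets over [start, end], preferring batch views.

    The key is derived directly from the query itself, so "which file to
    get out of S3" needs no decision logic beyond key construction.
    """
    result = set()
    day = start
    while day <= end:
        key = (event, day.isoformat())
        # Batch view wins when present; otherwise fall back to realtime.
        result |= BATCH_VIEWS.get(key) or REALTIME_VIEWS.get(key, set())
        day += timedelta(days=1)
    return result

print(users_for("signup", date(2015, 7, 1), date(2015, 7, 3)))
# -> {1, 2, 3, 4, 5}
```

The filter/merge step then reduces to cheap set algebra (unions, intersections) over views that were already pre-aggregated offline.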
> in-memory database holds only a limited set of data<p>MemSQL is not just in-memory; it also has a column store (note: I don't know VoltDB). Think of MemSQL not as "doing everything in memory" but as "using memory where it helps most".
How do you decide which sets of users to pre-aggregate?<p>It seems like without some limits in place you could end up with a huge number of sets, especially if you are calculating these based on event properties.
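To put a rough number on that worry: if you pre-aggregate one user set per distinct (event, property=value, day) combination, the key count grows multiplicatively with property cardinality. The figures below are made up purely for illustration.

```python
# Back-of-the-envelope count of pre-aggregated sets. All numbers are
# hypothetical; the point is the multiplicative blowup, not the totals.
events = 50            # distinct event types
props_per_event = 5    # tracked properties per event
values_per_prop = 100  # distinct values per property (country, browser, ...)
days = 365             # one set per day for time-windowed queries

keys = events * props_per_event * values_per_prop * days
print(keys)  # -> 9125000 sets for a single year
```

Even one modest high-cardinality property (say, a URL or user agent) would dominate that product, which is presumably why some cap or whitelist on aggregated properties is needed.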
The lambda architecture and the split between the heavy slow processing and the interactive processing reminds me of how a few of our customers are blending Hadoop and Couchbase for similar use cases: <a href="http://www.couchbase.com/fr/ad_platforms" rel="nofollow">http://www.couchbase.com/fr/ad_platforms</a>