My previous job was at an advertising firm, where we used HyperLogLogs for almost all of our real-time analytics infrastructure. They are incredibly space- and time-efficient: each "counter" fits into about a single page of memory and can count into the trillions with <2% error.<p>We developed an extremely high-performance server around them (hlld): <a href="https://github.com/armon/hlld" rel="nofollow">https://github.com/armon/hlld</a>.<p>We were typically hitting it with tens of thousands of requests per second across about 50K counters, although it was benchmarked at >1MM ops per second.<p>Similarly, we also made bloomd, an equivalent server for Bloom filters, which provide a more set-like abstraction: <a href="https://github.com/armon/bloomd" rel="nofollow">https://github.com/armon/bloomd</a>
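<p>To give a rough sense of why a HyperLogLog counter can fit in about a page, here is a minimal Python sketch. It is not hlld's actual code: the class name, the use of SHA-1 as the hash, one byte per register, and the precision p=12 are all assumptions made for illustration. With m = 2^12 = 4096 registers the array is 4096 bytes, and the usual standard error estimate of about 1.04/sqrt(m) works out to roughly 1.6%.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog; not the hlld implementation."""

    def __init__(self, p: int = 12):
        # p index bits give m = 2**p registers (assumed: one byte each).
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        # Bias-correction constant commonly used for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash (an assumption;
        # any well-mixed 64-bit hash would do).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits determine the rank: the position of the
        # leftmost 1-bit, counting from 1.
        rest_bits = 64 - self.p
        rest = h & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Harmonic-mean based raw estimate.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting while
        # many registers are still empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Example: count 100,000 distinct (hypothetical) user IDs.
hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print(f"estimate: {hll.count():.0f} using {len(hll.registers)} bytes")
```

Adding the same item twice never changes a register, so duplicates are free, and two counters with the same p can be merged by taking the per-register maximum. Denser register packing would shrink the footprint further.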