I worked for a startup implementing analytics tools. In my opinion, our setup was over-engineered, but I wasn't there at the beginning, so I might be wrong. Requirements also changed a couple of times, which could explain why something that looked necessary for scaling and speed ended up being this over-engineered mess. This is how it worked: after the JavaScript tracker fired, we got log files, passed them through Kafka, then parsed them and performed calculations in Storm (Java). For storage, we used Cassandra. The system also had other parts, but I don't remember why they were there, tbh.<p>My thought process for solving your problem would be the following. First, you need to understand that what's good for you and what's good for your company might not be the same. You want the challenge, you want to implement something that could scale, and you want to use exotic tools to achieve this. It's interesting and looks good on your CV. Your company might just want the results. You need to decide which is more important.<p>If we prioritize your company's needs over keeping you entertained, I'd follow this thought process:<p>First question: can't you just use Google Analytics? You can also connect it to BigQuery and do lots of customizations. Maybe time would be better spent learning GA. It's powerful, but most of us cannot use it well.<p>Second question: if for some reason you don't want to use Google Analytics, can you use another, possibly open-source and/or self-hosted, analytics solution? Just because you <i>can</i> design it from scratch doesn't mean you should.<p>Third: alright, you want to implement something from scratch. For this scale, you can probably just log and store events in an SQL database, write the queries, and display the results in a dashboard.<p>Then, if you really want to go further, there are many "big data" tools that are designed to scale well and perform analytics. 
Looking up talks about these tools will give you a better understanding of how things work. There are various open-source projects you should read more about: Cassandra, Scylla, Spark, Storm, Flink, Hadoop, Kafka, and Parquet, just to name a few.
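<p>To make the "just use an SQL database" option concrete, here is a rough sketch of what I mean. It is not what we ran at the startup. The table layout, the column names and the functions `open_store`, `log_event` and `daily_summary` are all made up for illustration, and I'm using SQLite from Python's standard library only because it needs no setup. For real traffic you'd probably point the same idea at Postgres or MySQL.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the event store. Schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id      INTEGER PRIMARY KEY,
            ts      TEXT NOT NULL,   -- ISO 8601 timestamp, UTC
            user_id TEXT NOT NULL,
            name    TEXT NOT NULL,   -- e.g. 'pageview', 'click'
            url     TEXT
        )
        """
    )
    # Most dashboard queries filter or group by time, so index the timestamp.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")
    return conn


def log_event(conn, user_id, name, url=None, ts=None):
    """Store one tracked event; the timestamp defaults to 'now' in UTC."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO events (ts, user_id, name, url) VALUES (?, ?, ?, ?)",
        (ts, user_id, name, url),
    )
    conn.commit()


def daily_summary(conn):
    """Return (day, event_count, unique_users) rows, oldest day first."""
    return conn.execute(
        """
        SELECT substr(ts, 1, 10)       AS day,
               COUNT(*)                AS events,
               COUNT(DISTINCT user_id) AS users
        FROM events
        GROUP BY substr(ts, 1, 10)
        ORDER BY substr(ts, 1, 10)
        """
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    log_event(conn, "u1", "pageview", "/", ts="2024-01-01T10:00:00+00:00")
    log_event(conn, "u1", "click", "/", ts="2024-01-01T10:05:00+00:00")
    log_event(conn, "u2", "pageview", "/pricing", ts="2024-01-02T09:00:00+00:00")
    for row in daily_summary(conn):
        print(row)
```

The dashboard part is then just a page that runs queries like `daily_summary` and draws the rows. If this ever gets too slow, you'll at least know which queries matter before you reach for the heavier tools.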