Hello everyone,<p>I'm currently participating in the "PLAsTiCC Astronomical Classification" Kaggle competition. The dataset is rather large (~5M rows), so I wrote a simple tool that computes aggregate features online, i.e. incrementally as the rows stream in. I got really inspired and tidied things up over the weekend. Maybe it will be of interest to some of you.<p>PS: I'm aware that similar projects exist, such as Spark Streaming, but they all feel too bloated and difficult to grok.
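<p>To give an idea of what computing aggregate features online can look like, here is a minimal Python sketch. It is not the tool's actual code: the names `RunningStats` and `aggregate` and the sample rows are made up for illustration, and it assumes the features of interest are a per-key count, mean and variance. Each row is seen exactly once and only a few numbers are kept per key (Welford's algorithm), so memory does not grow with the number of rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean and variance maintained incrementally (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new value into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; this sketch returns 0.0 when fewer than two values were seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def aggregate(rows):
    """Consume (key, value) pairs one at a time and return per-key statistics."""
    stats = defaultdict(RunningStats)
    for key, value in rows:
        stats[key].update(value)
    return dict(stats)


# Hypothetical rows: (group key, measurement). In practice these would be
# read lazily from a file so the full dataset never sits in memory.
rows = [("a", 1.0), ("b", 10.0), ("a", 2.0), ("a", 3.0), ("b", 14.0)]
result = aggregate(rows)
for key, s in sorted(result.items()):
    print(key, s.n, s.mean, s.variance)
```

Because `aggregate` only needs an iterable, it can be fed a generator that yields rows from a CSV reader just as well as a list.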