Tuna – a simple streaming ETL for machine learning features

1 point by Lemaxoxo over 6 years ago

1 comment

Lemaxoxo over 6 years ago

Hello everyone,

I'm currently participating in the "PLAsTiCC Astronomical Classification" Kaggle competition. The dataset is rather large (~5M rows), so I decided to write a simple tool to compute aggregate features online. I got really inspired and tidied things up over the weekend. Maybe this can be of interest to some of you.

PS: I'm aware that there are other similar projects out there, such as Spark Streaming, but they all feel too bloated and difficult to grok.
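
For readers unfamiliar with the idea, here is a minimal sketch of what computing aggregate features "online" can look like: rows are consumed one at a time and per-group statistics are updated in place, so the full dataset never has to sit in memory. This is not Tuna's API. The names `RunningStats` and `aggregate_stream`, and the columns `object_id` and `flux`, are hypothetical and chosen only for illustration. The variance update uses Welford's algorithm.

```python
from collections import defaultdict
import math


class RunningStats:
    """Running count, mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance is undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def aggregate_stream(rows, key, value):
    """Consume rows one at a time and return per-group running stats."""
    groups = defaultdict(RunningStats)
    for row in rows:
        groups[row[key]].update(float(row[value]))
    return groups


# Hypothetical rows; in practice these could come from csv.DictReader.
rows = [
    {"object_id": "a", "flux": 1.0},
    {"object_id": "b", "flux": 10.0},
    {"object_id": "a", "flux": 3.0},
    {"object_id": "a", "flux": 5.0},
]
stats = aggregate_stream(iter(rows), key="object_id", value="flux")
```

Memory use grows with the number of groups rather than the number of rows, which is the main appeal of this approach for a dataset with millions of rows.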