Hello! As a software engineer I've become very frustrated with the state of the modern data stack and of data teams.

https://www.linkedin.com/pulse/draining-data-lake-part-1-3-problems-danny-mican-twjle/

Many outputs of data teams are aggregated, medium-cardinality tables, usually at the day grain. The modern data stack uses ETL tools to copy operational data stores into data lakes, where the raw operational data is then refined, over multiple stages, into these aggregate tables.

After working in this stack for 3+ years, I think it has fundamental limitations and that the approach lags a decade or more behind software engineering best practices.

I created Signals Collector to provide a low-friction way to collect and aggregate data at the source, before loading into a data lake/data warehouse.

The goal of Signals Collector is to be a verifiable, observable, and maintainable component in a software stack, able to generate aggregate data used for insights.

Signals Collector queries and aggregates data at the source (using Postgres SQL, Mongo aggregation pipelines, and the Prometheus API, plus DuckDB).

Signals Collector is MIT licensed. If you're interested in learning more or getting started, I'd be happy to jump on a call or provide support over email!

Thank you

Danny

danny@turbolytics.io
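
P.S. To make "aggregate at the source" concrete, here's a rough, hypothetical sketch (not Signals Collector's actual API or config): it runs a day-grain aggregation inside the operational Postgres database with psycopg2, so only the small aggregate leaves the source. The table and column names (orders, created_at, status, amount) are invented for illustration.

    # Hypothetical sketch: run the day-grain aggregation inside Postgres,
    # so only the medium-cardinality result is shipped downstream.
    import csv
    import psycopg2

    AGG_SQL = """
        SELECT created_at::date AS day,
               status,
               COUNT(*)    AS order_count,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY 1, 2
    """

    conn = psycopg2.connect("dbname=app host=127.0.0.1 user=readonly")
    with conn, conn.cursor() as cur:
        cur.execute(AGG_SQL)   # aggregation executes in the source database
        rows = cur.fetchall()  # only the aggregate rows come back

    # Hand the small result to whatever loads your warehouse/lake.
    with open("orders_by_day.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["day", "status", "order_count", "revenue"])
        writer.writerows(rows)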