Hello Hacker News! We're a team of YC founders (Meldium W13, Draft S11, TapEngage S11) launching something new (<a href="https://www.getcensus.com" rel="nofollow">https://www.getcensus.com</a>). How many times has your business team asked you to generate yet another CSV file, write a “quick report” in SQL, or send some custom data to a terrible API (looking at you, Marketo)?<p>We’ve built a product that connects directly to your data warehouse and syncs into apps like Salesforce, Customer.io and even Google Sheets. In fact, your business teams won’t even need to rely on engineering to manage all these pipelines.<p>The tech stack for analyzing customer data in 2020 looks pretty great. You can load almost any data into an auto-scaling data warehouse (Snowflake, BigQuery) with easy point & click tools like Fivetran. You can build SQL models with dbt and create visual reports in Metabase. But you can’t easily push insights back into the marketing/sales/support apps.<p>You can’t solve this with direct app integrations or “event routers” like Zapier, and you definitely shouldn’t over-engineer a solution with Spark/Kafka/Airflow. We designed Census to make your data warehouse a single source of truth for modeling and transformation before publishing data back into your SaaS tools quickly and reliably.<p>We’re proud of what we’ve built so far, and there’s a lot more work ahead to deliver on our dream of saving us all from generating & uploading yet another CSV file so we can spend more time actually building our products (or reading HN). You can check it out at <a href="https://www.getcensus.com" rel="nofollow">https://www.getcensus.com</a>.<p>Since this is HN, we’d love to hear everyone’s war stories on building internal ETL solutions!
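To make that pain concrete, here's a minimal sketch of the kind of hand-rolled sync script we want to make unnecessary – pull rows out of the warehouse, then push them to a SaaS API one call at a time. The credentials and the destination endpoint below are placeholders, not anything specific to Census:
<pre><code>
# A hand-rolled warehouse -> SaaS sync of the kind we want to retire.
# Connection details and the destination endpoint are placeholders.
import snowflake.connector
import requests

conn = snowflake.connector.connect(
    user="REPORTING_USER", password="...", account="acme_account",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="MARTS",
)
cur = conn.cursor()
cur.execute("SELECT email, plan, mrr FROM customer_facts WHERE is_active")

for email, plan, mrr in cur.fetchall():
    # One HTTP call per row: slow, unbatched, and a failure halfway
    # through leaves the destination half-updated.
    resp = requests.post(
        "https://api.example-crm.com/v1/contacts",  # hypothetical endpoint
        json={"email": email, "plan": plan, "mrr": float(mrr)},
        headers={"Authorization": "Bearer YOUR_CRM_TOKEN"},
    )
    resp.raise_for_status()
</code></pre>
Every destination app tends to end up with its own copy of this script, its own retry logic, and its own schedule to babysit.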
Brad from Census (<a href="https://www.getcensus.com" rel="nofollow">https://www.getcensus.com</a>) here - I wanted to add some technical detail for the HN crowd.<p>On its face, a Census workflow is a simple "program" that our users author in a point-and-click manner – read some data from System A and broadcast it to Systems B, C, and D. But to "compile" that program, Census has to determine (see the sketch after this list):<p><pre><code> - Which APIs to use (bulk, streaming, etc.)
- The schema of each destination (which can change out from underneath us at run time!)
 - The semantics of reads and writes from each system (atomicity, isolation, how to roll back if a write fails)
- How to map data with high-fidelity across strongly-, weakly-, and dynamically-typed data stores
</code></pre>
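To make that concrete, here's a toy sketch of what the output of that compilation step could look like as a plan object – this is purely illustrative, not our actual internal representation:
<pre><code>
# Illustrative only: a toy shape for a "compiled" sync plan.
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class FieldMapping:
    source_column: str   # column in the warehouse model
    dest_field: str      # field on the destination object
    cast: str            # e.g. "NUMBER -> Salesforce Currency"

@dataclass
class SyncPlan:
    source_query: str                              # SQL against the warehouse
    destination: str                               # e.g. "salesforce.Contact"
    api: Literal["bulk", "streaming", "rest"]      # which destination API to use
    write_mode: Literal["insert", "update", "upsert"]
    identity_key: str                              # how records are matched across systems
    on_failure: Literal["rollback", "retry", "skip_record"]
    mappings: list[FieldMapping] = field(default_factory=list)
</code></pre>
The point is that every one of those decisions gets pinned down before any data moves, so the execution phase has an unambiguous contract to work against.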
That's just compilation – then we need to execute that compiled plan and move massive amounts of data with low latency and high throughput, all while handling byzantine failures in source and destination systems and automatically rolling back, recovering, or helping users "debug" their workflows when things go wrong.<p>There's a lot of depth to this (and we haven't "solved" it by any means) - happy to answer questions here or at brad@getcensus.com if you have them!
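For a flavor of the execution side, here's a heavily simplified sketch of the batch/retry/recover loop – the real thing also has to handle partial writes, schema drift mid-run, and per-record error reporting:
<pre><code>
# Heavily simplified executor sketch: batch the source rows, retry transient
# destination failures with backoff, and park failed batches for recovery
# instead of silently dropping them.
import time
from typing import Callable, Iterable

def run_sync(
    read_batches: Callable[[], Iterable[list[dict]]],  # yields batches of source records
    write_batch: Callable[[list[dict]], None],         # raises on destination failure
    max_retries: int = 3,
) -> list[list[dict]]:
    failed_batches = []
    for batch in read_batches():
        for attempt in range(max_retries):
            try:
                write_batch(batch)
                break
            except Exception:
                time.sleep(2 ** attempt)  # exponential backoff on transient errors
        else:
            # Exhausted retries: keep the batch so it can be inspected,
            # rolled back, or replayed later rather than lost.
            failed_batches.append(batch)
    return failed_batches
</code></pre>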
Excited to follow your progress! I view this problem as one of the biggest gaps in today's "Cloud Data Ecosystem". Tools like Stitch and Fivetran make it super easy to extract data from source systems; next-gen cloud data platforms like Snowflake make storing, transforming, and querying that data a breeze (especially with the help of tools like dbt and Dataform); and there are a ton of powerful, easy-to-use BI tools for visualizing and digesting that data. But the minute you need to send that data to other systems, it's back to painful, failure-prone, and mind-numbing scripts.
Nice, congrats on launching! I'm excited to try it out.<p>Curious - what if there's a transformation that happens with data from the data warehouse but can't be performed with SQL (such as a Python script)? Is there a way to send that data back into the integrations that you support? Or would it be best to push back into the data warehouse and use Census from there?<p>Aka, can you only transform with Census using SQL? Or other languages as well?
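For context, the pattern I have in mind for the non-SQL case looks roughly like this – transform in Python, land the result back in the warehouse, and let Census sync from that table. The connection URL, table names, and scoring logic are all made up, and this assumes the snowflake-sqlalchemy dialect is installed:
<pre><code>
# Hypothetical "transform in Python, write back to the warehouse" step.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("snowflake://user:pass@account/ANALYTICS/MARTS")

users = pd.read_sql("SELECT user_id, events_30d, mrr FROM user_facts", engine)

# Something that's awkward to express in SQL, e.g. a model score.
users["churn_score"] = (1.0 / (1.0 + users["events_30d"])) * (users["mrr"] > 0)

# Land the result back in the warehouse; a sync tool can then read
# user_scores like any other model.
users[["user_id", "churn_score"]].to_sql(
    "user_scores", engine, if_exists="replace", index=False
)
</code></pre>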