6 comments

lawrjone over 1 year ago

This is an article from Jan 2022 when we were a company of 10, and now are a company of ~80.

Worth some observations that:

- We're still using Fivetran for the EL stages. Costs are much more significant than they were before and we're looking (for the high volume sources) into options like DataStream as cost savers, but it's not unmanageable.
- dbt is still working great, even if we've done a lot of investment having now built a 5 person data team (BI, DA, DE) around it.
- Still use Metabase but have some frustrations and are considering other options.
- We no longer use Stitch :tada:

There's a post that followed this on improvements we made to our setup that may be interesting: https://incident.io/blog/updated-data-stack

The OP is still full of relevant, useful information, though (imo, of course).

davedx over 1 year ago

What's the business justification for spending this much effort (money) on data warehousing as a startup?

I've not worked at any startups that did data warehousing. The one place I worked at where we were /starting/ to get it set up had 300+ employees and $100M+/year revenue.

1letterunixname over 1 year ago

Meta does it another way. Instead of one giant data warehouse or various DW silos, build a data platform API stack supporting heterogeneous storage adapters, privacy policies, regional locality policies, and retention policies underneath supporting heterogeneous D*L operations. This sidesteps duplication of and denormalizing data and allows for maximum data discovery, reporting, and reuse. And while GraphQL can't be all things to all people, it's pretty damn good. If needing {MySQL,PostgreSQL,{{other_thing}}}-compatible or REST APIs, then build them similarly.

ETL should be minimized (except for external data, which is a bad sign of data owned or managed by a third-party) and replaced with the equivalent of dynamic or materialized "views". Prefer to create hygienic "views" of data against original data rather than mutating and destroying such original data with destructive transformations.

Finally, have a deeply-integrated, robust, enterprise-wide, fine-grained ACL system and privacy policy to keep everyone (and system users) from accessing anything without a specific business purpose need and an approval audit record stored via some sort of blockchain-like tech.
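The "views over original data" point in the comment above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and uses hypothetical names (`raw_events`, `clean_events`, `build_demo`) that are not from the thread or from any Meta system; it only shows the general idea of cleaning data in a view while leaving the source rows untouched.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a raw table plus a non-destructive cleaning view (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Original data: stored exactly as it arrived, never updated in place.
    conn.execute(
        "CREATE TABLE raw_events ("
        "id INTEGER PRIMARY KEY, email TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_events (email, amount_cents) VALUES (?, ?)",
        [(" Alice@Example.com ", 1250), ("bob@example.com", 300)],
    )
    # The "hygienic view": normalisation happens at read time,
    # so the transformation can be changed or dropped without data loss.
    conn.execute(
        """
        CREATE VIEW clean_events AS
        SELECT id,
               lower(trim(email))   AS email,
               amount_cents / 100.0 AS amount
        FROM raw_events
        """
    )
    return conn


conn = build_demo()
clean = conn.execute("SELECT email, amount FROM clean_events ORDER BY id").fetchall()
raw = conn.execute("SELECT email, amount_cents FROM raw_events ORDER BY id").fetchall()
print(clean)
print(raw)
```

Querying `clean_events` returns normalised emails and amounts in currency units, while `raw_events` still holds the untrimmed, mixed-case originals. A destructive `UPDATE` on the raw table would have lost them.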

evtothedev over 1 year ago

I’d be curious to know if you considered using something like Dagster for orchestrating these runs? Seems like a more natural choice over CircleCI for running what resembles a DAG. (And either way, thanks for sharing this.)

alberth over 1 year ago

Interesting pricing strategy (for incident.io):

- Plan A: $16 (month/user)
- Plan B: $10,000+ Call Us
- Plan C: Call Us

Those are some of the steepest price cliffs I’ve ever come across.

https://incident.io/pricing#plan-comparison

rollulus over 1 year ago

This is likely here now due to https://news.ycombinator.com/item?id=38797640 being on top of the fp and referencing it.

A modern data stack for startups (2022)
