A modern data stack for startups (2022)

58 points by olestr, over 1 year ago

6 comments

lawrjone, over 1 year ago
This is an article from Jan 2022, when we were a company of 10; we are now a company of ~80.

A few observations:

- We're still using Fivetran for the EL stages. Costs are much more significant than they were before, and we're looking into options like DataStream as cost savers for the high-volume sources, but it's not unmanageable.
- dbt is still working great, even if we've invested a lot in it, having now built a 5-person data team (BI, DA, DE) around it.
- We still use Metabase but have some frustrations and are considering other options.
- We no longer use Stitch :tada:

There's a follow-up post on improvements we made to our setup that may be interesting: https://incident.io/blog/updated-data-stack

The OP is still full of relevant, useful information, though (imo, of course).
davedx, over 1 year ago
What's the business justification for spending this much effort (money) on data warehousing as a startup?

I've not worked at any startups that did data warehousing. The one place I worked at where we were /starting/ to get it set up had 300+ employees and $100M+/year revenue.
1letterunixname, over 1 year ago
Meta does it another way. Instead of one giant data warehouse or various DW silos, build a data platform API stack supporting heterogeneous storage adapters, privacy policies, regional locality policies, and retention policies underneath, supporting heterogeneous D*L operations. This sidesteps duplicating and denormalizing data and allows for maximum data discovery, reporting, and reuse. And while GraphQL can't be all things to all people, it's pretty damn good. If you need {MySQL,PostgreSQL,{{other_thing}}}-compatible or REST APIs, then build them similarly.

ETL should be minimized (except for external data, which is a bad sign of data owned or managed by a third party) and replaced with the equivalent of dynamic or materialized "views". Prefer to create hygienic "views" of data against the original data rather than mutating and destroying that original data with destructive transformations.

Finally, have a deeply integrated, robust, enterprise-wide, fine-grained ACL system and privacy policy to keep everyone (and system users) from accessing anything without a specific business need and an approval audit record stored via some sort of blockchain-like tech.
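
To make the "views over original data" point concrete, here is a minimal sketch, assuming SQLite via Python's standard `sqlite3` module. The table and view names (`raw_events`, `clean_events`) and the sample rows are hypothetical and are not taken from the comment or from any real system. The cleanup is expressed as a view, so the raw rows are never mutated.

```python
import sqlite3

# In-memory database; the schema and data below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (
        id INTEGER PRIMARY KEY,
        email TEXT,
        amount_cents INTEGER
    );
    INSERT INTO raw_events (email, amount_cents) VALUES
        ('  Alice@Example.com ', 1250),
        ('bob@example.com', NULL),
        ('alice@example.com', 300);

    -- A "hygienic" view: normalization happens at read time,
    -- and the original rows stay exactly as they were ingested.
    CREATE VIEW clean_events AS
    SELECT id,
           lower(trim(email)) AS email,
           coalesce(amount_cents, 0) / 100.0 AS amount
    FROM raw_events;
""")

# Reporting queries read from the view.
totals = conn.execute(
    "SELECT email, sum(amount) FROM clean_events GROUP BY email ORDER BY email"
).fetchall()

# The raw table is untouched by the cleanup logic.
raw_count = conn.execute("SELECT count(*) FROM raw_events").fetchone()[0]
raw_first_email = conn.execute(
    "SELECT email FROM raw_events WHERE id = 1"
).fetchone()[0]

print(totals)
print(raw_count, repr(raw_first_email))
```

If the cleanup rule later turns out to be wrong, only the view definition changes; a destructive `UPDATE` on `raw_events` would have lost the original values.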
evtothedev, over 1 year ago
I’d be curious to know if you considered using something like Dagster for orchestrating these runs? Seems like a more natural choice over CircleCI for running what resembles a DAG. (And either way, thanks for sharing this.)
alberth, over 1 year ago
Interesting pricing strategy (for Incident.io)

Plan A: $16 (month/user)

Plan B: $10,000+ Call Us

Plan C: Call Us

Those are some of the steepest price cliffs I’ve ever come across.

https://incident.io/pricing#plan-comparison
rollulus, over 1 year ago
This is likely here now due to https://news.ycombinator.com/item?id=38797640 being at the top of the front page and referencing it.