Building a data team at a mid-stage startup

607 points by squarecog almost 4 years ago

28 comments

zippy5 almost 4 years ago
This was wonderfully written, and if you’re gonna start a data team, this is how you do it. But I can see that I’m the only one who thought it was crazy to start a data team in the first place.

This company makes 10M and spends 3M on the team and infrastructure to make data a core competency?

The vast majority of the wins discussed were poorly differentiated web / mobile / supply chain analytics, which they could have gotten and set up with third-party software for an order of magnitude less.

I can only imagine what this hypothetical startup could have learned if they had spent that money actually talking to customers and running more experiments.

I’ve heard people talk about data as the new oil, but for most companies it’s a lot closer to uranium: it’s hard to find people who can handle / process it correctly, there are nontrivial security risks and liabilities if PII is involved, it’s expensive to store, and the return on effort is generally underwhelming relative to the anticipated utility.

My takeaway was that startups benefit tremendously from a data advisor role to get the data competency, as well as the educational and cultural benefits, but realistically the data infrastructure and analytics at that scale should have been bought, not built. Obviously there are a couple of exceptions, such as regulatory reasons like HIPAA compliance, for which building in-house can be the right choice if no vendor fits your use case.

czep almost 4 years ago
This is so eerily familiar I swear I've had many of these exact conversations word for word. The only way this doesn't turn into a complete nightmare of a cluster is if the exec team "gets it". If so, you just might stand a chance at building a data team that gels with the rest of the org.

But if the exec team simply hired you for window-dressing, expect to be treated like a scapegoat and a punching bag. Any mistakes will be your fault. Any wins will be to the credit of the business. The Director of Product will ask to "embed" dedicated DS headcount and you won't have any real power to shape the roadmap. If the exec team doesn't give you equal footing with Product (or Marketing, Finance, and Eng for that matter) then this will rapidly become a soul-sucking job. However, if the E-team does give you the authority to call Product's bullshit, and tell Finance to stuff it, and not take direction from Eng leads, then you actually might be able to accomplish something really cool.

plank_time almost 4 years ago
This is probably the single best-written and most realistic article I’ve read on HN ever, and I’ve been on HN for a long, long time. It’s so realistic I wonder if the author took it from his diary or something. Everything about it is supersaturated with authenticity, and it teaches better than any other article I’ve read. Kudos to the author, and I would love to see this style of article take off.

plaidfuji almost 4 years ago
So many gems in this article…

> You notice a a lot of the code starts with very complicated preprocessing steps, where data has to be fetched from many different systems. There appears to be several scripts that have to be run manually in the right order to run some of these things.

> “We need to focus on delivering business value as quickly as possible”, you say, but you add that “we might get back to the machine learning stuff soon… let's see”.

So so relatable. But the key insight is a really really key insight.

> What I think makes most sense to push for is a centralization the reporting structure, but keeping the work management decentralized. Why? Primarily because it creates a much tighter feedback loop between data and decisions. If every question has to go through a central bottleneck, transaction costs will be high. On the other hand, you don't want to decentralize the management. Strong data people want to report into a manager who understands data, not into a business person.

I have the same role at a non-software company, and to me this is nothing short of a complete reimagining of IT. It’s not just, “make sure everyone’s computer works and help them install software,” it’s, “build a model of the business, determine what information flows and metrics are crucial to success, and build an IT and analysis infrastructure around that model.” The CIO will soon be better thought of as the Chief Optimization Officer.

GlennS almost 4 years ago
I liked this article, but I have two questions:

1. Is it definitely a good idea to build a separate data team, rather than embedding people with analytics knowledge in feature teams?

Is it possible to do the latter, but still end up with a well-curated source of truth for your data?

2. Is A/B testing and driving your business by metrics really a good idea?

My (uninformed) impression is that being data-driven is responsible for rather a lot of rot:

- Extremely irritating websites.

- Businesses ignoring important things because they can't measure them. (Financialisation, hand-in-hand with the MBA types the author decries.)

IMTDb almost 4 years ago
What would be the name of the position/profile of someone in charge of building the data warehousing architecture/ETL pipelines?

In my view, they need to make sure the warehouse model is a correct representation of the business and that it can be leveraged to answer basic or not-so-basic questions using SQL. They also need to promote its usage internally by ensuring it is accessible and easy to use, and guide other teams to a more data-oriented mindset.

I feel that this is a specialised position not exactly similar to a developer, but every time I look for "data scientist" I get guys that want to do machine learning prediction models, which is not exactly the same stuff either.

gumby almost 4 years ago
Great article. The confusion about what team does what is priceless... yet so common!

To provide some sympathy for the folks already working there: you always replace systems well *after* you've overrun them.

When the ad hoc system works (consider that Google spreadsheet at a time when there were three support people and perhaps a dozen customers) you're not going to decide to replace it with something more complicated. Then you're busy growing so you just keep the system going through sheer force of will. You only replace it when the effort is unbearable; at that point you say, frustratedly, "I wish we'd done this sooner."

correlator almost 4 years ago
Thank you for writing this. I personally just walked into a very similar role and this rang really true. This article made me realize how much more effort I need to put into the data culture side of the role.

simonw almost 4 years ago
"This is basically a (somewhat cynical) depiction of things that may happen at a lot of companies early in the data maturity stage"

I don't think this is very cynical at all! Feels pretty accurate to me.

herodoturtle almost 4 years ago
For the last 15 years I've been building (what I consider to be) accessible database solutions, for a bunch of different industries.

This sentence from the article resonated with me:

> You're starting to lay the most basic foundation of what is most critically needed: all the important data, in the same place, easily queryable.

roystonvassey almost 4 years ago
This is a perfect encapsulation of my career as a data-guy square peg in a round hole, filled with jargon and misplaced understanding of data in general.

Despite all that you read and hear about data science advancing, you’d be surprised to see how poorly it is leveraged or, worse, how billions of dollars are sought to implement the latest tool that promises to change the world. Tech and data as we imagine them to be in FAANG-type companies are far different from how they are in older industries. It’s not just systems that need upgrading; company cultures do too, and that’s never an easy or fast process. I’ve been in the data analytics space for 16 years now and I still feel, more often than not, that I’m part of the minority working to demonstrate true data use cases.

cobertos almost 4 years ago
Part of me wonders what the long term of a transition like this looks like. Would this company be able to keep its data consumption healthy, or would it drive product changes that might harm its users or lead to dark patterns?

Artgor almost 4 years ago
When I started reading this article, I thought it would be a sad story about another startup failure. The blog post turned out to be a fascinating success story. I really liked it.

But after I finished reading it, I realized that it is a sad story if we look at it through the eyes of the data scientists on the team. People were hired to do cool machine learning projects, but it turned out there was no infrastructure for them. After the new boss arrived, they had to work as analysts for months. What is sadder: the new boss dangled a carrot before them several times, but each time the carrot disappeared.

ttz almost 4 years ago
> MBA types

I chuckled. Then cried, because at least his MBA types can use SQL. My MBA types use Excel.

OT: Good article. Like and agree with the push for centralizing data first, then building outwards so external teams can move towards self-service.

te_chris almost 4 years ago
This is a good write-up, but for the sort of insights they’re getting, they’re overstaffed and overpaying. A combination of a cloud DW (BigQuery, e.g.), cloud ETL (Stitch, Fivetran) and dbt for the T in ELT to build useful reporting tables, along with some sort of SQL-based BI (Mode, in our case), could deliver the same insights for a fraction of the price. Throw in a sub to Heap or similar for ad-hoc product analytics as a cherry on top.

I concede, of course, that they’re rescuing a bad situation, not starting from scratch, but still.

jabagonuts almost 4 years ago
Really enjoyed this narrative, but what about the next phase? Going from mid-stage to mature startup?

> Note that you took on a lot of “tech debt” earlier when you started dumping the production database tables straight into the data warehouse.

How do you manage expectations when the year-long honeymoon is over, the business grows tremendously, and the centralized data warehouse reaches a breaking point?

neighbour almost 4 years ago
Excellent article. For me, the timing couldn't be better as I am about to step into a role not too dissimilar to the one described in the piece. It will be interesting to see if I run into many of the situations the author describes.

AtNightWeCode almost 4 years ago
I really enjoyed reading this. Very well written. At the companies I've worked at, teams can never read data from the DW, btw.

My experience with A/B tests is that they are way overrated.

On the poor data quality: you sit on a product like a call center. Frontend developers think it is an excellent idea to store all data in some doc DB blob. Then business wants stats about number of calls based on users...

Be careful when putting tabular data into doc DBs.

babublacksheep almost 4 years ago
Extremely relatable content throughout. Especially around teams beating their own drums while the CEO asks questions about metrics. ;)

Will wait for a follow-up post on how the decentralised data team created data silos and how we solve it using data discovery and data standardisation. :P

Disclaimer: I have built decentralised data teams and it scales well.

civilized almost 4 years ago
Wow, a story where things start out a mess and end up a lot better! Can we write one of these for society too?

tsrez almost 4 years ago
It's such an interesting and valuable article on building a data team, esp. insightful for organisations starting out. Guess the challenges in traditional/larger companies starting a data team might look slightly different.

soumyadeb almost 4 years ago
Such a great read. Have been in this position in a large public org. Over a year was spent just creating a catalog of all the data the company has and figuring out how to pull it into a data warehouse.

spicyramen almost 4 years ago
Can correlate, the author is truly a genius. We had a company mandate to be ML first, we went through a lot of phases, and so many conversations happened as described in this amazing piece. Thanks Erik.

mindvirus almost 4 years ago
This is a wonderful article, thank you for sharing. I really like the narrative of bringing people with you on the journey, and celebrating the small wins that lead to a good long term outcome.

oliv__ almost 4 years ago
No snark implied but what a great ad for the author!

This was very fun to read, and an interesting window into the processes and inner workings of a startup that size.

div3rs3 almost 4 years ago
Done well (like here), The Goal-like storytelling is both educational and interesting.

nerdponx almost 4 years ago
This is an incredibly valuable writeup. Great job.

waynesonfire almost 4 years ago
TLDR, refine your thoughts.