I'm a data engineer at a large corporation. At my current company, we use Pentaho to extract and transform our data from Oracle daily. The transformed data is loaded into a staging database, where we then model it into a star schema. The final results are then bulk loaded into an on-prem DW. The process takes hours to complete.<p>I’m interested in moving to a model that continually extracts changed data to a data lake, then uses the power of a cloud data warehouse to read those files and perform the transformations and modeling in SQL. I guess that's the ELT concept that you mentioned in the book's summary.<p>The goal is to reduce latency and allow for more frequent batches, while making the process more accessible to my team, which has strong SQL skills, and letting us adapt faster to changing business needs.<p>This book looks like a good way for me to get a glimpse into those new processes. Thanks for putting it together.
Really well done! The content looks like a great balance. I'm an engineer working a bit in data engineering, and I find the content of the book relevant to me.<p>I like how you explain why the methodologies of the pre-cloud era still hold lessons that apply today, even though implementation best practices have changed thanks to the cost model of the cloud.<p>The section that stood out the most to me, strangely, had nothing to do with the technology or analytics stack. It was in Chapter 3 – Data Modeling Layer and Concepts, where you discuss the dynamic between the CEO, the data analyst, and the data. This articulated quite well how that dynamic plays out at our current company. Even with our current data warehouse, our BI team is a bottleneck, and that is becoming more and more apparent to me. It is my primary motivation for working out how best to re-architect our analytics stack.
This is just the book I need!<p>A little bit of context: I am a product manager and I have been working with data analysts and engineers for a few months, and even though I have tried to do a lot of research, sometimes I still don't understand what they say.<p>The terms are extremely difficult and vary depending on the site, and it seems like each company has a different understanding of the same term.<p>So that's where this book comes in handy. It helped me visualize the big picture of the whole data analytics landscape. What's more, I now understand the roles and challenges of the data analysts and data engineers on my team. I was able to communicate with them in their "language", especially when I was explaining why we should use ELT instead of ETL (Chap 3, I suppose).<p>Anyway, I think this book is great for non-tech people like me, but it requires some experience in the tech industry to get started with. Definitely recommend it for other PMs who will be working with data people!
I've been looking for something like this for a while. Most of the time when I go online to search for resources on building an analytics stack, the content is biased towards the vendor's preferred way of doing things. This looks like it will give me a high-level understanding of the why behind the proposed approach.
Amazing that we can get such high-quality resources for free. That said, I'm not convinced by the example where the CEO uses the "data modelling layer" (essentially what Holistics offers to build for you and what this book is an ad for). In my experience a good data analyst does far more than "translate" the business question to SQL. The exec's understanding is limited not only by not knowing SQL, but also by potential confounds or a billion other things that can make a seemingly meaningful result meaningless or dangerous.<p>I don't think simpler technology can give people the magic answers and easy data access that they crave, any more than no-code tools can let people build complicated <i>correct</i> systems.
Hey HN. This is something we've been working on for the last three months over at Holistics.<p>If you're a data analyst, data engineer, or a founder setting up a data analytics stack for the first time, this is a book that will give you a soup-to-nuts overview of an entire field.<p>Like most books about data analytics, this assumes some amount of technical competence.<p>Unlike most books in the space, this is mostly about first principles: the ideas behind the tools, not the tools themselves.<p>The hope is to give you 'just enough to not get lost'. The book is written to be read in about 2 hours, and in some cases in no more than two sittings!<p>There's probably more than a hundred hours of research and writing that went into this. I'm looking forward to reading your comments.
Great book, and spot on with the problem statement. One thing to point out is that I didn't see any mention of testing your data models or of version control. These should be part of the modern analytics stack's process to ensure data quality.<p>A suggestion of approaches and tools could be useful, whether it's via tools such as dbt for checking expected field values or frameworks such as Great Expectations. What happens to data that doesn't conform to expected values? How should you handle it? And how can the testing process be automated? This forms an important part of ensuring data quality and the reliability of the analysed output.
Nice, the illustrations look pretty good. I just took a look at the table of contents, and it seems to cover a lot of the questions a product guy like me has about data analytics. I will spend some time reading it this weekend.<p>Sending it to my data team btw, thanks for sharing.