In a fast-growing (and changing) company, I've recently joined a team responsible for the data pipeline that makes data available to analysts. The pipeline's source data is JSON events that land in S3 after passing through Kafka. Some of the event types are quite complex, with deeply nested fields and many repeated ones.

Most people on the team are familiar with standard data warehousing (DW) techniques, but nobody has experience with AWS-based, event-driven data pipelines, and that translates into uninformed decisions that may prove troublesome very soon:

* Do Kimball techniques still apply in 2020? Is this how people who know how to use cloud platforms build their reporting pipelines?

* How do experienced teams deal with nested and repeated fields? Is it a good strategy to build a flattened, normalized version of the data (a toy sketch of what I mean is at the end of this post)?

* What happens to old events when an architectural change in the source system breaks the event schema? Should we expect reports to break frequently in a fast-changing environment?

* If standard DW techniques are still the way to go, should data scientists use the DW for their analyses, or should they have direct access to the raw (but cleaned) data?

* How can a team learn cloud-based data pipeline best practices when it has no connections to experienced people?

I've read plenty of articles and books on the subject, but it all still feels very theoretical. Any advice is more than welcome.
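
To make the nested/repeated fields question concrete, here is a toy Python sketch of the kind of event shape I'm describing and one way I'd imagine flattening the repeated part into analyst-friendly rows. All the field names (order_placed, items, customer, etc.) are made up for illustration, not our real schema, and I honestly don't know whether exploding repeats like this is a sensible approach:

    import json

    # A made-up example of a nested event with a repeated field
    # (field names are hypothetical, not our actual schema).
    raw_event = json.dumps({
        "event_id": "evt-123",
        "event_type": "order_placed",
        "occurred_at": "2020-05-01T12:34:56Z",
        "customer": {"id": "c-42", "segment": "smb"},
        "items": [  # repeated field
            {"sku": "A-1", "qty": 2, "price": 9.99},
            {"sku": "B-7", "qty": 1, "price": 19.50},
        ],
    })

    def flatten_order_event(event_json):
        """Explode the repeated 'items' field into one flat row per item,
        carrying the parent event/customer attributes onto each row."""
        event = json.loads(event_json)
        base = {
            "event_id": event["event_id"],
            "occurred_at": event["occurred_at"],
            "customer_id": event["customer"]["id"],
            "customer_segment": event["customer"]["segment"],
        }
        for item in event.get("items", []):
            yield {**base, **{f"item_{k}": v for k, v in item.items()}}

    # Prints one flat row per line item in the event.
    for row in flatten_order_event(raw_event):
        print(row)

In reality we'd be doing something like this at scale (Glue/Spark/Athena rather than plain Python), but the question is the same: is one flat row per repeated element the shape analysts should get, or is there a better pattern?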