I'm looking for resources to learn about developing a holistic data strategy for a large organization (>1,000 people): how to store data, share data, and create automated/realtime reporting.

I'm starting from a very low base. I've learned to mock up R Notebooks to analyze data sets and used Power BI to create (somewhat) informative dashboards and KPI measurement for organizations.

E.g., for an organization with 10,000+ users, what options are available to set up data stores, share data, and develop pipelines, hosted or on-premises? What tools are used? What's industry best practice? What's open source vs. proprietary? What are the security implications and trade-offs?

Any direction is helpful. I've briefly looked at Power BI and its hosted platform, but I'm wondering what else is around.
A common stack in industry now is:
Fivetran (or Stitch) -> Snowflake (or BigQuery) -> Looker (or Mode)

For an organization with 1,000+ employees, you're going to be paying a lot for whatever service you use, which means you'll get a lot of support from the vendor you go with, and you don't have to answer all of these questions ahead of time. Plus, you're going to hire someone on your team to manage this; it's not something someone just manages on the side.
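To make the pattern concrete, here's a minimal Python sketch of the extract-load-transform flow that stack automates, using the snowflake-connector-python client. The source URL, credentials, schema, and table names are all made up for illustration; this is what Fivetran and Snowflake do for you at scale, not a production pipeline.

    import json
    import requests
    import snowflake.connector

    # Extract: pull records from a hypothetical source API.
    # Fivetran/Stitch replace this step with managed connectors.
    records = requests.get("https://api.example.com/orders").json()

    # Load: land the raw JSON in the warehouse untransformed.
    conn = snowflake.connector.connect(
        account="my_account",  # hypothetical credentials
        user="loader",
        password="...",
        database="RAW",
        schema="SOURCES",
    )
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS RAW_ORDERS
            (payload VARIANT, loaded_at TIMESTAMP)
    """)
    # Row-by-row inserts keep the sketch short; a real pipeline
    # would bulk-load through a stage and COPY INTO.
    for rec in records:
        cur.execute(
            "INSERT INTO RAW_ORDERS "
            "SELECT PARSE_JSON(%s), CURRENT_TIMESTAMP()",
            (json.dumps(rec),),
        )

    # Transform: model the raw data with plain SQL inside the
    # warehouse (where a tool like dbt usually takes over), into
    # a hypothetical ANALYTICS schema that the BI layer reads.
    cur.execute("""
        CREATE OR REPLACE VIEW ANALYTICS.ORDERS AS
        SELECT payload:id::NUMBER   AS order_id,
               payload:total::FLOAT AS total,
               loaded_at
        FROM RAW.SOURCES.RAW_ORDERS
    """)
    conn.close()

The point of paying for the managed stack is that the extract/load half of this disappears entirely; your team mostly writes the SQL at the end, and Looker or Mode sits on top of the resulting views.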
At >1,000 people the usual resource is consultants, because person-centuries of experience are usually warranted at that scale. Consultants are also the best way to get in-house staff up to speed quickly: the staff participate in specifying the system and learn the right questions to ask and the important factors to consider. Good luck.