So many angles on this topic, aside from the professional and clearly competent article on data engineering. I won't say 'modern' engineering, though, to underscore the following.<p>The personal computer became wildly popular decades ago by empowering <i>individuals</i> (not by feeding the dark or dysfunctional central-server patterns that existed at the time). The technical differences between designing what used to be Quicken and what is now QuickBooks underscore the larger change from user-centric software to stream- and server-based software.<p>When the Intuit/Quicken products were first implemented decades ago, it was a <i>user-centric</i> software problem: the Graphical User Interface (GUI) meets the human user, with their context and goals, and executes on the operating system that the software sits on. The engineering involved required accuracy and consistency for the purpose of <i>human</i> activities.<p>Fast-forward more than twenty years, and this engineering is centered on tens or hundreds of thousands of 'streams' to a central service. The smarts are going toward the categorization, classification and filtering of streams, for the purpose of the whole, much more like an ant colony or similar, where the individual user is not at all the point and in fact is disposable to some extent. Many, many corollaries are possible here.<p>Again, great work by the engineering teams and this author. However, it is not at all certain that the enterprise, law enforcement and oversight here are trustworthy over time. History has shown humans to do bad things to other humans, for many reasons. Putting money flows into concentrated streams like this creates efficiencies, but is also highly susceptible to manipulation, not on the moment-to-moment data ingestion side, but rather on the long-term management side.
Am I the only one who finds it concerning that they capture like everything ("Data entered by customers in using the products" and "Clickstream data capturing usage of the product") and persist it just for the sake of having it and finding a use for it later?
Also, they collect data directly from hundreds of relational databases. Making your DB schema an API for data collection sounds like a terrible idea.