Hi HN!<p>In my previous life in tech at BigCo, I was always given data from providers (Bloomberg, Reuters, etc.) and processed it using my models.<p>I was recently asked: how do you trust data that is self-reported and may not be audited? For example, say a company reports the number of women it employs, or its environmental metrics.<p>I feel this is a general problem for any self-reported data. How would you handle it?
In financial accounting, a public company (say General Motors) creates aggregated financial statistics that it publishes in its quarterly reports.<p>It hires an accounting firm (say Deloitte) to check its work by looking at a sample of its documentation. This is a lot like Deming-style quality control: the auditors look at some fraction of the checks that came from car dealers, or that were cut to parts suppliers, and see whether the story makes sense.<p>In fields like insurance, where fraud is particularly dangerous, they do things like check that a real person exists behind some of the policies.<p>I would look to the same model for other kinds of accounting too.<p>For instance, if the company said that 42% of its employees are women, it could let a third party look at a sample of 1000 employees that the third party chooses, going so far as letting the third party see employment records filed with the state, contact those employees, etc.<p>Like a public opinion poll, it does not give an exact answer. Maybe they will find that 40% or 44% of the employees are women, which is close enough.<p>That is just one trick in the toolbox that accountants have. Sometimes they will see a bunch of deposits with round numbers ($7700) and then one for $345.34, and that is the one they ask you about.<p>So that's the subject you should be looking up, and those are the kind of people you should be talking to.
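To make the "close enough" part concrete, here is a rough sketch of the two checks described above. It assumes a simple random sample and a normal-approximation 95% interval, and it ignores the finite-population correction. The function names and the $100 "round number" step are made up for illustration; real auditors have their own procedures and thresholds.

```python
import math


def sample_interval(sample_size, observed_count, z=1.96):
    """Normal-approximation confidence interval for a sampled proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    p_hat = observed_count / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - margin, p_hat + margin


def claim_is_consistent(claimed, sample_size, observed_count, z=1.96):
    """True if the self-reported proportion falls inside the sample's interval."""
    low, high = sample_interval(sample_size, observed_count, z)
    return low <= claimed <= high


def flag_non_round(amounts, step=100):
    """Return amounts that are not a whole multiple of `step` dollars.

    Works in integer cents to avoid floating-point remainder surprises.
    """
    step_cents = step * 100
    return [a for a in amounts if round(a * 100) % step_cents != 0]


if __name__ == "__main__":
    # Company claims 42% women; the auditor's sample of 1000 finds 400.
    print(sample_interval(1000, 400))
    print(claim_is_consistent(0.42, 1000, 400))
    # A run of round deposits plus one odd one.
    print(flag_non_round([7700, 5000, 345.34, 1200]))
```

With a sample of 1000, the margin is about three percentage points either way, so finding 40% or 44% does not contradict a claimed 42%, while finding 35% would. The non-round flag only tells the auditor which items to ask about; it proves nothing by itself.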