(I wrote this article.) We recently wrote an article (https://l.bit.io/o-cop26) about methane emissions and the COP26 commitment to cut them. While writing it, we found serious inconsistencies in some of the data sources.

Discussions of data quality and validation in data science tend to end with recommendations for a few standard checks: make sure the data come from trusted sources, handle missing values, and investigate outliers. These checks are important, but they won't save an analysis from perfectly formatted data from a trusted source that happens to be wrong for reasons that can't be detected within the dataset itself. Even data of apparently good quality can lead to faulty conclusions.

The article explores this problem through a case study. The U.N. publishes greenhouse gas emissions data supplied each year by parties to the UNFCCC (United Nations Framework Convention on Climate Change). The data are consistent, up to date, and well formatted, and the U.N. is a reliable source of official data. However, there is good reason to believe the data submitted by some countries are not accurate: other trusted data sources show startlingly large differences from the U.N. figures. In particular, we found that Russia's methane emissions data were highly inconsistent with the World Resources Institute (WRI) Climate Analysis Indicators Tool (CAIT) data, even though the two sources agree closely for most other countries.
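As a rough illustration of the kind of cross-source check involved, here is a minimal pandas sketch that flags country-years where two emissions extracts disagree sharply. The file names, column names, and the 25% threshold are hypothetical placeholders, not the exact pipeline behind the article; the real UNFCCC and CAIT downloads need their own cleaning and unit harmonization first.

    import pandas as pd

    # Hypothetical extracts: one row per country-year, methane in kilotonnes.
    unfccc = pd.read_csv("unfccc_methane.csv")  # columns: country, year, ch4_kt
    cait = pd.read_csv("cait_methane.csv")      # columns: country, year, ch4_kt

    # Align the two sources on country and year.
    merged = unfccc.merge(cait, on=["country", "year"], suffixes=("_unfccc", "_cait"))

    # Relative gap between the two sources for the same country-year.
    merged["rel_diff"] = (
        (merged["ch4_kt_unfccc"] - merged["ch4_kt_cait"]).abs()
        / merged[["ch4_kt_unfccc", "ch4_kt_cait"]].max(axis=1)
    )

    # Flag country-years where the sources disagree by more than 25% (arbitrary cutoff).
    suspect = merged[merged["rel_diff"] > 0.25].sort_values("rel_diff", ascending=False)
    print(suspect.head(20))

A check like this catches nothing about which source is right, of course; it only surfaces where trusted sources diverge enough to warrant a closer look.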
The Washington Post article referenced in this post is really interesting: https://www.washingtonpost.com/climate-environment/interactive/2021/russia-greenhouse-gas-emissions/

The yearly revisions of the GHG emissions estimates are really striking.