The second graph in the WP tells the whole story, and it's not a complex one:
A greater proportion of black offenders than white offenders were in fact high risk. So naturally the "fair" and reasonably accurate test flagged a larger percentage of all black offenders as likely reoffenders, with the obvious consequence that there were BOTH (proportionately) more false positives among black offenders AND (proportionately) more true positives, because there were just plain more positives, proportionately, and would have been under any accurate test.

One could fiddle the test so that the percentage of false positives was the same for black offenders as for white offenders, but only by reclassifying some high-risk black offenders as low risk, which would make the test less successful (less predictive) at sorting black offenders into high- and low-risk groups.

Somewhat similarly, if you tax rich people and poor people with an income tax, you have to choose: be "fair" by charging each of them the same number of dollars, or be "fair" by charging them the same percentage of their income. You can't be "fair" by both measures at the same time because their incomes aren't the same. Not rocket science.
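To make the arithmetic concrete, here's a small Python sketch with made-up numbers (not the actual COMPAS figures): each group is a mix of high-risk people who reoffend 80% of the time and low-risk people who reoffend 20% of the time, and the test flags exactly the high-risk people, so it's equally accurate and equally well calibrated for both groups. Only the base rates differ, and the false-positive rates come apart anyway.

    # Hypothetical numbers, chosen only to illustrate the point.
    def rates(share_high_risk, p_high=0.8, p_low=0.2):
        """Return (share flagged, PPV, false-positive rate) for one group."""
        flagged = share_high_risk                        # the test flags the high-risk share
        flagged_no_crime = share_high_risk * (1 - p_high)
        unflagged_no_crime = (1 - share_high_risk) * (1 - p_low)
        ppv = p_high                                     # P(reoffend | flagged), same for both groups
        fpr = flagged_no_crime / (flagged_no_crime + unflagged_no_crime)
        return flagged, ppv, fpr

    for name, share in [("higher base-rate group", 0.6), ("lower base-rate group", 0.3)]:
        flagged, ppv, fpr = rates(share)
        print(f"{name}: flagged {flagged:.0%}, PPV {ppv:.0%}, false-positive rate {fpr:.1%}")

    # higher base-rate group: flagged 60%, PPV 80%, false-positive rate 27.3%
    # lower base-rate group: flagged 30%, PPV 80%, false-positive rate 9.7%

Same test, same accuracy, same PPV; the only way to pull the two false-positive rates together is to flag fewer people in the higher base-rate group, i.e. make the test less predictive there.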
TLDR: There are two different intuitive definitions of predictive fairness. Given populations with different base crime rates, it is impossible to meet both fairness criteria at once.

If you can survive the WP paywall, the clearest article I've found on this is here:

https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas

The full paper is here: https://arxiv.org/pdf/1610.07524v1.pdf