Another recent paper on this topic: http://arxiv.org/pdf/1606.08813v3.pdf. It shows how naive lending algorithms can skew against minority groups simply because less data is available about them, even when their expected repayment rate is the same.

The effect can be self-reinforcing. Imagine a new demographic group of customers appears, and without any data you make some loans to them. The realized repayment rate will be low, not because that group has a worse risk distribution than other groups, but simply because you couldn't identify its lowest-risk members. A simplistic ML model would conclude that the new group is riskier.

Of course, smart lenders understand that developing a new customer demographic means experimenting by lending, with the expectation that the first loans will have high losses, but that learning how to identify the low-risk people in that demographic is worthwhile in the long run. And when estimating overall risk for the group, they correct for the fact that the first cohort was accepted blind.
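To make the selection effect concrete, here's a minimal simulation sketch in Python. All the numbers are made up for illustration (a shared Beta risk distribution, a noisy risk signal for the established group): both groups have identical underlying risk, but blind approval alone produces a higher observed default rate for the new group.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical setup: both groups draw from the SAME risk distribution.
    true_default_prob = rng.beta(2, 8, size=n)

    # Established group: the lender has a (noisy) risk signal and
    # approves only the safer-looking half of applicants.
    signal = true_default_prob + rng.normal(0, 0.05, size=n)
    approved_established = signal < np.median(signal)

    # New group: no data, so approvals are effectively blind (random).
    approved_new = rng.random(n) < 0.5

    # Simulate which borrowers actually default.
    defaults = rng.random(n) < true_default_prob

    print(f"Observed default rate, established group: "
          f"{defaults[approved_established].mean():.1%}")
    print(f"Observed default rate, new group:         "
          f"{defaults[approved_new].mean():.1%}")
    # The new group's observed rate comes out higher even though the
    # underlying distribution is identical -- a pure selection effect.
    # A model trained on these observed outcomes, without correcting for
    # how each cohort was approved, would label the new group riskier.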