There was another article recently which argued that, because race and default rates are strongly correlated, a machine learning algorithm applied to a credit dataset will find a way to extract what is effectively a proxy for race from the data.<p>By that argument, any sort of ML applied to credit data will run afoul of the Equal Credit Opportunity Act.<p>The article also made the point that basically all ML credit scoring startups are illegal because of this, but they get away with it only because they are small and not on the radar.
The book Weapons of Math Destruction talks all about this. I've come to believe that pure risk shouldn't be the only factor in a person's interest rate.<p>Risk-based pricing obviously makes a ton of sense from a business standpoint. You want to contain losses for risky borrowers but compete with other lenders for low-risk borrowers.<p>But socially, this is perverse. People tend to be risky because they are already poor. So now money costs more for those who have the least of it. This is one of the feedback loops that makes poverty (and affluence, for that matter) so sticky.<p>I had this realization in my personal experience when I was able to refinance almost $100k in student loans at a crazy low interest rate. My household's finances are in great shape as my wife and I enter our prime earning years. But for us, such an opportunity is a gift, on top of an already sweet situation. The savings could be a game changer for a family whose finances are more marginal.
The problem with using data points like digital footprints is that it will run afoul of the Equal Credit Opportunity Act. ECOA was designed to stop banks from redlining neighbourhoods, a practice that usually punished minorities. With digital footprints, lenders would in theory be redlining by digital footprint instead, including sites, products purchased, etc.
The attributes, from the paper:<p>The device type (for example, tablet or mobile), the operating system (for example, iOS or Android), the channel through which a customer comes to the website (for example, search engine or price comparison site), a do not track dummy equal to one if a customer uses settings that do not allow tracking device, operating system and channel information, the time of day of the purchase (for example, morning, afternoon, evening, or night), the email service provider (for example, gmail or yahoo), two pieces of information about the email address chosen by the user (includes first and/or last name and includes a number), a lower case dummy if a user consistently uses lower case when writing, and a dummy for a typing error when entering the email address.
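To make the email-based attributes concrete, here is a minimal sketch of how a few of them could be derived as dummy variables. This is not the paper's code: the function name `email_footprint`, the field names, and the exact matching rules are my own assumptions, and the typing-error and lower-case dummies are left out because the quoted description does not say how they were computed.

```python
import re


def email_footprint(email, first_name, last_name):
    """Derive a few email-based features (hypothetical encoding, not the paper's)."""
    local, _, domain = email.partition("@")
    local_lower = local.lower()
    names = [n.lower() for n in (first_name, last_name) if n]
    return {
        # Email service provider, e.g. "gmail" from "gmail.com"
        "provider": domain.lower().split(".")[0],
        # 1 if the part before "@" contains any digit
        "has_number": int(bool(re.search(r"\d", local))),
        # 1 if the part before "@" contains the first and/or last name
        "has_name": int(any(n in local_lower for n in names)),
    }


print(email_footprint("jane.doe84@gmail.com", "Jane", "Doe"))
```

The other attributes (device type, operating system, channel, time of day) are plain categorical variables and would typically be one-hot encoded the same way.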
"We analyze the information content of the digital footprint – information that people leave online simply by accessing or registering on a website – for predicting consumer default."<p>Wonderful.
Gameable and dystopian all at the same time.
The baseline (FICO) has an AUC of 68.3%, which looks low. This may be because the analysis is performed not on the entire through-the-door population, but only on the customers that passed the creditworthiness check (which itself uses FICO). Restricting the sample to applicants above a score cutoff compresses the range of scores, which tends to depress the measured AUC.<p>In such situations it is customary to do some kind of reject inference or testing below the cutoff, as well as swap-in and swap-out analysis. It does not look like they did any of that.
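A toy sketch of both points, with made-up numbers (none of this comes from the paper). `auc` is the usual pairwise definition: the probability that a random non-defaulter has a higher score than a random defaulter. `swap_sets` counts who would change decision when moving from an old score to a new one; the function names and cutoffs are assumptions for illustration only.

```python
def auc(scores, defaulted):
    """Share of (non-defaulter, defaulter) pairs where the non-defaulter scores higher; ties count half."""
    good = [s for s, d in zip(scores, defaulted) if not d]
    bad = [s for s, d in zip(scores, defaulted) if d]
    wins = sum((g > b) + 0.5 * (g == b) for g in good for b in bad)
    return wins / (len(good) * len(bad))


def swap_sets(old_scores, new_scores, old_cutoff, new_cutoff):
    """Count applicants by (old decision, new decision)."""
    counts = {"both_approve": 0, "swap_in": 0, "swap_out": 0, "both_decline": 0}
    for old, new in zip(old_scores, new_scores):
        old_ok, new_ok = old >= old_cutoff, new >= new_cutoff
        if old_ok and new_ok:
            counts["both_approve"] += 1
        elif new_ok:
            counts["swap_in"] += 1   # declined by the old score, approved by the new one
        elif old_ok:
            counts["swap_out"] += 1  # approved by the old score, declined by the new one
        else:
            counts["both_decline"] += 1
    return counts


# Made-up through-the-door population: score and default flag (1 = defaulted)
scores = [500, 550, 600, 650, 700, 750]
defaulted = [1, 1, 0, 1, 0, 0]

auc_all = auc(scores, defaulted)

# Keep only applicants who passed a hypothetical cutoff of 600
kept = [(s, d) for s, d in zip(scores, defaulted) if s >= 600]
auc_approved = auc([s for s, _ in kept], [d for _, d in kept])

print(round(auc_all, 3), round(auc_approved, 3))
print(swap_sets([500, 550, 600, 650], [640, 500, 610, 590], 600, 600))
```

On this toy data the AUC is lower on the approved-only subset than on the full population, which is the range-restriction effect. The catch with swap-ins is that their outcomes are unobserved (they were never lent to), which is why reject inference or testing below the cutoff is needed to evaluate them.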
I lead the Data Science team at Oakam, a London-based fintech company founded in 2006.<p>If you find the article interesting, you may also be interested in Oakam's work using alternative data to predict credit default, which was covered recently in The Economist:<p><a href="https://www.economist.com/special-report/2018/05/03/mobile-financial-services-are-cornering-the-market" rel="nofollow">https://www.economist.com/special-report/2018/05/03/mobile-f...</a><p>If you're a Data Scientist looking to work in this area, or just looking for a new challenge, please contact me (personal email in my profile) so we can have a chat!<p>We are also hiring software engineers (stack is React Native for iOS/Android, and mostly C# for everything else).