I build statistical models that help banks assess the risk of a loan. Effectively, the output of my models gets converted into the grades (A, B, C, D, etc.) mentioned in the article. The strategies (second chance, family guy, safe haven) are generally consistent with what I have seen in the portfolios of most financial institutions.<p>However, I am skeptical (prove me wrong) of this statement in the article: "Lenders get a return on their investment that is typically much better than traditional Certificate of Deposit or Saving Accounts". In finance terms, I would be surprised if they have a higher RAROC [1] than large banks. If they really do, then congratulations (you will put banks out of business in a few years)??<p>[1] <a href="https://en.wikipedia.org/wiki/Risk-adjusted_return_on_capital" rel="nofollow">https://en.wikipedia.org/wiki/Risk-adjusted_return_on_capita...</a>
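To make the RAROC comparison above concrete, here is a minimal sketch using one common simplified formulation: revenue net of operating costs and expected loss, divided by economic capital. The function name and every number below are hypothetical and only illustrate the arithmetic; they are not taken from the article or from any lender's data.

```python
def raroc(revenue, operating_costs, expected_loss, economic_capital):
    """Risk-adjusted return on capital, simplified form.

    Assumption: income earned on the capital itself is ignored.
    """
    if economic_capital <= 0:
        raise ValueError("economic_capital must be positive")
    return (revenue - operating_costs - expected_loss) / economic_capital


# Hypothetical peer-to-peer portfolio: high interest income, high expected loss.
p2p = raroc(revenue=12_000, operating_costs=1_000,
            expected_loss=5_000, economic_capital=20_000)

# Hypothetical bank portfolio: lower income, lower loss, less capital at risk.
bank = raroc(revenue=6_000, operating_costs=1_500,
             expected_loss=1_000, economic_capital=8_000)

print(f"P2P RAROC:  {p2p:.4f}")
print(f"Bank RAROC: {bank:.4f}")
```

With these made-up inputs the portfolio with the higher headline interest income ends up with the lower RAROC, which is the kind of outcome the skepticism above is about.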
The employment length is really bugging me. I've always selected people with a few years at their current job, leaning towards higher, because it feels safe, but this says that <2 years of experience is better than longer! I wonder if they're "newer" so more likely to stay around and not be pushed out, or if the rates are much higher compared to a marginal increase in risk. I'm leaning toward the latter. It looks like income has the same effect for home loans; <50k has a much higher return simply because they get a huge rating hit.<p>My other big hit is 3 versus 5-year terms. Anyone here care to comment? I like the 36 months because it feels more liquid and when I started I wasn't sure LendingClub was going to be around for a decade or more. Beginning to think I should reconsider that stance.
Thanks Clement for this beautifully simple dc.js dataviz. How long did you play with it before finding the pearl? Do you think there are yet other pearls to find in your tool?
Very useful tool. It gives valuable insight into how to select filters in portfolio construction. If "the Pearl" were an existing product, I would definitely invest in it.
I operate an online crowd-lending analytics and automation platform, PeerCube (https://www.peercube.com). I have been analyzing both Lending Club and Prosper data for my institutional clients for almost 4 years now. While OP made a good first attempt at analyzing the data, the analysis suffers from two major shortcomings that I normally see from people getting started with data analysis.<p>1. Domain Knowledge: Novice analysts tend to put the data in a blender and see what comes out, instead of first building some preliminary knowledge and intuition about the domain. This is quite evident in OP's analysis and finding about annual income. A person familiar with the domain will ask, "Why would a borrower with a high annual income borrow a small amount at a high interest rate?" That question immediately raises flags about the risks of lending to such borrowers. OP would benefit from reading some of the publications (books, research) on credit scoring and modeling before diving deep into the Lending Club data.<p>2. Data Exploration: Not spending enough time exploring the data can lead to erroneous conclusions like the second chance strategy. When Lending Club started issuing loans to borrowers with delinquencies and public records has a big impact on returns, as newer loans have not aged enough to show their full share of defaults.<p>> Watch for your average return (expected return), consistency of returns through time (risk), while making sure there is enough supply (liquidity) on the platform to deploy your strategy.<p>Time is not risk. You need to find a proper measure for risk. Also consider negative kurtosis and the shape of the return distribution: frequent small positive returns but a few large negative ones.<p>> I considered that investors deploy and re-invest their money continuously on the platform and therefore own a portfolio with different ‘vintages’ of loans. 
The ROI that are computed reflect this, as they are average returns across vintages.<p>Reconsider the argument that the "average return across vintages" is representative of investor returns. Tip: look at loan volume across vintages as well as the re-investment pattern of a typical investor.<p>> Please also note than due to the low issuance volume in the early days of the platform, the returns computed for the pre-2010 period are much less reliable than the post-2010 returns.<p>Please don't do this. The data between 2006 and 2010 is the most valuable because of the business cycle we were in at that time. The data since 2010 tells us nothing about how loans might perform in the future, when the business cycle is not as good as it has been in the last few years.<p>OP would really benefit from re-evaluating his findings with a critical eye. I would suggest gaining some domain knowledge and spending a lot of time just exploring the data before drawing definite conclusions, focusing on distributions, correlations and statistical significance.
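The tip above about loan volume across vintages can be illustrated with a minimal sketch. It compares a simple average of per-vintage returns with a volume-weighted one. The vintages, returns and volumes are invented for illustration, and weighting by issued volume is only one assumed proxy for how an investor's money is actually spread across vintages.

```python
def simple_average(returns):
    """Unweighted mean of per-vintage returns."""
    return sum(returns) / len(returns)


def volume_weighted_average(returns, volumes):
    """Mean of per-vintage returns weighted by issued loan volume."""
    if len(returns) != len(volumes):
        raise ValueError("returns and volumes must have the same length")
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum(r * v for r, v in zip(returns, volumes)) / total


# Hypothetical data: three vintages with rapidly growing issuance.
vintage_returns = [0.02, 0.06, 0.09]   # per-vintage ROI (made up)
vintage_volumes = [10, 100, 1000]      # issued volume, in $ millions (made up)

simple = simple_average(vintage_returns)
weighted = volume_weighted_average(vintage_returns, vintage_volumes)

print(f"simple average across vintages: {simple:.4f}")
print(f"volume-weighted average:        {weighted:.4f}")
```

When issuance grows quickly, the weighted figure is dominated by the newest vintages, which are also the ones that have had the least time to default. That is why the two averages can tell quite different stories.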