This is pretty cool.

When people say to have a GitHub repository, I often worry that people think they have to have some huge project, like a fork of Node or something.

This analysis would be more than enough for me to give someone an interview.

The math isn't complex and the analysis is pretty shallow, but it shows that the author knows their way around the basics of:

- finding data (this is often the hardest part of data analysis)
- working with data: unpacking, storing, retrieving, etc.
- basic analysis with R or Python

And to be honest, this counts for a lot!

As a first introduction to a potential employee, this is more valuable than having a good resume, and it's well within reach of most people, regardless of how busy you are!

TL;DR: don't overthink having a public GitHub account. Basic analysis like this will put you above most other candidates. Oh, and good job to the author!
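For what it's worth, here is a minimal sketch of those three steps in Python. It assumes you've already downloaded one of Freddie Mac's quarterly origination files (pipe-delimited with no header row); the filename and column positions are hypothetical, so check the published file layout before relying on them.

    # A minimal sketch of the three steps above, not the author's actual code.
    # Assumes a downloaded Freddie Mac origination file (pipe-delimited, no
    # header row); the filename and column positions are hypothetical.
    import pandas as pd

    # working with data: load two fields from the raw pipe-delimited file
    orig = pd.read_csv(
        "historical_data1_Q12005.txt",  # hypothetical filename
        sep="|",
        header=None,
        usecols=[0, 16],  # assumed positions of credit score and property state
    )
    orig.columns = ["credit_score", "state"]

    # basic analysis: loan counts and average credit score by state
    print(orig.groupby("state")["credit_score"].agg(["count", "mean"]))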
This is a really good write-up and analysis.
I'd like to point out a couple of things. First, the subprime market has moved to FHA and Ginnie Mae securities. I'm not sure if they have the same detail of loan-level data available online, but it would be interesting to analyze.

The other thing is that these securities (the mortgage loans themselves, securitizations, and their derivatives) will not trade in the market at fundamental value. This type of fundamental analysis is great, and you can make a lot of money by understanding value better than your competitors. But when the market goes crazy, your portfolio's market value can fall far below the values calculated with these models. If that happens and you've leveraged your portfolio, you will go out of business. I know it seems like stating the obvious right now, but I saw a lot of very smart people get caught in this trap.
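To make the leverage point concrete, here is a toy back-of-the-envelope calculation (every number invented) showing how a leveraged portfolio can be wiped out by a mark-to-market move even when the fundamental model turns out to be right:

    # Toy illustration of the leverage trap described above; all numbers made up.
    portfolio_value = 100_000_000       # market value of the bonds
    equity = 10_000_000                 # your capital (10:1 leverage)
    debt = portfolio_value - equity     # borrowed, e.g. via repo

    # The market panics; prices fall 15% below model-derived fundamental value.
    new_value = portfolio_value * (1 - 0.15)

    # Lenders mark to market, not to model; margin calls arrive long before
    # prices recover to "fair" value.
    new_equity = new_value - debt
    print(new_equity)  # -5,000,000: insolvent even if the model was "right"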
I was asked to do an analysis of Fannie and Freddie during a job interview in 2001: a 3-page report on two institutions I'd never heard of before, with a stack of papers around 3 feet high consisting of a variety of financial statements, promotional material, and news clippings, to be completed in pen within 3 hours.

Not being from the US, unaware of these institutions, and boggled at how the concept of the state backing fixed-rate mortgages could be sensible, I wrote my 3 pages and somehow got the job.

> It should not be overlooked that in the not-so-distant past, i.e. when I worked as a mortgage analyst, an analysis of loan-level mortgage data would have cost a lot of money. Between licensing data and paying for expensive computers to analyze it, you could have easily incurred costs north of a million dollars per year.

If it existed. It did not. Computers were not needed to analyse a nice big data set, because a nice, big, transparent data set did not exist. Those that did quite nicely realized that a big data set didn't exist by digging themselves, being confused, and realizing everyone else was confused or delusional too.

Splitting things out by state and making the data available is a step up in transparency. But it is fine-tuning an organ based on where the horn is, not understanding what notes are being played.

Providing this type of data is badly stitching a bad gash. It confirms what has been known for years. A better question would be: "If you're issuing bonds based on loans to people you have a FICO 'thin file' score of 600 for, whom you've not done basic background checks on, and who are seeking to borrow 10 times their annual income, don't you see something wrong?"

Basic questions and understanding the underlying data are more important than optimizing headline metrics.
I work in the mortgage industry and have analyzed large datasets of subprime and alt-A mortgages. These findings are very consistent with mine, although the subprime default and severity rates are (obviously) even worse than conventional.

Freddie is a bit behind in their dataset, only offering data through 2013. IMO, this somewhat defeats their effort to increase transparency: if 2014-vintage loans are performing much worse (or better), it won't be known in time for many investors/modelers to react.

I also wish Ginnie Mae would release loan-level data like this for FHA/VA/USDA loans, which are a huge part of the market. I could only find MBS pool-aggregated data on their website: http://www.ginniemae.gov/doing_business_with_ginniemae/investor_resources/mbs_disclosure_data/Pages/monthly_consolidated_data.aspx
Mortgages get disproportionately little airtime in the startup world, which I've always thought was strange, especially considering how significant they are to the US (and global) economy.

Check out LendingHome (http://lendinghome.com) if you're looking for an awesome company in SF that's doing some really cool work in the space.
It was unclear to me at first that a default rate of 0.4 on the map is actually 40%! I had no idea it was that high; I just assumed it meant 0.4% until I saw the numbers later.
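One way to head off this confusion when plotting is to format fractional rates as percentages on the axis. A minimal matplotlib sketch (the state rates below are placeholders, not the article's numbers):

    # Display rates stored as fractions in [0, 1] as percentages, so 0.4 reads as 40%.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter

    rates = {"NV": 0.40, "FL": 0.36, "AZ": 0.33}  # placeholder values

    fig, ax = plt.subplots()
    ax.bar(list(rates), list(rates.values()))
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))  # 0.4 -> "40%"
    ax.set_ylabel("default rate")
    plt.show()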
> So-called agent-based models attempt to model the behavior of individual borrowers at the micro-level, then simulate many agents interacting and making individual decisions, before aggregating into a final prediction. The agent-based approach can be computationally much more complicated, but at least in my opinion it seems like a model based on traditional statistical techniques will never explain phenomena like the housing bubble and financial crisis, whereas a well-formulated agent-based model at least has a fighting chance.

Can anyone unpack this a bit? By my (fuzzy) understanding, this was something a lot of people thought in the '80s with neural networks, but there wasn't a lot of theory to back it up. Later, applied math people introduced the kernel SVM, which could solve non-linear problems with power equivalent to neural networks [0]. RNNs are back in style now (and a lot more theory has been developed), but is this the type of agent-based model that would be useful for this problem, and why so?

[0]: http://www.scm.keele.ac.uk/staff/p_andras/PAnpl2002.pdf
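For context, "agent-based" here means explicit simulation of individual actors rather than a learned model like an RNN. A toy sketch (entirely invented, not any model from the article) of how individual default decisions plus a price feedback loop can produce the nonlinear cascades that a loan-by-loan statistical model misses:

    # Toy agent-based default model: each borrower defaults when underwater
    # beyond a personal tolerance, and each wave of defaults pushes prices
    # down further, triggering more defaults. All parameters are invented.
    import random

    random.seed(42)

    def simulate(price_shock, n_borrowers=10_000, rounds=10):
        # Each agent: a loan-to-value ratio and a tolerance for negative equity.
        ltv = [random.uniform(0.6, 1.0) for _ in range(n_borrowers)]
        tolerance = [random.uniform(0.0, 0.3) for _ in range(n_borrowers)]
        price = 1.0 - price_shock
        defaulted = [False] * n_borrowers

        for _ in range(rounds):
            new_defaults = 0
            for i in range(n_borrowers):
                if defaulted[i]:
                    continue
                equity = price - ltv[i]  # as a fraction of original home value
                if equity < -tolerance[i]:
                    defaulted[i] = True
                    new_defaults += 1
            # feedback: foreclosures depress prices a bit further
            price -= 0.05 * new_defaults / n_borrowers

        return sum(defaulted) / n_borrowers

    for shock in (0.05, 0.15, 0.25):
        print(shock, simulate(shock))

The key feature is the feedback term: each round of defaults depresses prices, which pushes more marginal borrowers underwater, so the aggregate default rate responds nonlinearly to the initial shock in a way a per-loan regression would not capture.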
Direct link to the GitHub repository: https://github.com/toddwschneider/agency-loan-level
It would be interesting to cross-reference the records from Freddie and Fannie with the HMDA data, which has additional fields about each mortgage application: https://www.ffiec.gov/hmda/hmdaflat.htm
Would the HMDA "loan amount" field match the "ORIGINAL UNPAID PRINCIPAL BALANCE" field in the Fannie data? Since HMDA data is geo-located to the Census tract, it could then be linked to Census and other public data sets.
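A rough sketch of what that linkage might look like (the column names below are guesses at the HMDA/Fannie layouts, not the real ones; HMDA reports loan amounts in thousands of dollars, so the agency balance needs rounding before matching):

    # Hedged sketch of the proposed cross-reference; column names are guesses.
    import pandas as pd

    fannie = pd.read_csv("fannie_acquisitions.csv")  # hypothetical extract
    hmda = pd.read_csv("hmda_lar.csv")               # hypothetical extract

    # Round UPB to the nearest $1,000 to line up with HMDA's convention.
    fannie["amount_k"] = (fannie["original_upb"] / 1000).round().astype(int)

    # A loose join on origination year, state, and rounded loan amount.
    # This will produce many-to-many matches; a real linkage needs more keys.
    linked = fannie.merge(
        hmda,
        left_on=["orig_year", "state", "amount_k"],
        right_on=["as_of_year", "state", "loan_amount_000s"],
        how="inner",
    )
    print(len(linked))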
Thanks for the detailed analysis! It's very thorough and really interesting, and it's awesome that people are making intelligent use of this data now that it's available.
Private securitizations were inflating the subprime bubble well before Fannie and Freddie jumped in. If there's any data on, e.g., Countrywide or IndyMac, it'd be valuable to add.