Interesting stuff...

A further point to consider, if we're talking about real-world applications, is that it is not actually established that markets have finite variance - seriously.

In the 1960s, Benoit Mandelbrot began his research into chaos and fractals by looking at markets and finding that non-Gaussian, Lévy-stable distributions modeled changes in the market best [1]. These Lévy-stable distributions don't generally have a finite variance, and sometimes don't even have a finite mean [2].

And it is fairly easy to see how a market tends not to be Gaussian: change based on a Gaussian distribution tends to be a random walk a la Brownian motion, where the final position of a variable is the sum of many small changes. Movement based on a non-Gaussian, infinite-variance distribution, on the other hand, has the property that the final value tends to be the result of a small number of large changes rather than a lot of small ones. And this is what the stock market often looks like: a few wild moves often impact things as much as all the incremental changes. The apparent mean, variance and distribution of stocks on a day-to-day basis may not hold up in extreme situations, and those situations can eat away the rest of your profits. If the stocks that seemed independent in normal conditions all go down in a crash, your estimated-correlation-based diversification hasn't protected you very well.

The Black Swan is a sadly over-simplified popular summary of these points [3], but it does point to the general idea. The higher-level takeaway is that infinite-variance distributions exist, and you cannot a priori assume that a given distribution you are working with isn't one. (A quick numerical illustration follows the references.)

[1] http://books.google.com/books?id=6KGSYANlwHAC&lpg=PP1&ots=yULs5p13Uo&dq=Benoit%20Mandelbrot%20fractal%20and%20scaling%20in%20finance&pg=PP1#v=onepage&q=Benoit%20Mandelbrot%20fractal%20and%20scaling%20in%20finance&f=false
[2] http://en.wikipedia.org/wiki/L%C3%A9vy_distribution

[3] http://en.wikipedia.org/wiki/The_Black_Swan_(Taleb_book)
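To make the infinite-variance point concrete, here is a minimal sketch (my own illustration, assuming SciPy's levy_stable distribution; alpha=1.7 is an arbitrary heavy-tailed choice, not fitted to any market):

    # Running sample variance: converges for Gaussian draws, but keeps
    # jumping around for Levy alpha-stable draws (alpha < 2 means infinite
    # variance), because rare huge moves dominate the sum of squares.
    import numpy as np
    from scipy.stats import levy_stable, norm

    n = 100_000
    gauss = norm.rvs(size=n, random_state=1)
    stable = levy_stable.rvs(alpha=1.7, beta=0.0, size=n, random_state=1)

    for k in (1_000, 10_000, 100_000):
        print(f"n={k:>7}  gaussian var={gauss[:k].var():7.3f}  "
              f"stable var={stable[:k].var():10.3f}")

The Gaussian column stabilizes near 1; the stable column never settles down, no matter how much data you collect.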
I'm an analyst for a quantitative hedge fund. Please, *please*, everyone promise me you will never base your investment decisions on this discredited form of mean-variance optimization.

This method of stock selection was invented by Harry Markowitz in 1952. In the intervening sixty (!) years we have accumulated overwhelming evidence that plain-vanilla mean-variance optimization doesn't work. Among its many flaws:

1. It makes unrealistic assumptions about the distribution of returns (i.e. that they are multivariate normal, when it is well known that returns exhibit heavy tails, time-varying volatility, fluctuating correlations, etc.).

2. It relies on you having good estimates of the expected annual return of individual stocks. How do you propose to get these? Don't say you'll use historical measurements, unless you really believe that last year's return is a good predictor of this year's return (it's not, except perhaps in some sectors, and even then it's difficult to measure and you'd be subject to crash risk).

3. The optimization procedure is error-maximizing. That is, even if returns *were* multivariate normal *and* you had a reliable way to measure the expected return on stocks, you'd still have errors in your covariance matrix, and these errors are amplified by the optimization procedure. You can see this in the article, where the "optimum" portfolio recommends putting 75% of your portfolio in MSFT and shorting AMZN and AAPL. Does anyone really believe that's sensible? Does anyone believe that such a portfolio is diversified?

The underlying problem is that your model of stock returns is subject to massive overfitting. Say you have data for the last 10 years (i.e. about 2500 trading days). If there are N stocks in your portfolio, you need N(N+1)/2 pieces of information to specify the covariance matrix, which puts an upper limit of about 70 stocks in your universe (since 70 * 71 / 2 ~ 2500). A good rule of thumb is that you should have 10 observations per free parameter, which cuts that number down to 22 stocks (22 * 23 / 2 ~ 250). I think that most portfolios consisting of 22 single-name stocks aren't sufficiently diversified (and you'd still be subject to the first two problems above).

In 2012, *no one* should be using mean-variance optimization to select stocks. At the very least, shrink the covariance matrix toward some sensible prior (e.g. constant correlations, sector correlations, or a factor model), backtest your strategy over the past 10-20 years, and look at the annual volatility, size and length of drawdowns, skewness and information ratio.
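On the shrinkage point, a minimal sketch using scikit-learn's LedoitWolf estimator (my choice, not the article's - it shrinks toward a scaled-identity target rather than the constant-correlation or factor priors mentioned above, and the returns here are simulated rather than real):

    # Shrink the sample covariance matrix before any optimization step.
    import numpy as np
    from sklearn.covariance import LedoitWolf

    rng = np.random.default_rng(0)
    n_days, n_stocks = 2500, 22            # ~10 years of daily data, 22 names
    returns = rng.normal(0.0, 0.01, size=(n_days, n_stocks))

    sample_cov = np.cov(returns, rowvar=False)
    lw = LedoitWolf().fit(returns)
    shrunk_cov = lw.covariance_

    print("shrinkage intensity:", lw.shrinkage_)  # 0 = raw sample, 1 = all prior

Feed shrunk_cov, not sample_cov, into whatever optimizer you use downstream.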
For your own sake, do not base any investment decisions on this model. Historical correlation != future correlation. You are MUCH better off using the Fama-French three-factor model as a starting point.

An example of historical correlation severely understating risk was the 2008 financial crisis. Default rates of mortgages in, say, Florida, which historically had little correlation with default rates of mortgages in Nevada, suddenly became very correlated. Measuring risk in this fashion is not robust enough for investment decisions.

http://en.wikipedia.org/wiki/Fama%E2%80%93French_three-factor_model
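For reference, fitting the three factor loadings is just an OLS regression. A minimal sketch (simulated series stand in for the market, SMB and HML factor returns you would normally pull from Ken French's data library):

    # Regress a stock's excess returns on the three Fama-French factors.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    mkt_rf = rng.normal(0.0003, 0.010, n)   # market excess return
    smb = rng.normal(0.0, 0.005, n)         # size factor
    hml = rng.normal(0.0, 0.005, n)         # value factor
    excess_ret = 1.1 * mkt_rf + 0.3 * smb - 0.2 * hml + rng.normal(0, 0.008, n)

    X = np.column_stack([np.ones(n), mkt_rf, smb, hml])  # intercept = alpha
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    alpha, beta_mkt, beta_smb, beta_hml = coef
    print(f"alpha={alpha:.5f}  mkt={beta_mkt:.2f}  "
          f"smb={beta_smb:.2f}  hml={beta_hml:.2f}")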
Firstly, nice effort.

Secondly, some features you could add:

1. Constrained optimization - including budget constraints, sector-selection constraints, etc. (see the sketch below for the basic setup). A tough one would be cardinality constraints, e.g. "I am limited to 4 stocks".

2. Return attribution - whether the returns your portfolio earned were due to stock selection or asset allocation or both (Brinson model: http://www.mscibarra.com/research/articles/2002/PerfBrinson.pdf).

3. Performance and compression - how would this deal with huge covariance matrices? 10000 x 10000? Matrix operations on these wouldn't be trivial. In-memory serialization/deserialization issues also come to mind. (edit: then again, Excel can't do 10k x 10k :) )

4. I'm not conversant with SciPy - does this use BFGS or similar for optimization?

5. Compute as a service? Host a grid? Let calculation requests come to you via Excel? (Nobody would want a 10000-asset timeseries to be processed on their CPU for two hours.)
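On points 1 and 4: SciPy's general-purpose optimizer can handle the basic constrained setup via SLSQP (plain BFGS cannot take constraints). A minimal sketch with a made-up covariance matrix:

    # Long-only minimum-variance weights under a full-investment constraint.
    import numpy as np
    from scipy.optimize import minimize

    cov = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.09, 0.02],
                    [0.00, 0.02, 0.16]])       # made-up 3-asset covariance
    n = cov.shape[0]

    result = minimize(
        lambda w: w @ cov @ w,                 # portfolio variance
        x0=np.full(n, 1.0 / n),                # start equal-weighted
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,               # long-only
        constraints=[{"type": "eq",            # weights sum to 1
                      "fun": lambda w: w.sum() - 1.0}],
    )
    print(result.x.round(3))

Cardinality constraints are genuinely harder: they make the problem non-convex, so you'd be looking at mixed-integer solvers or heuristics rather than anything in scipy.optimize.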
It looks like the optimal weights for AMZN and AAPL are negative. That doesn't seem possible unless you're shorting them, and that's quite a different risk profile from going long.
Decent article, but I wanted to add a little extra warning to this:

> One way to do this is to look at past returns and come up with the historical correlation.

Be very wary of historical correlations, at any level. I am old enough (I was in my early teens) that I can remember the screaming of economists during the 1970s stagflation - it was *known* that you could not have high inflation and high unemployment at the same time. Until we did.
I may be totally off the mark here (it has been 5 years since I last studied portfolio management) but if you have 2 stocks that are totally negatively correlated, with the same expected return, won't you have a return of 0%?
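A quick numeric sanity check (made-up numbers, 50/50 split) suggests the answer is no: perfect negative correlation cancels the *fluctuations*, not the expected return, so you keep the common expected return with (ideally) zero risk:

    # Two assets with the same expected return mu and perfectly
    # negatively correlated deviations: r1 = mu + e, r2 = mu - e.
    import numpy as np

    rng = np.random.default_rng(0)
    mu = 0.0004                        # same expected daily return for both
    e = rng.normal(0.0, 0.01, 10_000)
    r1, r2 = mu + e, mu - e

    port = 0.5 * r1 + 0.5 * r2         # 50/50 portfolio
    print(np.corrcoef(r1, r2)[0, 1])   # -1.0
    print(port.mean(), port.std())     # ~mu, ~0: return stays, risk cancels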
Just a small nitpick: you should be able to calculate portfolio variance using simple matrix operations instead of writing a double for loop. Something like (with w the weight vector, std_dev the per-stock volatilities and cor the correlation matrix):

    temp = w * std_dev
    var = np.dot(np.dot(temp.T, cor), temp)
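A self-contained check (made-up numbers) that this matches the explicit double loop:

    # var = sum_i sum_j w_i w_j sigma_i sigma_j rho_ij, vectorized vs looped
    import numpy as np

    w = np.array([0.5, 0.3, 0.2])           # portfolio weights
    std_dev = np.array([0.2, 0.3, 0.4])     # per-stock volatilities
    cor = np.array([[1.0, 0.3, 0.1],
                    [0.3, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])       # correlation matrix

    temp = w * std_dev
    var = np.dot(np.dot(temp.T, cor), temp)

    loop_var = sum(w[i] * w[j] * std_dev[i] * std_dev[j] * cor[i, j]
                   for i in range(3) for j in range(3))
    print(np.isclose(var, loop_var))        # True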