Machine learning for financial prediction

214 points by Matetricks almost 9 years ago

14 comments

mathgenius almost 9 years ago

It's just so ridiculously easy to overfit these models, and there are so many ways to shoot yourself in the foot as a result.

For example, "I split the data set into 5 random segments and then trained a model on 4 of the 5 segments and then tested it on 5th." Such data is serially correlated (it's not good old iid), so already it looks like you have poisoned the test set with information from the training set.

The hard part is not "feature engineering" or "ensemble methods"; the hard part is controlling the entropy that you feed these things, because they are voracious monsters and will absolutely eat all of it.
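To make the leakage point concrete, here is a minimal sketch (not the article's code) that counts how many held-out points have an immediate neighbour in time sitting in the training set. The helper name `adjacent_leak_fraction`, the series length and the seed are all assumptions chosen for illustration.

```python
import random

def adjacent_leak_fraction(test_idx, n):
    """Fraction of test points with a time-adjacent point in the training set."""
    test = set(test_idx)
    leaked = 0
    for t in test:
        neighbours = [i for i in (t - 1, t + 1) if 0 <= i < n]
        if any(i not in test for i in neighbours):
            leaked += 1
    return leaked / len(test)

n = 1000                      # hypothetical number of time steps
rng = random.Random(0)        # fixed seed so the run is repeatable

# Random split: shuffle the indices and hold out one fifth of them.
shuffled = list(range(n))
rng.shuffle(shuffled)
random_test = shuffled[: n // 5]

# Chronological split: hold out the final fifth of the series.
chrono_test = list(range(n - n // 5, n))

random_leak = adjacent_leak_fraction(random_test, n)
chrono_leak = adjacent_leak_fraction(chrono_test, n)
print(f"random split:        {random_leak:.1%} of test points touch training data")
print(f"chronological split: {chrono_leak:.1%} of test points touch training data")
```

With a random hold-out, nearly every test point sits next to a training point, so any serial correlation carries information across the split. With a chronological hold-out only the single boundary point does.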

joegreen almost 9 years ago

If anyone else is getting errors when loading the page, here's the Google cached version: http://webcache.googleusercontent.com/search?q=cache:-ciyXfSG2XoJ:robotwealth.com/machine-learning-financial-prediction-david-aronson/+&cd=1&hl=en&ct=clnk&gl=us

dpweb almost 9 years ago

There are a few problems with turning your laptop into a money machine using data analysis.

Remember the maxim: past performance is not a guarantee of future results. You can develop strategies based on past data that will beat the market, but the nature of markets is to adapt and kill your edge. Markets adapt constantly, and your edge stops working at an unknown point in time. It's unknowable when that WILL happen because past data can't show that.

The other reason is transaction costs, called the vig in gambling. Let's say I'm betting NFL games. NFL home teams win 51% of games. Even a flipped coin, I've read, comes up heads 50.1% of the time. These are profitable systems. But you're paying the bookie 10% on each loss. You could find someone to bet you on coin tosses and bet heads each time. You have a positive expected return, although you need a huge number of flips to make money!

In trading, of course, the costs are commissions. Why do you think there was a rise in HFT? The strategies are consistently profitable (besides the flashing/manipulation tactics). It is ONLY profitable because of extremely low commission costs that are not available to the retail (or even semi-professional) trader.

Systems that can pull $0.0001 out of every share traded overall on high volume can be (pretty easily) created, but you can't trade them profitably. In fact, you will find commissions (semi-pros pay about $3 per 1000 shares) priced right at the point of an edge you could be expected to develop.
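The arithmetic behind those numbers can be sketched as follows. This assumes "10% on each loss" means a win pays 1 unit while a loss costs 1.1 units, and that each bet is independent. The function names are made up for this example.

```python
import math

def ev_per_bet(p_win, win_payout=1.0, loss_cost=1.0):
    """Expected profit per bet, in stake units."""
    return p_win * win_payout - (1.0 - p_win) * loss_cost

def breakeven_win_rate(win_payout=1.0, loss_cost=1.0):
    """Win probability at which the expected profit is exactly zero."""
    return loss_cost / (win_payout + loss_cost)

def bets_for_z(p_win, z=2.0):
    """Even-money bets needed before expected profit sits z standard deviations above zero."""
    edge = 2.0 * p_win - 1.0           # mean profit of one +1/-1 bet
    sd = math.sqrt(1.0 - edge ** 2)    # standard deviation of one +1/-1 bet
    return math.ceil((z * sd / edge) ** 2)

# NFL home teams at 51%, with the assumed 10% surcharge on losses.
nfl_ev = ev_per_bet(0.51, win_payout=1.0, loss_cost=1.1)
needed = breakeven_win_rate(win_payout=1.0, loss_cost=1.1)

# Coin that lands heads 50.1% of the time, no surcharge.
coin_ev = ev_per_bet(0.501)
flips = bets_for_z(0.501)

# Trading: a $0.0001 per-share edge against $3 per 1000 shares in commission.
edge_per_share = 0.0001
commission_per_share = 3.0 / 1000
net_per_share = edge_per_share - commission_per_share

print(f"NFL EV per bet: {nfl_ev:+.4f}, break-even win rate: {needed:.4f}")
print(f"Coin EV per flip: {coin_ev:+.4f}, flips for z=2: {flips}")
print(f"Net per share after commission: {net_per_share:+.4f}")
```

Under these assumptions the 51% bettor loses about 0.029 units per bet and would need to win roughly 52.4% of the time to break even. The 50.1% coin needs on the order of a million flips before the edge stands out from the noise, and the per-share edge is about thirty times smaller than the quoted commission.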

mcbrown almost 9 years ago

Former professional investment manager here...

The biggest problem with things like this, which almost nobody talks about in the context of investing, is publication bias.

100 people try to develop a profitable trading algorithm. 1 comes up with one that looks great on back-tests at a 1% confidence (in other words, exactly what you'd expect from random chance alone over 100 trials).

That person writes an article/pitch/business plan based on their algorithm. You never see results from the 99 who failed.

Going forward, the successful algorithm is no more likely to work than the failed 99, but from the perspective of the general public it sure looks like a winner!
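The 100-trials argument can be checked with a few lines of arithmetic. This sketch assumes the trials are independent and that none of the algorithms has any real edge. The Bonferroni threshold is a standard textbook correction added here for comparison; it is not something the comment proposes.

```python
def expected_false_positives(trials, alpha):
    """Expected number of back-tests that pass by chance alone."""
    return trials * alpha

def prob_at_least_one_false_positive(trials, alpha):
    """Chance that at least one of `trials` worthless strategies passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** trials

def bonferroni_threshold(alpha, trials):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / trials

trials, alpha = 100, 0.01
print(f"expected lucky winners:       {expected_false_positives(trials, alpha):.1f}")
print(f"P(at least one lucky winner): {prob_at_least_one_false_positive(trials, alpha):.3f}")
print(f"Bonferroni per-test level:    {bonferroni_threshold(alpha, trials):.4f}")
```

So with 100 attempts at the 1% level, one "winner" is expected on average and there is roughly a 63% chance of seeing at least one, even if every algorithm is worthless.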

hendzen almost 9 years ago

If you can actually reliably generate alpha from a model like this, there is no point in running the strategy yourself. There are any number of hedge funds that will sign you on, let you keep all of the IP you develop, and give you 10-12% of any returns you generate. That sounds small, but it's mitigated by the fact that you will have access to potentially billions of dollars in capital to trade if your strategy has the capacity for it. So you get 10% of a much bigger pie, with way less downside risk. Plus you get access to all their internal trading systems, execution services, data feeds, etc., which are usually orders of magnitude better than what an individual has access to.

ChuckMcM almost 9 years ago

I think financial prediction via machine learning will be a useful crucible for defining AI from non-AI. So far, so many companies that have applied machine learning to prediction have ended up on the wrong side of the order book at the wrong time. I don't know if this is because other algorithms figure out what they are doing and rapidly develop a counter algorithm to fleece them, or if it's just savvy traders' intuition about what the algorithm is keying on and manipulating it. Sort of like good RTS game players that figure out how the opponent AI is playing and start playing against its programming rather than some strategy from first principles.

xivzgrev almost 9 years ago

Anyone know where he got all the raw data to feed his algo? Clearly he used a lot of data, and the two main sources of free info I know of are Google Finance and Yahoo Finance. At least with Google Finance, I run into issues with their API if you execute too many calls simultaneously: a bunch end up not returning any data.
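One common way to work around a data source that drops responses under load is to cap the number of simultaneous requests and retry the empty ones. The sketch below is generic: `fetch` is a hypothetical callable supplied by the caller (it is not a Google Finance or Yahoo Finance API), and the worker count, retry count and pause are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_all(symbols, fetch, max_workers=2, retries=3, pause=0.5):
    """Fetch every symbol with at most `max_workers` calls in flight at once.

    `fetch(symbol)` is a caller-supplied function (hypothetical here) that
    returns the data for one symbol, or something falsy when the source
    returned nothing.
    """
    def fetch_with_retry(symbol):
        for attempt in range(retries):
            data = fetch(symbol)
            if data:
                return data
            # Back off a little longer after each empty response.
            time.sleep(pause * (attempt + 1))
        return None  # give up; the caller can see which symbols failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_with_retry, symbols))
    return dict(zip(symbols, results))
```

Symbols that still come back as `None` after the retries can be collected and requested again later instead of silently missing from the data set.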

lordnacho almost 9 years ago

Interesting article. I do something related, and here's my take:

Data mining is useful because it gives you things that are predictive that you might not have considered at first, but that make sense after. This is mainly due to the combinatorial explosion in the potential number of formulas.

You generally have a vague idea of what might be predictive, e.g. cheapness vs. earnings and cash flow, but there's a huge number of ways that might show up in the data, and there's a huge number of ways it might hide in the data.

So, for instance, an old-school analyst might do a ranking of price/earnings as well as cash flow, or whatever bespoke formula is desired.

A data mining approach could take all the fundamentals and generate formulas mixing the variables, yielding a number that seems to be effective. Out of those, you'd look at them and decide that they capture some thesis (low P/E, upward trend in earnings). Then you'd look at whether the formula is sensitive to small tweaks. For instance, if you regressed the last 6 earnings and it had phenomenal performance, but with 5 or 7 it didn't, you'd probably conclude it's some sort of random result.

There are funds that take the mass approach to an extreme. They have huge databases, with a genetic algorithm that generates expression trees, and a battery of stats (incl. backtests) to decide what works. They end up with many thousands of strategies that are a great deal more effective than your standard one-trick pony fund.
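The "sensitive to small tweaks" check above could look something like this. It is only a sketch: the score dictionaries are invented numbers, the 0.5 tolerance is arbitrary, and `is_isolated_spike` is a name made up for this example.

```python
def is_isolated_spike(scores, best_param, tolerance=0.5):
    """Return True if a neighbouring parameter scores far below the best one.

    `scores` maps an integer parameter (e.g. number of earnings periods in
    the regression) to a back-test score where higher is better.
    """
    best = scores[best_param]
    neighbours = [scores[p] for p in (best_param - 1, best_param + 1) if p in scores]
    if best <= 0 or not neighbours:
        return False  # nothing to compare against
    return any(score < tolerance * best for score in neighbours)

# Invented scores: great with a lookback of 6, poor with 5 or 7.
spiky = {5: 0.2, 6: 1.8, 7: 0.1}
# Invented scores: performance degrades gently around the best setting.
smooth = {5: 1.5, 6: 1.8, 7: 1.6}

print(is_isolated_spike(spiky, 6))   # the lookback-6 result looks like luck
print(is_isolated_spike(smooth, 6))  # the result survives small tweaks
```

A formula whose performance collapses when the lookback moves from 6 to 5 or 7 is flagged, which matches the intuition that a robust effect should not depend on one exact parameter value.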

dreamdu5t almost 9 years ago

There's a hedge fund built by anonymous data scientists: https://numer.ai

You can use ML to make money on encrypted stock data for free. Think Kaggle, but the winning models are used to trade.

aj7 almost 9 years ago

Really successful traders spend their obtaining insider information, not massaging public data. It stands to reason that an ensemble of technical trading methods would regress towards the mean.

sovande almost 9 years ago

I'll invoke the Black Swan (https://en.wikipedia.org/wiki/The_Black_Swan_%28Taleb_book%29) since it hasn't been done yet in this thread.

aj7 almost 9 years ago

...spend their time and resources...

robotwealth almost 9 years ago

Hello, I'm Kris, the guy who wrote the article that started this thread. Thanks to all who have read my article and taken the time to comment. In the context of my motivation for starting my blog, it means a lot. I'm an engineer who became interested in quantitative finance and machine learning a few years ago. I learned how to code and apply my maths and stats knowledge to finance independently - no formal training whatsoever. This meant that for a long time I was conducting research and developing trading systems in a vacuum; I had no one to bounce ideas off or learn from. So I started writing about what I was doing in the hopes of getting some feedback. So thank you all for providing some. The insights were immensely valuable and I learned a lot.

I thought it would be useful to respond to some of the comments.

mathgenius brought up the extremely valid point that regular k-fold cross validation in a time series context doesn't make sense since the data is autocorrelated, not iid. I no longer use this approach for time series data, instead favoring Rob Hyndman's time series cross validation approach, also known as forward chaining. I believe this approach is the best representation of a real trading environment. The issue becomes deciding how large the rolling window of training data should be - older data may be obsolete, but excluding too much history can lead to not enough training instances.

dpweb raises a good point too, namely that just because your model performed well on past data, even if that data was out of sample, there is no guarantee that the future will be sufficiently like the past, meaning that your model may well become useless at some point in time (possibly very quickly). This is a valid point, but no reason to abandon the markets. It does however require that any algorithm's live performance be objectively monitored such that the level of deviation from expected performance can be statistically quantified. Once a pre-determined confidence level in the model's obsolescence is reached, it should be removed from the portfolio.

mcbrown's comment about publication bias is a good one too. Even worse, I've personally developed hundreds of trading systems that I haven't published. Other bloggers and publishers have most likely also done the same. This form of selection bias is very likely rampant, and is especially applicable to models 'discovered' using machine learning techniques that may not be rooted in traditional economic or financial principles. The moral: absent some form of robust accounting for selection bias, view all of these types of systems with a healthy dose of skepticism, and the published performance as a theoretical upper limit to what could be achieved in practice.

hendzen's point about partnering with a fund or proprietary trading company rather than running your reliable, alpha-generating strategy yourself is also a valid one. I have happily found this out for myself recently.

Also, lordnacho is spot on regarding his take on the utility of data mining in finance.

Thanks again for all the comments!
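Two of the ideas in the reply above lend themselves to a short sketch: forward-chaining splits with a rolling training window, and a statistical check on live performance. Neither function is taken from the article. The window sizes are arbitrary, and the z-score treats live returns as roughly independent with a known back-test mean and standard deviation, which is a simplifying assumption.

```python
import math

def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges where training always precedes testing."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block

def live_z_score(live_returns, expected_mean, expected_std):
    """Standard errors by which the live mean return deviates from expectation."""
    n = len(live_returns)
    live_mean = sum(live_returns) / n
    return (live_mean - expected_mean) / (expected_std / math.sqrt(n))

# Hypothetical sizes: 10 observations, train on 4, test on the next 2.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(f"train {list(train)} -> test {list(test)}")

# Hypothetical monitoring rule: retire the model once z falls below -2.
z = live_z_score([0.0] * 25, expected_mean=0.01, expected_std=0.05)
print(f"live z-score: {z:.2f}, retire: {z < -2.0}")
```

The `train_size` argument is where the trade-off mentioned above shows up: a long window keeps possibly obsolete history, while a short one leaves few training instances. The retirement threshold plays the role of the pre-determined confidence level and would be fixed before going live.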

nxzero almost 9 years ago

Never understood why anyone would spend time creating any trading method given that, even if it did work (possible, but unlikely), the SEC would audit you and then leak how you were making the outperforming returns.

Welcome any thoughts, in part because legally beating the market is possible; I just don't get the SEC & OPSEC aspect.
