The anatomy of an ML-powered stock picking engine

298 points by muggermuch over 2 years ago

26 comments

darawk over 2 years ago

This was a very enjoyable read. I built a nearly (architecturally) identical system a few years back that also had to be scrapped, for different reasons. This brought back a lot of memories: the sanity checks, the index reconstitution issues, dealing with the insanity of security identification and tracking through time.

The fun cases are the ones where it's not even clear what the right answer truly is, e.g. company A spins out company B, and then 5 years later they re-merge. Whose time series and associated data is "the" canonical one? The data vendors often try to give their answers to this question, but maybe their answers don't make sense for your analysis.

Then there's the fact that a lot of vendors don't really do point-in-time correctly. They like to go back and helpfully revise data points for you that they or the company initially misreported. This is all well and good, except that if you were trading for real, you wouldn't have known the correct information at the time, and so any backtest based on the updated information will be invalid. Vendors are a bit better now about providing true point-in-time data sets, or at the very least accurately describing when they are or aren't doing this. But we had a few cases where they said they were, but they definitely weren't.
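
A minimal sketch of the point-in-time problem described above, assuming each data point is stored together with the date it was published. The record layout, the `EPS_HISTORY` values, and the `value_as_of` helper are all hypothetical and only illustrate the idea; they do not describe any vendor's actual format.

```python
from datetime import date

# Hypothetical records: (fiscal_period, reported_value, publication_date).
# The second 2020Q1 row is a later restatement of the same period.
EPS_HISTORY = [
    ("2020Q1", 1.00, date(2020, 4, 20)),
    ("2020Q1", 0.80, date(2020, 9, 1)),   # revised months after the fact
    ("2020Q2", 1.10, date(2020, 7, 20)),
]


def value_as_of(history, period, as_of):
    """Return the latest value for `period` published on or before `as_of`.

    Returns None if nothing had been published yet, which is what a
    backtest running on `as_of` should see.
    """
    known = [
        (published, value)
        for p, value, published in history
        if p == period and published <= as_of
    ]
    if not known:
        return None
    # The most recent publication that was visible on `as_of` wins.
    return max(known, key=lambda pair: pair[0])[1]


# A backtest dated mid-2020 sees the original figure, not the restatement.
print(value_as_of(EPS_HISTORY, "2020Q1", date(2020, 6, 30)))
# The same query dated at year end sees the revised figure.
print(value_as_of(EPS_HISTORY, "2020Q1", date(2020, 12, 31)))
```

A data set that keeps only the revised row would hand the mid-2020 backtest a number nobody could have known at the time.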

chollida1 over 2 years ago

Someone asked how difficult it is to get outside investment... It's usually very difficult, and it takes a lot of money to run a proper fund.

Let's say you raise $50M. You can maybe charge 1 and 20, meaning you get 1% of assets each year for running the fund and 20% of profits.

1% of $50M (and keep in mind this is a large raise for someone without a track record on the sell side or inside another fund) gives you $500,000 a year to pay:

- salaries (let's say you pay yourself $100,000 all in, plus the same for a single analyst)
- a Bloomberg terminal: $30,000 including data feeds
- market data feeds: you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal)
- rent: $50,000/year for office space
- outside lawyer fees and outside accounting fees: $100,000/year
- similar fees for someone to run your back office: roughly $100,000/year

And on the other side of expenses you have the money-making side of things, which, as the OP pointed out, isn't great. If you return 10% on the $50M, that gives $5M in profits, and you keep 20% of that, or $1M.

That allows you to bonus out yourself and analysts in good years. If you lose money one year, then you get no bonus and have to bonus out the employees from the retained earnings you kept from previous bonuses.

It usually gets worse, as most funds have what's called a high-water mark. This means you don't collect the performance fee until your fund gets back to the high-water mark. So if you are down 10% one year, you need to make that back before you start to earn any performance fee, which is why most funds shut down if they go down more than 20%.

As to raising money... Anyone can show a model that makes money. That doesn't mean it's easy to create a model; it's just that there are a lot of people capable of building such a model.

It's the risk management that people with money are really looking for, and sadly that's really hard to show from a model, as part of risk management is things like position sizing and showing that your model doesn't pile into one asset class or trade correlated products.

It bodes well for the OP that they talk about market regimes as, IMHO, this is one of the biggest risk management tools that aspiring traders ignore.

And this risk management is why people ask for a track record of more than a year.
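
The fee arithmetic above can be sketched as follows. The dollar figures are the ones quoted in the comment; the `performance_fees` and `recovery_return` helpers are hypothetical, and the high-water-mark logic is deliberately simplified (annual crystallization, fees not deducted from NAV, no investor flows).

```python
AUM = 50_000_000
MANAGEMENT_FEE_RATE = 0.01   # the "1" in "1 and 20"
PERFORMANCE_FEE_RATE = 0.20  # the "20" in "1 and 20"

# Annual costs as listed in the comment (two salaries of $100,000 each).
COSTS = {
    "salaries": 200_000,
    "bloomberg_terminal": 30_000,
    "market_data": 25_000,
    "rent": 50_000,
    "legal_and_accounting": 100_000,
    "back_office": 100_000,
}

management_fee = AUM * MANAGEMENT_FEE_RATE
total_costs = sum(COSTS.values())


def performance_fees(start_nav, annual_returns, fee_rate=PERFORMANCE_FEE_RATE):
    """Yearly performance fees under a simplified high-water mark."""
    nav = start_nav
    high_water_mark = start_nav
    fees = []
    for r in annual_returns:
        nav *= 1 + r
        # Fees are earned only on gains above the previous peak.
        fees.append(fee_rate * max(0.0, nav - high_water_mark))
        high_water_mark = max(high_water_mark, nav)
    return fees


def recovery_return(drawdown):
    """Return needed to get back to the previous peak after a drawdown."""
    return 1 / (1 - drawdown) - 1


print(management_fee, total_costs)
print(performance_fees(AUM, [0.10]))          # one good year
print(performance_fees(AUM, [-0.10, 0.10]))   # down 10%, then up 10%
print(recovery_return(0.10), recovery_return(0.20))
```

On these numbers the listed costs add up to $505,000, slightly more than the $500,000 management fee, and a year of -10% followed by +10% earns no performance fee in either year because the fund is still below its high-water mark.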

conformist over 2 years ago

What's the market beta? What's the average turnover/holding period? How are transaction costs modelled? What features explain most of the variance? How are they related to known factors? What's the beta-hedged performance?

These are all things I'd want to know before deploying something like this. (Perhaps some are mentioned in the post; I might have missed them.)

To first order, I'd forget about fat tails and similar popular concerns. They matter, but not as much as structurally understanding what this model is up to. Perhaps one feature is explicitly selling tails? That might answer it already.
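
A minimal sketch of two of the questions above, market beta and beta-hedged performance, assuming aligned per-period return series for the strategy and the market. The function names and the sample numbers are hypothetical; this is the plain single-factor regression slope, not a full factor model.

```python
def mean(xs):
    return sum(xs) / len(xs)


def market_beta(strategy_returns, market_returns):
    """Slope of strategy returns on market returns: cov(s, m) / var(m)."""
    ms, mm = mean(strategy_returns), mean(market_returns)
    cov = sum((s - ms) * (m - mm)
              for s, m in zip(strategy_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


def beta_hedged_returns(strategy_returns, market_returns):
    """Strategy returns with the estimated market exposure subtracted out."""
    beta = market_beta(strategy_returns, market_returns)
    return [s - beta * m for s, m in zip(strategy_returns, market_returns)]


# Hypothetical data: a strategy that is just 2x the market plus 0.1% a period.
market = [0.01, -0.02, 0.03, 0.00]
strategy = [2 * m + 0.001 for m in market]

print(market_beta(strategy, market))
print(beta_hedged_returns(strategy, market))
```

If most of the headline return disappears after hedging, the strategy was mostly leveraged market exposure rather than stock-picking skill.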

hendzen over 2 years ago

I know a bit about this industry and I have worked on some profitable systems. Honestly, not a bad effort for someone working on their own with low-cost data. Don’t let the haters get you down. I would recommend picking up a more recent textbook on portfolio construction, like Isichenko’s book.

alpineidyll3 over 2 years ago

My heart goes out to this author, but you can tell even from his first table that he doesn't quite understand the mathematics of financial markets, the purpose of a hedge fund, how they grow, etc.

1) It's plain from a quick look at the allocation of capital in investment firms that AUM is not made by performance; it's made by marketing. At best, people invest when they believe a person is connected to inside information. Saying you have an ML advisor is really just a prerequisite for these people.

2) Is that allocation stupid? No, it's not, because the powers of mathematics, and by extension ML, are intrinsically limited for investment returns because those returns are fat-tailed </Taleb>. For example, this author quotes a realistic Sharpe (0.8) but didn't calculate the standard deviation of his Sharpe, which I would bet a large sum was _at least_ 0.8. I.e., he doesn't really know what his Sharpe is. This is because equity assets behave like a Student-t distribution with a degree-of-freedom parameter of ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e., higher moments, such as the uncertainty in Sharpe, literally do not exist or converge and are unknowable. The only exception is if your strategy explicitly cuts off tails.

Once you understand 2), you begin to understand that there's no such thing as a real quant fund (i.e., a fund which truly makes money predictably using models) which doesn't trade a liquidity-limited book that has quite advanced hedging. Wealthy people are aware of this, which is why the author can't market this product.

If you're doing something silly like holding equities without tail risk control, you literally cannot be quantitatively investing. You are just slowly rediscovering what Kelly, Bergomi, Mandelbrot, Bernay's etc. realized with a little deep thought over pen and paper (while clumsily writing boilerplate software): that markets are entropy machines rougher than a normal distribution, and any gains come directly from information (see Kelly: "A New Interpretation of Information Rate").

For a high-latency (ms) market data feed, the returns on information are very, very small. Markets are efficient.
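
A rough sketch of the "standard deviation of the Sharpe" point above, using the standard large-sample approximation for independent, identically distributed returns with finite moments: the per-period standard error is about sqrt((1 + SR^2 / 2) / n). That assumption is exactly what the comment argues is too optimistic for equities, so treat the result as a lower bound on the uncertainty, not an estimate of it. The function name and the choice of 252 trading days per year are assumptions.

```python
import math


def sharpe_standard_error(annual_sharpe, years, periods_per_year=252):
    """Approximate standard error of an annualized Sharpe ratio (iid returns)."""
    # Convert the annualized Sharpe to a per-period Sharpe.
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    n = years * periods_per_year
    # Large-sample standard error of the per-period Sharpe estimate.
    se_period = math.sqrt((1 + 0.5 * sr_period ** 2) / n)
    # Scale back to annualized units.
    return se_period * math.sqrt(periods_per_year)


for years in (1, 2, 5, 10):
    print(years, round(sharpe_standard_error(0.8, years), 2))
```

With one year of daily data the standard error comes out near 1.0, and with two years near 0.71, so even under the friendly iid assumption a quoted Sharpe of 0.8 over a short track record is hard to distinguish from zero.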

igorkraw over 2 years ago

Nice writeup, thank you for sharing so openly!

The three things I always want to know from stock picking ML people:

1. Did you put your own money in it?
2. How'd it go?
3. How well does your engine do vs. a fixed stock allocation based on trend statistics computed on the whole time window (i.e., compared to a fixed optimal portfolio computed with mean/std values you don't have access to, but which isn't allowed to change its choice)? What's the regret, if you are familiar with online learning?
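
The regret in question 3 can be sketched roughly as follows. This is a simplified stand-in, not the exact comparator the comment describes: instead of a mean/std-optimal portfolio, it searches a coarse grid of fixed two-asset weights (rebalanced every period) for the best one in hindsight, and measures regret in log wealth. All names and return figures are hypothetical.

```python
import math


def log_wealth(weights_per_period, asset_returns):
    """Cumulative log growth of a portfolio with the given per-period weights."""
    total = 0.0
    for weights, returns in zip(weights_per_period, asset_returns):
        period_return = sum(w * r for w, r in zip(weights, returns))
        total += math.log(1 + period_return)
    return total


def regret_vs_best_fixed(weights_per_period, asset_returns, grid_steps=10):
    """Log-wealth gap between the best fixed two-asset mix in hindsight
    and the strategy's actual allocations. Positive means the fixed mix won."""
    n_periods = len(asset_returns)
    best_fixed = max(
        log_wealth([(k / grid_steps, 1 - k / grid_steps)] * n_periods,
                   asset_returns)
        for k in range(grid_steps + 1)
    )
    return best_fixed - log_wealth(weights_per_period, asset_returns)


# Hypothetical two-asset, two-period example.
asset_returns = [(0.10, -0.05), (-0.05, 0.10)]
always_first_asset = [(1.0, 0.0), (1.0, 0.0)]
perfectly_timed = [(1.0, 0.0), (0.0, 1.0)]

print(regret_vs_best_fixed(always_first_asset, asset_returns))
print(regret_vs_best_fixed(perfectly_timed, asset_returns))
```

A strategy that really adapts to the data should show negative regret against any fixed allocation; one that shows positive regret is losing to a portfolio that never changes its mind.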

Joel_Mckay over 2 years ago

Every gambler thinks they have a system, but often fails to recognize that the game was unfair long before they arrived. lol =)

idoh over 2 years ago

If you have a tool that can generate great returns, then why fall back to a newsletter?

asavinov over 2 years ago

> I have always kept in mind is that feature engineering is almost always the key difference between success and failure

I also developed an ML-powered service that relies heavily on feature engineering: Intelligent Trading Bot, https://github.com/asavinov/intelligent-trading-bot

It differs from Didact in that it focuses on trade signal generation with a higher frequency of evaluation. It is more suitable for cryptocurrencies but also works for traditional stocks at daily frequencies, so it could be adapted for stock picking. What I find interesting in your work is the general design of this kind of ML system relying on feature engineering.

muggermuch over 2 years ago

Hi, fellow HN'ers! Author here, please let me know if you have any questions or thoughts!

artirdx over 2 years ago

This was not only a very informative read but also feels like an amazing achievement, if everything described here was developed by one person (the author, @muggermuch).

The breadth of knowledge demonstrated by the author, from technology (bringing performance down to 14 minutes) to ML to a deep understanding of financial markets, is super impressive.

Granted, the author has an educational background in computer science and has been a trader, which probably explains many of his abilities, but to my small brain it feels like a next-level achievement.

Maybe I live in an average circle of finance, but I have never met nor heard of a person who could single-handedly conceive and implement such a system. To my knowledge, a typical hedge fund has several highly paid people in different teams to build and maintain such a system.

I never thought one person could do it. I genuinely wonder how he managed to wrap his head around this much knowledge. He seems to fall in the 10x category. Kudos!

jesuslop over 2 years ago

Nice report. How did you do risk management? Have you been leveraged? Have you paid for data? Kudos for a view from the trenches.

drdrek over 2 years ago

If you have a guaranteed compounding money machine that outperforms the market by 20%, just let it run; sooner or later you will be able to buy out those who did not invest in you. If it's just a useful recommendation engine, then there are indeed a lot of questions relating to personal finance or investment strategies, having nothing to do with machine learning, that need to be addressed for PMF. You don't need better models; you need to understand the needs of your customers.

antognini over 2 years ago

Have you considered submitting your predictions to Numerai Signals? It's market neutral, so as long as your models can generate some alpha you can still get good returns.

mywaifuismeta over 2 years ago

I'm curious what happens if you look at your returns and other metrics at different time scales, i.e. monthly and weekly, in addition to yearly. You can't make any argument based on a sample size of 2.

As someone who used to work in the industry, I am 99.99% confident that you cannot have any alpha with a system like this; you are basically flipping coins, as some other commenters have pointed out.

gbasin over 2 years ago

If your predictions are good, I'd be happy to get you $100 million in assets to manage. It's very unlikely that your predictions are good...

moeymo1 over 2 years ago

"steadily beating the S&P 500 for over a year on a weekly basis"

This can be achieved by chance alone.

If not by chance, I can give you a strategy that would be highly likely to achieve such a result: it would take a lot of risk, though!

I love it when people post this stuff to HN. Naive people try it, lose a bundle to market makers, then go back to their day job.

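The "by chance alone" claim can be sketched with a simple coin-flip model, assuming a zero-skill strategy beats the index in any given week with probability 0.5, independently of other weeks. That independence assumption, the function names, and the example thresholds are all hypothetical; real weekly results are correlated and this ignores the size of wins and losses.

```python
from math import comb


def prob_at_least(wins, weeks, p_win=0.5):
    """Probability that a coin-flip strategy wins at least `wins` of `weeks`."""
    return sum(
        comb(weeks, k) * p_win ** k * (1 - p_win) ** (weeks - k)
        for k in range(wins, weeks + 1)
    )


def prob_any_of(n_strategies, p_single):
    """Probability that at least one of n independent strategies succeeds."""
    return 1 - (1 - p_single) ** n_strategies


# Chance that one zero-skill strategy wins 30 or more of 52 weeks.
p_one = prob_at_least(30, 52)
print(p_one)
# Chance that at least one of 20 such strategies does so.
print(prob_any_of(20, p_one))
```

The second number is the one that matters when many people try many strategies and only the ones that happened to work get written up.
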
wlamond over 2 years ago

Thanks for the article! What tool do you use to create the figures? I like the sketch style.

For anyone else, what tools do you recommend for generating pretty diagrams for system architectures, workflows, etc.?

sanp over 2 years ago

OP, what are you using to draw the diagrams? They look nice and are very readable.

jshaqaw over 2 years ago

"Predicting" markets isn't the challenge. Implementing real world strategies with associated frictions is. Show me a cash p&l or it's just a student project.

Gatsky over 2 years ago

Do these frequent trading strategies ever account for taxation? Where I live, the returns compared to a buy and hold passive strategy would be cut by 50%.
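
A minimal sketch of why trading frequency matters for after-tax returns, assuming the same gross return and the same tax rate in both cases, so that only the timing of the tax differs. The 10% return, 50% tax rate, and function names are hypothetical illustrations, not a model of any particular jurisdiction (many also tax short-term gains at a higher rate, which would widen the gap).

```python
def wealth_taxed_annually(gross_return, tax_rate, years):
    """Terminal wealth per unit invested when gains are realized and taxed every year."""
    return (1 + gross_return * (1 - tax_rate)) ** years


def wealth_taxed_at_exit(gross_return, tax_rate, years):
    """Terminal wealth per unit invested when gains compound untaxed and are taxed once at sale."""
    pre_tax = (1 + gross_return) ** years
    return 1 + (pre_tax - 1) * (1 - tax_rate)


for years in (1, 5, 10, 20):
    print(
        years,
        round(wealth_taxed_annually(0.10, 0.5, years), 3),
        round(wealth_taxed_at_exit(0.10, 0.5, years), 3),
    )
```

Over a single year the two are identical; the gap comes entirely from compounding on money that the frequent trader has already paid out in tax.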

mbarras_ing over 2 years ago

Brilliantly written. As someone considering a move into the Quant field it is very informative.

ajoseps over 2 years ago

this is very cool! where did you get your data from and how's the transition to airflow?

ktiwari31 over 2 years ago

Great post! Very informative! Thoroughly enjoyed it.

unpwn over 2 years ago

Lmao this engine is down 6.9% for the year, when literally it's as simple as just buying some puts.

prabdude over 2 years ago

Excellent article
