
Ask HN: How credible is reinforcement learning in finance (stocks)?

8 points · by critiq · over 4 years ago
I recently came across a blog post on applying reinforcement learning to stock data for trading. However, in a reinforcement learning setting the agent should be able to play with a simulated environment to explore and learn. There is no way to simulate the price; all the post does is take a historical fragment of prices and replay it.

In my mind this is equivalent to labeled historical data, resampled with redundancy after shuffling. Am I missing something here?

The blog post I was reading: https://towardsdatascience.com/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02
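A minimal sketch may make the concern concrete. This is what a replay-only "environment" boils down to; the class name and the gym-style reset/step interface are illustrative, not taken from the blog post:

    import numpy as np

    class ReplayEnv:
        """Illustrative 'environment' that only replays a fixed historical
        price series. Nothing the agent does changes the next price, so
        there are no real dynamics to explore."""

        def __init__(self, prices: np.ndarray, window: int = 30):
            self.prices = prices
            self.window = window
            self.t = window

        def reset(self):
            self.t = self.window
            return self.prices[self.t - self.window : self.t]

        def step(self, action: float):
            # Reward is the PnL of holding `action` units for one step.
            # Note prices[self.t] is identical whatever `action` was:
            # the "environment" ignores the agent entirely.
            reward = action * (self.prices[self.t] - self.prices[self.t - 1])
            self.t += 1
            done = self.t >= len(self.prices)
            obs = self.prices[self.t - self.window : self.t]
            return obs, reward, done

    # Every episode replays the same transitions, much like reshuffled
    # supervised samples drawn from a fixed labeled dataset.
    env = ReplayEnv(np.cumsum(np.random.randn(500)) + 100.0)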

3 comments

shoo · over 4 years ago
> However there is no way to simulate the price

Compared to other real-world problem domains, which are far less computerised, it seems relatively simple to get fresh data for a trading system: let the system execute actual trades and measure the actual response.

It might be difficult to do this without a budget to burn on experiments: money will be lost while the system makes poor trades, controls have to be set up to ensure that not too much can be lost in any one experiment, and richer views of how the market responds (e.g. order book info) require paid data-feed subscriptions.

In contrast, consider trying to get fresh data in materials science and similar fields, where you may need to manufacture small batches of materials and then test them. It might cost $10k in materials and weeks of work with expensive machinery and skilled technicians to generate a dozen or so fresh data points.
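One concrete shape those controls could take is a hard loss cap wrapped around the live experiment. A minimal sketch, assuming a hypothetical `broker` object whose `equity`, `close_all_positions`, and `submit` methods stand in for whatever the real brokerage API provides:

    class LossCappedExperiment:
        """Circuit breaker for a live trading experiment: halts everything
        once drawdown from the starting equity exceeds a fixed budget.
        The `broker` object and its methods are hypothetical stand-ins."""

        def __init__(self, broker, max_loss: float):
            self.broker = broker
            self.max_loss = max_loss
            self.start_equity = broker.equity()

        def submit(self, order):
            drawdown = self.start_equity - self.broker.equity()
            if drawdown >= self.max_loss:
                # Budget exhausted: flatten all positions and stop trading.
                self.broker.close_all_positions()
                raise RuntimeError(f"halted after losing {drawdown:.2f}")
            return self.broker.submit(order)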
alexmingoia · over 4 years ago
The problem I see with ML on stock prices is that the stock price is not a function of the various available numerical indicators. Markets are non-deterministic, chaotic systems. Contrast that with driving, where the decision to turn left, turn right, throttle, etc. is largely a function of the features (surroundings, current speed, etc.).

That said, people are using ML to predict future prices, although no hedge fund claiming this has published their returns.

With regard to reinforcement learning, why not just try it? Numer.ai is an ML competition that gives you free obfuscated stock data and rewards you if your model is successful.
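For context, the workflow the comment points at reduces to supervised learning on an obfuscated tabular dataset. A minimal sketch, using placeholder file and column names rather than the competition's actual schema:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical obfuscated dataset: anonymous feature columns + target.
    train = pd.read_csv("train.csv")   # placeholder file name
    live = pd.read_csv("live.csv")     # placeholder file name
    features = [c for c in train.columns if c.startswith("feature")]

    model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
    model.fit(train[features], train["target"])

    # The predictions are what gets submitted and later scored against
    # live market data, which is where the reward comes from.
    live["prediction"] = model.predict(live[features])
    live[["id", "prediction"]].to_csv("submission.csv", index=False)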
Comment #24350610 not loaded.
shoo · over 4 years ago
> In my mind it is equivalent to historical data labeled and resampled with redundancy after shuffling

It is even more constrained than that: there can be no shuffling along the time dimension, as that would destroy the relationship between time and the price trajectory (e.g. momentum). They could possibly have sampled subsets of the full universe of 30 stocks to generate different training scenarios, but the blog post doesn't mention anything like that.

> agent should be able to play with simulated environment to explore

The work assumes that the environment is not impacted by the agent's actions; see section 3.3:

> Market liquidity: The orders can be rapidly executed at the close price. We assume that stock market will not be affected by our reinforcement trading agent.

It's interesting to try to roughly estimate how many non-overlapping windows of market data were used in training the ensemble strategy. What I really want is to estimate how many independent samples of input data there are to train on, but there are probably no truly independent samples, since we're talking about historical trajectories of stock prices over time. For the window size, we can try to figure out how much time-series data the ensemble model needs to consume to output a single prediction.

The ensemble strategy is described as:

> Step 1. We use a growing window of 𝑛 months to retrain our three agents concurrently. In this paper, we retrain our three agents at every three months.

> Step 2. We validate all three agents by using a 3-month validation rolling window followed by training to pick the best performing agent which has the highest Sharpe ratio. We also adjust risk-aversion by using turbulence index in our validation stage.

> Step 3. After validation, we only use the best model with the highest Sharpe ratio to predict and trade for the next quarter.

Steps 1, 2 and 3 depend on a number of numerical and structural parameters, such as the sizes of the windows in steps 1 and 2, the choice of metric used to pick the best agent, the "turbulence index" adjustment, and the use of argmax rather than some other approach in step 3. It is possible that the researchers wrote down and pre-committed to exactly these parameters before looking at any data, and never changed them, but it is more likely that the parameters were chosen out of a huge space of alternatives based on what was observed while running experiments, before performance was finally measured on the hold-out data set.

For an automated ensemble strategy, all three steps must be executed before the ensemble model can output a single prediction. I don't quite understand the explanation of step 2, but it suggests that the ensemble model needs to consume at least 3 months + 3 months = 6 months of trailing data before it can output a single prediction. In reality it would be worse: many of the per-stock input features appear to be technical indicators that are some fancy form of moving average, so defining them consumes some further trailing window of price data. These details aren't described in the blog.
If we assume that each of these features needs at most 1 month of trailing price data, then the ensemble model needs 1 month + 3 months + 3 months = 7 months of trailing price data before it can output a single prediction.

> Data from 01/01/2009 to 12/31/2014 is used for training, and the data from 10/01/2015 to 12/31/2015 is used for validation and tuning of parameters. [...] we test our agent's performance on trading data, which is the unseen out-of-sample data from 01/01/2016 to 05/08/2020

So data from 01/01/2009 to 12/31/2014 is used for training, and data from 10/01/2015 to 12/31/2015 was also used to tune parameters, i.e. also effectively used for training. This gives about 6.25 years of data for training, which is enough to generate about 6.25 years / 6 months ≈ 12 non-overlapping windows of data that could be used to generate outputs from the ensemble model. Then there's enough out-of-sample data for about 4.4 years / 6 months ≈ 8 non-overlapping windows of data to evaluate the model performance.

It seems like this work involves fitting and tuning a model with a very large number of parameters using a dataset that only offers a single trajectory (containing all 30 stocks): roughly 12 non-overlapping periods of input data to use when training, and roughly 8 non-overlapping periods of data to evaluate.
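To sanity-check the window arithmetic above, a short script can recompute the counts from the dates quoted from the paper (the 6-month window is the 3-month validation window plus the 3-month trading quarter, and the 3 months of tuning data are counted with the training data):

    from datetime import date

    def months(start: date, end: date) -> float:
        """Approximate span between two dates, in months."""
        return (end - start).days / 30.44

    # Dates as quoted from the paper; the validation window was used to
    # tune parameters, so it is counted as training data too.
    train = months(date(2009, 1, 1), date(2014, 12, 31)) + 3
    test = months(date(2016, 1, 1), date(2020, 5, 8))

    WINDOW = 3 + 3  # months: validation window + trading quarter

    print(f"train: {train / 12:.1f} yrs -> {int(train // WINDOW)} windows")
    print(f"test:  {test / 12:.1f} yrs -> {int(test // WINDOW)} windows")
    # -> train: 6.2 yrs -> 12 windows
    # -> test:  4.3 yrs -> 8 windows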
Comment #24418427 not loaded.