> In my mind it is equivalent to historical data labeled and resampled with redundancy after shuffling<p>It is even more constrained than that: there can be no shuffling along the time dimension, as that would destroy the relationship between time and the price trajectory (e.g. momentum). It is possible that they sampled subsets of the full universe of 30 stocks to generate different training scenarios, but the blog post doesn't mention anything like that.<p>> agent should be able to play with simulated environment to explore<p>The work assumes that the environment is not impacted by the agent's actions; see section 3.3:<p>> Market liquidity: The orders can be rapidly executed at the close price. We assume that stock market will not be affected by our reinforcement trading agent.<p>It's interesting to try to roughly estimate how many non-overlapping windows of market data were used in training the ensemble strategy. What I really want to estimate is how many independent samples of input data there are to train on, but there are probably no truly independent samples, since we're talking about historical trajectories of stock prices over time. For the window size, we can try to figure out how much time-series data the ensemble model must consume to output a single prediction.<p>The ensemble strategy is described as:<p>> Step 1. We use a growing window of 𝑛 months to retrain our three agents concurrently. In this paper, we retrain our three agents at every three months.<p>> Step 2. We validate all three agents by using a 3-month validation rolling window followed by training to pick the best performing agent which has the highest Sharpe ratio. We also adjust risk-aversion by using turbulence index in our validation stage.<p>> Step 3. 
After validation, we only use the best model with the highest Sharpe ratio to predict and trade for the next quarter.<p>Steps 1, 2 and 3 depend on a number of numerical and structural parameters, such as the size of the windows in steps 1 and 2, the choice of which metric to use when picking the best agent, the "turbulence index" adjustment, and the use of argmax instead of some other selection approach in step 3. It is possible that the researchers wrote down and pre-committed to exactly these parameters before looking at any data, and never changed them, but it is more likely that these parameters were chosen out of a huge space of alternatives based on what was observed while running experiments, before finally measuring performance on the hold-out data set.<p>Viewed as an automated ensemble strategy, all three steps above need to be executed before the ensemble model can output a single prediction. I don't quite understand the explanation of step 2, but it suggests that the ensemble model needs to consume at least 3 months + 3 months = 6 months of trailing data before it can output a single prediction. In reality it would be worse: many of the features defined for each stock and fed as inputs appear to be technical indicators that are some fancy form of a moving average, so to compute them we also need some trailing window of price data -- these details aren't described in the blog. If we assume that each of these features needs at most 1 month of trailing price data, then we need 1 month + 3 months + 3 months = 7 months of trailing price data before the ensemble model can output a single prediction.<p>> Data from 01/01/2009 to 12/31/2014 is used for training, and the data from 10/01/2015 to 12/31/2015 is used for validation and tuning of parameters. [...] 
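Written out as a quick sketch (the 1-month indicator lookback is the assumption stated above, not a figure from the paper):

```python
# Minimum trailing price data before the ensemble can emit one prediction,
# per the back-of-the-envelope reading above. The 1-month indicator
# lookback is an assumption; the paper doesn't specify it.
indicator_lookback_months = 1  # assumed: moving-average-style features
training_window_months = 3     # step 1: agents retrained every three months
validation_window_months = 3   # step 2: 3-month validation rolling window

min_trailing_months = (indicator_lookback_months
                       + training_window_months
                       + validation_window_months)
print(min_trailing_months)  # 7
```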
we test our agent’s performance on trading data, which is the unseen out-of-sample data from 01/01/2016 to 05/08/2020<p>So data from 01/01/2009 to 12/31/2014 is used for training, and data from 10/01/2015 to 12/31/2015 was also used to tune parameters, i.e. effectively also used for training. That is roughly 7 years of data end to end (oddly, 01/01/2015 to 09/30/2015 appears unused), which is enough to generate about 7 years / 6 months = 14 non-overlapping windows of data that could be used to generate outputs from the ensemble model. Then there are about 4.4 years of out-of-sample data, enough for about 4.4 years / 6 months = 8 non-overlapping windows of data to evaluate the model's performance.<p>It seems like this work involves fitting and tuning a model with a very large number of parameters using a dataset that offers only a single trajectory (containing all the stocks): about 14 non-overlapping periods of input data to train on, and about 8 non-overlapping periods of data to evaluate.
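Redoing the window arithmetic in code, using 6-month windows (the 3-month validation window plus the 3-month trading quarter) and the date ranges quoted from the paper:

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """Shift a first-of-month date forward by n months."""
    y, m = divmod(d.month - 1 + n, 12)
    return date(d.year + y, m + 1, d.day)

def count_windows(start: date, end: date, months: int = 6) -> int:
    """Count full, non-overlapping windows of the given length in [start, end)."""
    n, cur = 0, start
    while add_months(cur, months) <= end:
        cur = add_months(cur, months)
        n += 1
    return n

# Training + tuning span (2009-01-01 through end of 2015) vs. test span.
train = count_windows(date(2009, 1, 1), date(2016, 1, 1))
test = count_windows(date(2016, 1, 1), date(2020, 5, 8))
print(train, test)  # 14 8
```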