The stock market plays an important role in finance. Investors expect to obtain high returns from stock trading, but trading is affected by many factors, such as information, volatility, and leverage. Meanwhile, stock market data is complex, noisy, nonlinear, and non-stationary (Hadi2021). Traditional quantitative trading methods face growing challenges in dealing with these difficulties. With the development of machine learning in recent years, the combination of machine learning and quantitative trading has shown great potential for returns. Because machine learning algorithms can capture richer indicators and information, excess returns can be obtained.
Reinforcement learning (RL), one of the most popular machine learning methods, is widely used in stock trading decision making. RL employs an agent that interacts with the environment to maximize its cumulative return, which matches the stock trading setting. Existing studies have achieved appealing results in stock trading applications compared with traditional machine learning methods (Salvatore2021). However, recent research leaves several problems to be addressed.
Firstly, trading on a single stock history series or indicator, such as the closing price or a moving average, is common in existing work (Luo2019), and it cannot provide the rich information that trading algorithms need. Indicator selection is crucial for stock trading algorithms because it determines their ultimate performance, especially in deep learning (DL) and RL. How to make effective use of trading information is therefore the bottleneck of the algorithm.
Secondly, in existing RL-based stock trading papers, the output actions are discretized into trading signals, usually Buy, Hold, and Sell (Jagdish2021; Hirchoua2021). Such signals decide only the direction of a trade, not the number of shares traded. These algorithms are therefore greatly limited in practical trading.
To tackle these challenges, we propose a multi-frequency continuous-share quantitative trading algorithm with GARCH (MCTG) based on deep reinforcement learning (DRL). Firstly, the state space of our MCTG model contains multi-frequency data processed by parallel network layers. The multi-frequency data covers three periods (5 min, 1 day, and 1 week) and provides more abundant information for DL, and the parallel network layers are designed to capture more valid information and guide more valuable trading decisions. Secondly, we use volatility as a measure of risk and adopt GARCH, a volatility prediction model widely used in econometrics. The volatility predicted by GARCH supplements the daily-frequency input data, which helps the agent recognize risk, reduce transaction taxes, and obtain higher returns. Finally, for the RL action space, we build a continuous trading decision algorithm that outputs a value in the range -1 to +1, where negative values indicate selling and positive values indicate buying. Trading flexibly in proportion to the output action lets the agent exploit the multi-frequency data and the GARCH model to measure risk, reduce transaction taxes, and obtain higher returns. Our experimental results show that the proposed model significantly outperforms the benchmark model on five stocks.
2. Problem Statement
We formulate the stock trading process as a Markov Decision Process (MDP) in order to match DRL algorithms. An MDP is a five-tuple $\langle S, P, A, R, \gamma \rangle$, whose elements are the state space $S$, the transition function $P$, the action space $A$, the reward function $R$, and the discount factor $\gamma$. The state transition probability is objectively given and is not affected by the actions of the agent. Considering the dynamic and uncertain characteristics of the stock market, we set up the stock trading model as an MDP as follows:
State $S$: The state includes a series of features describing the stock's current condition. We use six features of the stock (opening price, closing price, high, low, volume, and amount) at different frequencies, together with volatility, and process them with the parallel network layers to form our state.
Action $A$: The agent outputs an action in the range -1 to 1, where negative values mean sell, positive values mean buy, and 0 means hold, which gives a continuous trading decision. For example, an output of 0.3 means that 30% of the current cash is used to buy stocks, and -0.5 means that 50% of the held stocks are sold.
Reward $R$: The agent receives a reward according to its trading strategy.
In this section, we introduce the design of our parallel network structure, the RL algorithm, and the application of the volatility prediction model (GARCH).
3.1. Multi-frequency parallel architecture
We use multi-frequency historical stock trading data (5 min, 1 day with GARCH, and 1 week) as the input of our model. Each frequency is processed by its own DNN layers and a dropout layer, and together these layers are called the parallel network (Guo2019). The parallel network produces three output matrices of the same shape, $X_t^{m}$, $X_t^{d}$, and $X_t^{w}$, each with 16 elements. Each output is multiplied by a weight matrix of the same shape ($W^{m}$, $W^{d}$, and $W^{w}$), and the weights are updated gradually during deep reinforcement learning training. The three weighted matrices are then concatenated into a single matrix with 48 elements. The state is therefore computed as follows:

$$ s_t = \left[\, W^{m} \odot X_t^{m},\; W^{d} \odot X_t^{d},\; W^{w} \odot X_t^{w} \,\right] \tag{1} $$

where $X_t^{m}$, $X_t^{d}$, and $X_t^{w}$ represent the processed 5 min, 1 day, and 1 week data at time $t$, $W^{m}$, $W^{d}$, and $W^{w}$ represent their weights, $\odot$ denotes element-wise multiplication, and $[\cdot]$ denotes concatenation. More details are shown in Figure 1.
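The fusion step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `fuse_branches` is hypothetical, the branch outputs are random stand-ins for the DNN outputs, the weights are applied element-wise, and they are initialized to one rather than learned.

```python
import numpy as np

def fuse_branches(outputs, weights):
    """Weight each branch output element-wise, then concatenate the results."""
    if len(outputs) != len(weights):
        raise ValueError("need one weight vector per branch output")
    weighted = [w * x for w, x in zip(weights, outputs)]
    return np.concatenate(weighted)

# Stand-ins for the three 16-element branch outputs (5 min, 1 day with GARCH, 1 week).
rng = np.random.default_rng(0)
x_5min, x_day, x_week = (rng.normal(size=16) for _ in range(3))

# Fusion weights start at one here; in training they would be updated with the policy.
w_5min, w_day, w_week = (np.ones(16) for _ in range(3))

state = fuse_branches([x_5min, x_day, x_week], [w_5min, w_day, w_week])
print(state.shape)  # (48,)
```

Concatenating keeps the three frequencies in separate slots of the state, so the downstream policy can still tell which frequency each element came from.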
3.2. Proximal Policy Optimization Algorithms (PPO)
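Reinforcement learning algorithms can be divided into two types: value-based methods and policy gradient (PG) methods (Sutton1998). The former are based on the Q value $Q(s, a)$ of a state-action pair, which is not suitable for our problem. The latter optimize the policy directly by applying gradient ascent to the objective function, which is more suitable for continuous trading decisions. The policy gradient is given by:

$$ \nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A^{\pi_\theta}(s_t, a_t) \right] \tag{2} $$

where $\pi_\theta$ denotes a policy with parameters $\theta$, $J(\pi_\theta)$ denotes the expected finite-horizon undiscounted return of the policy, $\tau$ is a trajectory, and $A^{\pi_\theta}$ is the advantage function for the current policy. The PPO algorithm (John2017) defines a probability ratio:

$$ r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \tag{3} $$

and optimizes the following objective function:

$$ L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\hat{A}_t,\; \mathrm{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t \right) \right] \tag{4} $$

where $\hat{A}_t$ is an estimate of the advantage at time $t$ and $\epsilon$ is a small hyperparameter that roughly says how far the new policy is allowed to move from the old one.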
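To make the clipping behaviour concrete, the following NumPy sketch evaluates the clipped surrogate objective on a batch of samples. It is a minimal illustration rather than the training code used here; the function name is hypothetical, and the default `epsilon=0.2` is an assumed value, since this paper does not report its clip range.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Clipped surrogate objective averaged over a batch of (state, action) samples."""
    ratio = np.exp(logp_new - logp_old)          # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return np.mean(np.minimum(unclipped, clipped))

# A ratio of 1.5 with a positive advantage is capped at 1 + epsilon = 1.2.
gain = ppo_clip_objective(np.log([1.5]), np.zeros(1), np.array([1.0]))
# A ratio of 0.5 with a negative advantage is held at 1 - epsilon = 0.8.
loss = ppo_clip_objective(np.log([0.5]), np.zeros(1), np.array([-1.0]))
```

In both cases the objective stops rewarding the policy for moving further from the old policy once the ratio leaves the interval $[1-\epsilon, 1+\epsilon]$.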
3.3. Generalized Autoregressive Conditional Heteroskedasticity (GARCH)
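The GARCH model describes the variance of the current error term as a function of the error terms and variances of previous periods (HU2020). The GARCH($p$, $q$) specification is defined as:

$$ \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 \tag{5} $$

where $\sigma_t^2$ is the conditional variance at time $t$, $\varepsilon_{t-i}$ are the past error terms, and $\omega$, $\alpha_i$, and $\beta_j$ are the model parameters.
According to existing studies, rolling GARCH predictions have a time correlation with the return rate of stocks (Helmut2017). In this paper, we choose the GARCH(1,1) model to make rolling predictions of volatility and then add the predicted volatility to our daily data.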
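The GARCH(1,1) recursion can be sketched as follows. This is an illustration under assumptions, not the fitting procedure used in this paper: the function name is hypothetical, returns are treated as zero-mean error terms, and the parameters `omega`, `alpha`, and `beta` are placeholder values that would normally be estimated from data (typically by maximum likelihood) and re-estimated as the window rolls forward.

```python
import numpy as np

def garch11_forecast(returns, omega, alpha, beta):
    """One-step-ahead variance forecasts for a zero-mean return series.

    forecasts[t] is the variance predicted for period t from returns before t,
    so the output has len(returns) + 1 entries.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    returns = np.asarray(returns, dtype=float)
    forecasts = np.empty(len(returns) + 1)
    forecasts[0] = omega / (1.0 - alpha - beta)  # start from the unconditional variance
    for t, r in enumerate(returns):
        forecasts[t + 1] = omega + alpha * r ** 2 + beta * forecasts[t]
    return forecasts

# Placeholder daily returns and parameters, chosen only for illustration.
daily_returns = np.array([0.01, -0.02, 0.015, -0.005])
variance = garch11_forecast(daily_returns, omega=1e-6, alpha=0.1, beta=0.85)
volatility = np.sqrt(variance)  # the last entry is the forecast for the next day
```

Each forecast uses only past returns, so appending it to the daily features does not leak future information into the state.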
4.1. Experimental settings
Dataset. We collect five stocks from different sectors of the Chinese stock market (SZ.002230, SZ.000333, SH.603288, SH.600030, and SH.600276, where SH and SZ denote the Shanghai and Shenzhen stock exchanges). The data covers 2011 to 2020 at different frequencies (5 min, 1 day with GARCH, and 1 week). We divide the data into a training set and a test set: the data from 2011 to 2018 is used for training, and the rest is used for testing.
State space. Our state space contains historical multi-frequency trading information and GARCH volatility terms. The multi-frequency information includes 5 min, 1 day with GARCH, and 1 week data as short-, medium-, and long-term information, which together provide more abundant information than daily-frequency data alone. Because the trading time is 4 hours a day, the short-term information consists of 48 sets of 5 min data; the medium-term information consists of 30 sets of daily data with rolling GARCH volatility terms; and the long-term information consists of 30 sets of weekly data. The multi-frequency information is represented by the three blue parts on the left of Figure 1.
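The window sizes above can be illustrated as array shapes. The sketch below is a hypothetical layout, not the actual preprocessing code: it assumes six features per bar (open, close, high, low, volume, amount), appends the GARCH volatility as a seventh daily column, and uses random numbers in place of market data.

```python
import numpy as np

SHORT_LEN, MEDIUM_LEN, LONG_LEN = 48, 30, 30  # window lengths stated in the text

def build_windows(bars_5min, bars_daily, daily_volatility, bars_weekly):
    """Slice the most recent bars of each frequency into fixed-size windows."""
    if (len(bars_5min) < SHORT_LEN or len(bars_daily) < MEDIUM_LEN
            or len(daily_volatility) < MEDIUM_LEN or len(bars_weekly) < LONG_LEN):
        raise ValueError("not enough history to fill every window")
    short = bars_5min[-SHORT_LEN:]
    # Append the GARCH volatility as an extra column of the daily window.
    medium = np.column_stack([bars_daily[-MEDIUM_LEN:], daily_volatility[-MEDIUM_LEN:]])
    long = bars_weekly[-LONG_LEN:]
    return short, medium, long

# Random placeholder history with 6 assumed features per bar.
rng = np.random.default_rng(0)
short, medium, long = build_windows(
    rng.random((500, 6)), rng.random((60, 6)), rng.random(60), rng.random((40, 6))
)
print(short.shape, medium.shape, long.shape)  # (48, 6) (30, 7) (30, 6)
```

Each window would then be passed to its own branch of the parallel network.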
Action space. Our agent first outputs a continuous value from -1 to 1, which specifies both the trading direction (buy or sell, determined by the sign) and the number of shares. These flexible trading signals help the agent use the state information to measure risk, reduce transaction taxes, and obtain higher returns. The final output is an action, computed by Equation (6), that determines how much to buy or sell in the current state. We take into account the minimum transaction unit of 100 shares and the transaction fees charged in the Chinese market. To ensure that trades are valid, we use the stock's opening price on the next day as the buying and selling price. The formula for the final action is as follows:
where int represents the integer function, and cash, tax, and hold represent the amount of cash, the transaction tax, and the number of shares held.
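One way to turn the continuous output into a tradable order is sketched below. It is an assumption-laden illustration, not a reproduction of Equation (6): the function name is hypothetical, `tax` is treated as a proportional rate added to the purchase cost, and order sizes are rounded down to whole lots of 100 shares.

```python
import math

LOT_SIZE = 100  # minimum transaction unit stated in the text

def action_to_shares(action, cash, hold, next_open, tax):
    """Map an action in [-1, 1] to a signed share count (positive buys, negative sells)."""
    if not -1.0 <= action <= 1.0:
        raise ValueError("action must lie in [-1, 1]")
    if action > 0:
        # Spend the given fraction of cash at the next day's opening price.
        budget = action * cash
        cost_per_lot = LOT_SIZE * next_open * (1.0 + tax)
        return math.floor(budget / cost_per_lot) * LOT_SIZE
    if action < 0:
        # Sell the given fraction of the current holdings, rounded down to whole lots.
        return -math.floor(-action * hold / LOT_SIZE) * LOT_SIZE
    return 0  # an action of exactly 0 means hold

buy = action_to_shares(0.3, cash=100_000, hold=0, next_open=10.0, tax=0.001)
sell = action_to_shares(-0.5, cash=0, hold=1_000, next_open=10.0, tax=0.001)
print(buy, sell)  # 2900 -500
```

Rounding down means that small actions can map to no trade at all, which is one way a continuous policy can still hold its position.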
Reward function. To obtain a higher excess profit rate over the buy-and-hold (B&H) baseline, we define our reward function as follows:

$$ r_t = \frac{V_t - V_{t-1}}{V_{t-1}} - \frac{p_t - p_{t-1}}{p_{t-1}} \tag{7} $$

where $V_t$ is the total value of the assets held at time $t$ and $p_t$ is the stock price at time $t$. The first term of Equation (7) represents the return rate of the assets held, and the second term represents the return rate of the stock price. This reward design enables the agent to learn to control its position through continuous decisions, which better fits the trading logic of real investors.
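The two terms of the reward can be made concrete with a small worked example. This is a minimal sketch, assuming the reward is the return of the held assets minus the return of the stock price over one step; the function and argument names are hypothetical.

```python
def excess_return_reward(assets_prev, assets_now, price_prev, price_now):
    """Return of the agent's assets minus the return of the stock price over one step."""
    asset_return = (assets_now - assets_prev) / assets_prev
    price_return = (price_now - price_prev) / price_prev
    return asset_return - price_return

# Assets grow 10% while the stock rises 5%: the agent beats B&H by about 0.05.
beat = excess_return_reward(100.0, 110.0, 10.0, 10.5)
# Fully invested with no trading: assets move with the price, so the reward is zero.
matched = excess_return_reward(100.0, 105.0, 10.0, 10.5)
# Holding only cash while the stock falls 10% also earns a positive reward.
avoided = excess_return_reward(100.0, 100.0, 10.0, 9.0)
```

Under this form, the reward is positive when the agent's position outperforms simply holding the stock and negative otherwise.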
The proposed model is implemented with the Stable-Baselines and TensorFlow frameworks. For the PPO algorithm, we set the learning rate, batch size, discount rate, and number of minibatches to 0.00025, 1024, 0.99, and 4, respectively. In the parallel network layer, the three DNN models that handle the different frequencies all adopt four-layer neural networks with hidden layers of 32 and 16 units. We also add a dropout layer with a rate of 0.25.
We compare four methods: DNN (dense neural network), DNN-GARCH (dense neural network with GARCH), MCT (multi-frequency continuous-share trading model), and MCTG (multi-frequency continuous-share trading model with GARCH). Each method is also compared with the B&H baseline.
Training performance analysis. The training process is shown in Figure 2. The final episode reward has been magnified 100 times for better visual presentation. MCT and MCTG achieve significantly higher episode rewards than DNN and DNN-GARCH in the four groups of experiments. In many cases MCTG outperforms MCT, which indicates that the GARCH model provides more useful information during training.
Testing performance analysis. After two years of trading, from 2019 to 2020, we obtain the performance shown in Table 1, which reports the profit rate (PR) and tax rate (TR) of the five stocks under the different models. PR represents the annual return rate. TR represents the annual tax rate, which describes the transaction frequency. B&H has no TR because the strategy does not trade.
On the five test stocks, the DNN model achieves a better PR than B&H on four stocks and on average. After the GARCH term is added to the DNN model, the average return rate rises to 63.97% and the transaction frequency, measured by TR, falls by nearly 1%, which shows that GARCH provides richer volatility information to the agent and thus reduces the TR. The PR of the MCT model, which uses multi-frequency information, is significantly higher than that of the DNN models because the multi-frequency data and the parallel network layers provide and analyze more effective information for the agent. However, its TR is higher because it trades larger numbers of shares and takes more actions to obtain more trading revenue. The return rate of the MCTG model, which adds the GARCH term, is nearly 9% higher than that of MCT, because the GARCH model provides more useful information that helps the algorithm measure risk, reduce transaction taxes, and obtain higher returns. Meanwhile, its TR is nearly 1% lower.
In this work, we propose a quantitative trading model named the multi-frequency continuous-share trading model with GARCH (MCTG), which combines three parallel network layers with RL. The three parallel network layers process input data at different frequencies, and a GARCH term is added to the daily data. The RL action space makes continuous trading decisions on the number of shares, which is more suitable for practical application. Experiments on five stocks over two years in the Chinese market show that MCTG can effectively process and analyze multi-frequency information, measure risk, reduce transaction taxes, and obtain higher returns. MCTG also outperforms most of the other models, including the B&H baseline, in terms of PR. We also compared TR across models with the same structure and found that GARCH contributes to reducing TR. Relative to MCT, MCTG significantly increases PR and reduces TR. These results show that our model fits real continuous transactions better than the other algorithms.
Acknowledgements. This work was supported by the National Key Research and Development Program of MOST of China under Grant 2018AAA0101003.
-  Hadi Rezaei, Hamidreza Faaljou, and Gholamreza Mansourfar. Stock price prediction using deep learning and frequency decomposition. Expert Syst. Appl., 169:114332, 2021.
-  Salvatore Carta, Andrea Corriga, Anselmo Ferreira, Alessandro Sebastian Podda, and Diego Reforgiato Recupero. A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning. Appl. Intell., 51(2):889–905, 2021.
-  Suyuan Luo, Xudong Lin, and Zunxin Zheng. A novel CNN-DDPG based AI-trader: Performance and roles in business operations. Transportation Research Part E: Logistics and Transportation Review, 131:68–79, 2019.
-  Jagdish Chakole, Mugdha S. Kolhe, Grishma D. Mahapurush, Anushka Yadav, and Manish P. Kurhekar. A Q-learning agent for automated trading in equity stock markets. Expert Syst. Appl., 163:113761, 2021.
-  Badr Hirchoua, Brahim Ouhbi, and Bouchra Frikh. Deep reinforcement learning based trading agents: Risk curiosity driven learning for financial rules-based policy. Expert Syst. Appl., 170:114553, 2021.
-  Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, and Huaiyu Wan. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 922–929. AAAI Press, 2019.
-  Richard S. Sutton and Andrew G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine learning. MIT Press, 1998.
-  John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017.
-  Yan Hu, Jian Ni, and Liu Wen. A hybrid deep learning approach by integrating LSTM-ANN networks with GARCH model for copper price volatility prediction. Physica A: Statistical Mechanics and its Applications, 557:124907, 2020.
-  Helmut Herwartz. Stock return prediction under GARCH - an empirical assessment. International Journal of Forecasting, 33(3):569–580, 2017.