Predicting the future status of stocks has always been of great interest to investors, because even a small improvement in prediction accuracy can yield a large gain. However, it is a very challenging task due to the many uncertain political and economic factors in the real world. To predict the price movement of a target stock, previous works mainly focus on discovering patterns based only on the stock’s own historical information, e.g., price, related financial news and events, and sentiment on social media. Few of them pay attention to the correlations between stocks and their effects on stock price. However, besides its own historical records, the price movement of a given stock can also be affected by other stocks [Brennan et al.2015]. The correlations between the movements of stocks are reflected in, but not limited to, the following aspects [Hou and Kewei2007, Brennan et al.2015].
First, the relationship between stocks can be reflected by their industry characteristics [Hou and Kewei2007]. For example, the growth or decline of an industry might result in corresponding changes in the stock prices of companies in that industry.
Second, stocks can be related by their common properties, e.g., stock concepts. For example, Tesla and BYD share the same concept, New Energy Vehicles. Essentially, a concept is an investment consensus on stocks held by many market participants [Lee et al.2018]. Some portfolio managers buy or sell a group of stocks that share the same concept, instead of a single stock. Such investing behavior generates correlations among this group of stocks.
Third, the shareholder relationship between companies also affects their stocks [Chen et al.2018]. For example, the good operating performance of a subsidiary might raise both its own stock price and that of its holding company.
However, building an effective model that predicts stock movement by combining these relationships is a big challenge. In this paper, we propose a GCN-based deep learning framework to predict the future status of each stock, since GCN has been shown to be effective at modeling the complex correlations in graph-structured data. Specifically, our contributions can be summarized as follows.
We propose a deep learning-based framework, which utilizes RNN and GCN to model the temporal dependency of each related stock’s price movement and the influences from all involved stocks, respectively.
We extract three kinds of relationships between stocks: the industry relationship, the concept relationship and the shareholder relationship. For each relationship, we construct a graph represented by an adjacency matrix based on financial data from the stock market. We employ a GCN on each graph to model the influences of all involved stocks on the target stock. Note that our framework can be easily extended to incorporate more effective relationships.
We test our model on two real-world stock market indexes in mainland China. The experimental results show that our model outperforms the baselines by a relative improvement of 3% on average.
2 Related Works
2.1 Stock Price Prediction
It is difficult to predict stock prices because the underlying factors are complicated and diverse. To date, researchers have made substantial efforts to solve the stock prediction problem based on different information sources.
Based on the stock’s historical price data, Zhang et al. [Zhang et al.2017] proposed an SFM network to discover relevant frequency patterns of stock prices. Feng et al. [Feng et al.2019] employed adversarial training to improve the generalization of a neural network prediction model. However, a stock’s historical price data alone cannot entirely convey the volatility of stock price movement. Other information, such as financial news [Wu et al.2018, Nguyen and Shirai2015], texts from social media [Li et al.2015, Jin et al.2017], web browsing data [Bordino et al.2014, Jin et al.2017] and so on, is used by some researchers as complementary information to discover more patterns of stock price movement. For example, Ding et al. [Ding et al.2015] extracted events from news titles and proposed a novel CNN-based framework to model both the short-term and long-term influences of events on stock price. Liu et al. [Liu et al.2018] presented a hierarchical complementary attention network for stock prediction that combines both news titles and news content. Xu and Cohen [Xu and Cohen2018] presented a novel deep generative model to learn opinions from Twitter texts. In addition, Rekabsaz et al. [Rekabsaz et al.2017] investigated the sentiment of companies’ annual disclosures with an embedding-based approach. Chen et al. [Chen et al.2019] explored mutual fund portfolio data to extract intrinsic stock properties for enhancing prediction. Qin and Yang [Qin and Yang2019] integrated a CEO’s vocal features in a conference call into the prediction model.
Although strenuous efforts have been made to understand the behavior of a target stock, the common strategy for predicting the direction of stock movement relies on a single stock’s historical records and some textual information. Few attempts have been made to predict stock price movement by combining correlated stocks, and most works ignore the correlations among stocks. In this paper, we focus on modeling the influences of other stocks on the target stock.
2.2 Graph Convolutional Networks
Graph convolutional networks (GCNs) have gained much attention recently because they successfully generalize traditional convolution to graph-structured data [Kipf and Welling2017]. Many works have demonstrated their effectiveness in capturing the interactions between nodes of a graph [Li et al.2018, Guo et al.2019]. In this paper, we represent each involved stock as a node in a graph and construct the relationships between stocks based on investment experience. We then employ a GCN-based framework to model the influence of the involved stocks on the target stock through these relationships.
3 Problem Formulation
This paper aims to use the historical information of all the involved stocks to predict the target stock’s price movement on the next day.
Given a target stock, its price movement percent on day $t$ is defined as $p_t = (c_t - c_{t-1}) / c_{t-1}$, with $c_t$ referring to its closing price on day $t$. We denote the price movement of this target stock on day $t$ as $y_t$. Following previous work [Feng et al.2019], we label a movement percent above a positive threshold as positive and a movement percent below a negative threshold as negative. Therefore, the prediction in our paper is a binary classification task.
Suppose there are $N$ related stocks (detailed in the following section), each with $F$ features (also detailed in the next section). We define $X_t \in \mathbb{R}^{N \times F}$ as the price information of all related stocks on day $t$. The window size of historical information is denoted as $T$. The relationships among all corresponding corporations are defined as $A$. The prediction problem in this paper is formulated as follows:
$$\hat{y}_{t+1} = f(X_{t-T+1}, \ldots, X_t; A)$$
where $f$ is our model.
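The labeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the movement percent is assumed to be the day-over-day relative change of the closing price, and the thresholds `up` and `down` are placeholder values because the text does not state them.

```python
from typing import List, Optional


def movement_percent(closes: List[float]) -> List[float]:
    """Day-over-day relative change of the closing price (one value per day after the first)."""
    return [(c - prev) / prev for prev, c in zip(closes, closes[1:])]


def label_movements(
    percents: List[float], up: float, down: float
) -> List[Optional[int]]:
    """Map each movement percent to 1 (positive), 0 (negative) or None (in between).

    `up` and `down` are hypothetical thresholds chosen only for illustration.
    """
    labels: List[Optional[int]] = []
    for p in percents:
        if p >= up:
            labels.append(1)
        elif p <= down:
            labels.append(0)
        else:
            labels.append(None)  # ambiguous move, left unlabeled in this sketch
    return labels


closes = [10.0, 10.2, 10.1, 10.1, 9.9]  # made-up closing prices
percents = movement_percent(closes)
labels = label_movements(percents, up=0.005, down=-0.005)
```

Days whose movement falls between the two thresholds are left unlabeled here; whether such days are dropped or assigned to a class is a design choice that depends on the labeling scheme being followed.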
4 Multi-View RNN-GCNs
In this section, we introduce the Multi-View RNN-GCNs architecture, which is composed of three components. The first component is the Historical Information Learner, which consists of a recurrent neural network with two layers. The goal of this component is to learn the temporal dependency of historical prices for each stock. The second component is the Multi-Relationship Learner, which consists of three graph convolutional networks, each of which models the interaction between stocks based on one kind of relationship. The third component, named Predictor, generates the final prediction. The three components are applied sequentially. The details of the architecture are shown in Figure 1.
4.1 Historical Information Learner
The Historical Information Learner aims to model the temporal dependency of past stock prices for each stock. RNNs are a widely used model for the temporal dependencies of time series data, so in this paper we employ an RNN to capture the historical influence of each stock on the target stock.
The input of the RNN is the historical information of all related stocks. We define the time window size as $T$ and assume that each stock at each time slice has $F$ normalized features. The historical information of stock $i$ at time slice $t$ is denoted as $x^i_t \in \mathbb{R}^F$. The historical records of stock $i$ during the past $T$ periods are represented as $X^i = [x^i_{t-T+1}, \ldots, x^i_t]$. The output of the RNN for stock $i$ is denoted as $h^i$, which is a scalar value that represents the influence of the historical records of stock $i$ during the time interval $T$ on the target stock. Suppose that there are $N$ involved stocks for the target stock prediction. We stack their outputs together to form a vector with $N$ values, denoted as $H$. $H$ represents the historical influence of all the involved stocks on the target stock and is the input of the Multi-Relationship Learner.
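As a rough illustration of how one scalar per stock can be produced from a window of features, the sketch below runs a small tanh RNN over each stock's window and stacks the results. It makes several assumptions that are not in the text: a single recurrent layer is used for brevity (the paper's learner has two), the weights are shared across stocks, the sizes are arbitrary, and the data are random stand-ins.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_scalar(window: Matrix, w_in: Matrix, w_rec: Matrix, w_out: Vector) -> float:
    """Run a one-layer tanh RNN over a (T x F) window and read out one scalar."""
    hidden = [0.0] * len(w_rec)
    for x_t in window:  # time slices, oldest first
        pre = [a + b for a, b in zip(matvec(w_in, x_t), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return sum(w * h for w, h in zip(w_out, hidden))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random matrix used as untrained stand-in weights or data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, T, F, HIDDEN = 4, 3, 6, 8  # illustrative sizes only
w_in = random_matrix(HIDDEN, F, rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

# One (T x F) window of normalized features per stock; random stand-in data.
windows = [random_matrix(T, F, rng) for _ in range(N)]

# Stack the per-stock scalars into the length-N vector passed on to the GCNs.
H = [rnn_scalar(window, w_in, w_rec, w_out) for window in windows]
```

In practice the weights would be trained end to end with the rest of the model in a deep learning framework; the point here is only the shape of the data flow, from $N$ windows of size $T \times F$ to a vector of $N$ scalars.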
4.2 Multi-Relationship Learner
Multi-Relationship Learner aims to model the interaction between stocks based on various kinds of relationship. It contains two stages. The first stage is to encode three kinds of relationships among involved companies into three graphs. The second stage is to learn the relationship dependency of corporation networks by implementing graph convolutional operation over each graph.
4.2.1 Multiple Relationships
As mentioned above, the decision of a market participant to buy or sell an individual stock depends not only on the stock’s internal properties, such as its historical records, but also on its complex correlations with other related stocks. For instance, good news for one industry probably affects all the related stocks in that industry in an asynchronous manner.
To effectively integrate stock correlations into our framework, we organize all the involved stocks into a graph in which each node represents a company. An edge between two nodes represents some kind of relationship between the two corporations. Each kind of relationship corresponds to one graph, so there are three graphs for the three kinds of relationships. We assume that corporations affect each other’s stock prices through such relationships. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ is used to represent the graph structure. Each element $a_{ij}$ in $A$ stands for the relationship between company $i$ and company $j$. The challenge here is to extract relationships among corporations that are beneficial for prediction. We inherit the shareholder relationship from Chen et al. [Chen et al.2018] and define two novel relationships, the industry relationship and the concept relationship.
Chen et al. [Chen et al.2018] define a graph based on shareholding structure, in which the weight of each edge stands for the shareholding ratio between two corporations. Note that the shareholding ratio is in the range [0, 1], thus $a_{ij} \in [0, 1]$. Despite the financial fact that the volatility of a parent company’s stock price is likely to transmit to its subsidiary, we find that cross-shareholdings among public corporations are rare. This leads to a very sparse matrix and thus weakens its role in prediction. To improve the prediction accuracy, we extract two other effective relationships between corporations. One of them extracts the relationship from an industry view.
The life cycle of an industry restricts or decides the survival and development of its enterprises. Thus, it is reasonable to suppose that the stock price movements of corporations are associated with their industry relationship. In addition, the more similar the business scales of two corporations are, the more similar their stock volatilities are. If company $i$ (the relatively small one, with business scale $s_i$) and company $j$ (the relatively large one, with business scale $s_j$) are not in the same industry, element $a_{ij}$ of the industry matrix $A_I$ is 0; otherwise $a_{ij} = s_i / s_j$. Note that $a_{ij} \in [0, 1]$. In this paper, the business scale is measured by registered capital. Other financial indicators such as net income or market capitalization also work.
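A small sketch can make the construction of the industry graph concrete. It assumes that the edge weight between two companies in the same industry is the ratio of the smaller business scale to the larger one, which matches the description above but is an interpretation; the function name, the sample data, and the choice to leave self-connections at zero are all illustrative.

```python
from typing import List


def industry_adjacency(industries: List[str], scales: List[float]) -> List[List[float]]:
    """Build an N x N matrix whose entry (i, j) is smaller_scale / larger_scale
    when companies i and j share an industry, and 0 otherwise."""
    n = len(industries)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Self-connections are left at zero in this sketch (an assumption).
            if i != j and industries[i] == industries[j]:
                small, large = sorted((scales[i], scales[j]))
                adj[i][j] = small / large
    return adj


# Made-up example: two banks of different size and one car maker.
industries = ["bank", "bank", "auto"]
registered_capital = [100.0, 400.0, 50.0]
A_industry = industry_adjacency(industries, registered_capital)
```

Under this reading the matrix is symmetric and every weight lies in [0, 1], with values closer to 1 for companies of similar scale.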
The stock price is related to its concepts to some extent. Essentially, a concept is an investment consensus on stocks held by many market participants in China. Stocks with the same concept attract similar investor attention. For instance, on the day the latest SEC action was announced, the share prices of two-thirds of the China concept stocks plunged. Therefore, we define another relationship between corporations based on concepts. We find that each company has more than one concept and some companies share more than one concept. For example, each company in CSI 500 has 7 concepts on average. We denote the number of concepts shared by company $i$ and company $j$ as $c_{ij}$, and the numbers of concepts owned by company $i$ and company $j$ as $c_i$ and $c_j$ respectively. We set $a_{ij} = c_{ij} / c_i$ and $a_{ji} = c_{ij} / c_j$. Note that $a_{ij} \in [0, 1]$.
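The concept graph can be sketched in the same way. The code below assumes that the weight from company i to company j is the number of shared concepts divided by the number of concepts company i owns, so the matrix is generally asymmetric; this is one reading of the definition above, and the function name and sample concepts are hypothetical.

```python
from typing import List, Set


def concept_adjacency(concepts: List[Set[str]]) -> List[List[float]]:
    """Entry (i, j) is |concepts_i & concepts_j| / |concepts_i| for i != j."""
    n = len(concepts)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Skip self-connections and companies without any concept.
            if i != j and concepts[i]:
                shared = len(concepts[i] & concepts[j])
                adj[i][j] = shared / len(concepts[i])
    return adj


# Made-up example with three companies.
concepts = [
    {"new energy vehicles", "lithium battery"},
    {"new energy vehicles"},
    {"5G"},
]
A_concept = concept_adjacency(concepts)
```

With this normalization a company with few concepts is tied strongly to peers that share them, while a company with many concepts spreads its weight more thinly.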
In summary, we extract three kinds of relationships among listed corporations based on stock market investment facts and incorporate them into three graphs. Note that our framework can be easily extended to more kinds of effective relationships.
4.2.2 Graph Convolutional Networks
In this paper, following Kipf and Welling [Kipf and Welling2017], we design our GCN with two graph convolutional layers, along with one input layer and one output layer, as shown in Figure 2. The formula of our GCN is defined as follows.
$$Z = \sigma\left(A \, \sigma\left(A X W^{(0)}\right) W^{(1)}\right)$$
where $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $X \in \mathbb{R}^{N \times F}$ is the features matrix, $Z$ is the output of our GCN, $N$ is the number of nodes in the graph, $F$ refers to the number of features of each node, $D$ refers to the number of hidden features, $\sigma$ is the activation function, and $W^{(0)} \in \mathbb{R}^{F \times D}$ and $W^{(1)}$ refer to the trainable parameters.
The input layer is composed of two matrices, the adjacency matrix and the features matrix. The former is the representation of the graph and contains the connectivity information of the nodes. The latter holds all the features of all involved nodes. The graph convolutional layer can be decomposed into two simple layers, the Aggregate Layer and the Fully Connected Layer (FC layer). The Aggregate Layer is the product of the adjacency matrix and the features matrix, which aggregates the features of adjacent nodes along the feature dimension. Subsequently, the FC layer creates a new, higher-level feature representation for each node.
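The decomposition into an aggregate step and a fully connected step can be written out directly. The following dependency-free sketch stacks two such layers; the ReLU activation, the absence of adjacency normalization, the self-loops in the sample matrix, and all numeric values are assumptions made for illustration rather than details taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix, activate: bool = True) -> Matrix:
    aggregated = matmul(adj, feats)            # Aggregate Layer: A X
    transformed = matmul(aggregated, weight)   # FC layer: (A X) W
    return relu(transformed) if activate else transformed


def two_layer_gcn(adj: Matrix, feats: Matrix, w0: Matrix, w1: Matrix) -> Matrix:
    hidden = gcn_layer(adj, feats, w0)
    return gcn_layer(adj, hidden, w1, activate=False)


# Three nodes, one input feature per node, two hidden features, one output.
adj = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
feats = [[1.0], [2.0], [-1.0]]
w0 = [[1.0, -1.0]]
w1 = [[1.0], [1.0]]
out = two_layer_gcn(adj, feats, w0, w1)
```

Each output row mixes information from nodes up to two hops away, since every layer aggregates over direct neighbors once.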
In the stock prediction task, we construct three graphs to represent the three kinds of relationships that can influence the prediction, and thus design three adjacency matrices, $A_I$, $A_C$ and $A_S$, for the industry, concept and shareholder relationships respectively. Each matrix corresponds to one GCN. The features matrix of each GCN is the output of the Historical Information Learner, denoted as $H$. Each GCN is in charge of modeling the influence of one kind of relationship from the related stocks on the target stock. The outputs of the three GCNs are $z_I$, $z_C$ and $z_S$, each of which is a scalar value. We concatenate them to generate a one-dimensional vector denoted as $z = [z_I, z_C, z_S]$.
4.3 Predictor
The output of the Multi-Relationship Learner is $z$. Intuitively, different relationships among corporations should have different impacts on the target stock price. An FC layer is stacked to assign different weights to the three kinds of relationships, and its output is $\hat{y}$. Because the task in this paper is a binary classification task, we employ a sigmoid function as the activation function. The formula of the Predictor is as follows.
$$\hat{y} = \mathrm{sigmoid}(W z + b)$$
where $W$ and $b$ are the trainable parameters of the FC layer.
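The final step, a weighted combination of the three relationship outputs followed by a sigmoid, is small enough to show in full. The weights, the bias term, and the input values below are made up for illustration; in the model they would be learned.

```python
import math
from typing import Sequence


def predict(z: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Weighted sum of the three GCN outputs squashed to (0, 1) by a sigmoid."""
    score = sum(w * v for w, v in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical outputs of the industry, concept and shareholder GCNs.
z = [0.2, -0.1, 0.4]
weights = [1.0, 0.5, 2.0]  # illustrative values, not learned ones
probability_up = predict(z, weights)
predicted_label = 1 if probability_up >= 0.5 else 0
```

The output can be read as the estimated probability of the positive class, with 0.5 as the usual decision boundary.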
5 Experiments
5.1 Datasets
To demonstrate the effectiveness of our model, we build our datasets from two of the best-known indexes of the Chinese stock market, the CSI (China Securities Index) 300 index and the CSI 500 index. CSI 300 is composed of three hundred large-cap listed corporations with good liquidity. CSI 500 consists of constituent stocks chosen from the top 500 mid-cap and small-cap listed companies. The constituents of CSI 300 and CSI 500 are revised every half year, and we fix them to the January 2015 versions, following Li et al. [Li et al.2019].
Our datasets are composed of two parts. The first part contains the historical prices of all stocks. The second part contains a financial dataset for constructing relationships. Both datasets are obtained from a publicly available API, Tushare.
5.1.1 Historical Prices Dataset
We retrieve the historical prices of the companies in the two indexes from 1 June 2015 to 5 December 2019. All prices are adjusted for dividends and splits. The historical prices cover 1121 trading days. We find that several corporations in these indexes were delisted during the collection period and thus lack trading prices after delisting. To guarantee the quality of the datasets, we keep only the stocks that are still listed. In the end, 287 stocks remain in CSI 300 and 489 stocks in CSI 500. We also find that, even on the same trading day, not all stocks have trading data because some stocks are temporarily suspended. Thus, we align the historical trading days of all the stocks and fill in a stock’s missing prices with the prices from its most recent trading day. We extract 6 numeric features for each stock: opening price, high price, low price, closing price, volume, and amount. Each feature is normalized with the z-score function to have mean 0 and standard deviation 1.
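The two preprocessing steps, filling suspended days with the most recent price and z-score normalization, might look as follows for a single feature of a single stock. This is a sketch under assumptions: missing values are represented as `None`, the population standard deviation is used, and the statistics are computed over the whole series, none of which is specified in the text.

```python
from statistics import mean, pstdev
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the most recent observed one.

    Leading gaps stay None because there is no earlier price to copy.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def z_score(values: List[float]) -> List[float]:
    """Shift and scale a series to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(v - mu) / sigma for v in values]


# Made-up closing prices with two suspended days.
closes = [10.0, None, 12.0, None, 14.0]
filled = forward_fill(closes)
normalized = z_score([v for v in filled if v is not None])
```

When the data are later split chronologically, computing the mean and standard deviation on the training portion only is a common way to avoid leaking information from the test period.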
Table 1: Division of the datasets for the two indexes (columns: Indexes, Training set, Validation set, Testing set, Total).
5.1.2 Relationship Dataset
This dataset is collected to build the relationships among corporations. For the industry relationship, we extract the industry and the registered capital of each stock. For the shareholder relationship, we extract the list of the top 10 shareholders of each corporation and keep only the shareholders that are themselves listed companies. For the concept relationship, we extract the concept list of each stock.
We split the historical prices dataset into three parts: the first 80% days for training, then 10% for validation and the last 10% days for testing. Details of the division of these two indexes are shown in Table 1.
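A chronological split of this kind can be expressed with index ranges over the trading days. The sketch below is illustrative only: the function name is hypothetical, the boundaries are obtained by truncating to whole days, and the exact counts in the paper may differ depending on rounding and on how the history window is handled.

```python
from typing import Tuple


def chronological_split(
    n_days: int, train: float = 0.8, valid: float = 0.1
) -> Tuple[range, range, range]:
    """Split day indices 0..n_days-1 into consecutive train/validation/test ranges."""
    train_end = int(n_days * train)            # truncation is an assumption
    valid_end = int(n_days * (train + valid))
    return range(0, train_end), range(train_end, valid_end), range(valid_end, n_days)


train_idx, valid_idx, test_idx = chronological_split(1121)
```

Keeping the three ranges contiguous and ordered in time ensures the model is always evaluated on days that come after everything it was trained on.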
5.2 Evaluation Metrics
The stock movement prediction task in this paper is a binary classification problem. We select four metrics to evaluate the effectiveness of the proposed approach: Accuracy (ACC), Precision, Recall and Matthews Correlation Coefficient (MCC). ACC measures the ratio of correct predictions over all evaluated examples. Precision is the fraction of examples predicted as positive that are actually positive. Recall measures the fraction of positive examples that are correctly classified. The higher the MCC value, the more accurate the prediction. In addition, MCC avoids bias due to data skew. All metrics are calculated over all the constituent stocks in each CSI index. The formulas of Accuracy and MCC are as follows.
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Here TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix.
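For reference, the four metrics can be computed from the confusion-matrix counts as in the sketch below. The function names and the example counts are made up; returning 0 when a denominator is zero is a common convention and an assumption here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up confusion-matrix counts for 100 predictions.
tp, tn, fp, fn = 30, 25, 20, 25
acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
```

MCC ranges from -1 to 1, with 0 corresponding to a prediction no better than chance, which is why it stays informative when the two classes are unbalanced.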
5.3 Experiment Analysis
In this paper, we argue that the prediction of the price movement of a target stock is correlated with four factors: the historical prices of the target stock, the historical prices of other related stocks, the relationships among all involved stocks and the length of historical information. To test the effectiveness of these factors, we design three experiments and construct three datasets, named Target Dataset, Involved Dataset and Combined Dataset. Target Dataset contains only the historical prices of the target stock. Involved Dataset contains the historical prices of both the target stock and the other related stocks. Combined Dataset contains the historical prices of all involved stocks and their relationships. The results table shows the final experimental results on the test set.
5.3.1 Effectiveness of Other Related Stocks
The first experiment aims to evaluate the effectiveness of considering the features of other related stocks. Many prior studies use only information about the target stock to predict its price movement and ignore the influence of other correlated stocks. To show that other related stocks have an impact on the target stock, we test Target Dataset and Involved Dataset on the following baselines for each index.
Feed Forward Neural network: FNN used in this paper has two hidden layers. Each layer contains 300 units.
Recurrent Neural Network: RNN is recognized as a classical method to capture temporal dependency in time series problem.
As shown in the results table, all the baselines achieve better ACC and MCC on Involved Dataset than on Target Dataset, which indicates that it is worthwhile to consider other related stocks to improve prediction accuracy. Besides, RNN outperforms FNN by almost 1% in accuracy, which suggests that temporal dependency matters in stock prediction.
5.3.2 Effectiveness of Relationships among Involved Stocks
In the first experiment, we feed the features of the involved stocks into the models without considering the correlations among stocks. However, investment practice suggests that stock prices interact with each other through stock relationships. Therefore, the second experiment is designed to incorporate the relationships among involved stocks into the model and investigate their effectiveness. To evaluate the contribution of each relationship to the prediction, we decompose our model into three simplified versions, RNN-GCN-Industry, RNN-GCN-Shareholder and RNN-GCN-Concept. Each of them models the influence of only one kind of relationship.
Compared with RNN, the best model in the first experiment, all three simplified models perform better by 1% on average for each index. This demonstrates that integrating the relationship factor into the model is helpful for stock price prediction.
In addition, among the three simplified models, the one with the shareholder relationship performs worst, while the one with the concept relationship achieves the best performance on both indexes. This indicates that some relationships are more effective for prediction than others. Moreover, the shareholder matrix is the sparsest, followed by the industry matrix and then the concept matrix (as shown in Figure 3). This suggests that the sparsity of the relationship matrix is likely associated with the prediction accuracy, which points to a research direction for enhancing model robustness.
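One simple way to quantify the sparsity compared above is the fraction of zero entries in a relationship matrix. The sketch below uses that definition, which is an assumption since the text does not say how sparsity is measured; the function name and the sample matrix are hypothetical.

```python
from typing import List


def sparsity(matrix: List[List[float]]) -> float:
    """Fraction of entries in the matrix that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total if total else 0.0


# Made-up 2 x 2 relationship matrix with a single non-zero weight.
example = [[0.0, 0.5],
           [0.0, 0.0]]
example_sparsity = sparsity(example)
```

Applying the same function to the three adjacency matrices would give directly comparable numbers for the shareholder, industry and concept graphs.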
Finally, compared with its three simplified versions, our framework achieves the best accuracy, 54.41% on CSI 300 and 54.05% on CSI 500. This shows that the three relationships have been integrated effectively in our framework. Note that our framework can be easily extended to incorporate more beneficial relationships.
5.3.3 The Length of Historical Information
The previous experiments have shown that the price movement of the target stock is related to the historical information of all involved stocks; in those experiments the window size was set to three days. This experiment studies the impact of the length of historical information on model performance.
We conduct experiments with different window lengths, specifically the past 3, 5, 7, 9 and 11 days. As shown in the table of window-length results and Figure 4, our model achieves the best performance at 7 days, with an accuracy of 55.10% on CSI 300 and 54.25% on CSI 500. The worst performance occurs at 9 days, with an accuracy of 51.64% on CSI 300 and 53.49% on CSI 500. Therefore, the length of historical information has an impact on prediction performance, and according to the experiments the best window size is 7 days.
6 Conclusion
The price movement of an individual stock is inevitably influenced by other related stocks. This paper shows that taking the historical information of all involved stocks and their multiple relationships into consideration can effectively improve prediction accuracy. Our contribution is an RNN-GCN combined framework that is able to integrate the historical prices and various relationships of all involved stocks for prediction. Specifically, we first employ an RNN to aggregate the historical information of each related stock. We then define industry and concept relationships among stocks and employ graph convolutional networks to extract the influence of these relationships. Our model can be easily extended to incorporate more effective relationships among stocks. In the future, we would like to explore more valid relationships to improve the prediction accuracy.
References
- [Bordino et al.2014] Ilaria Bordino, Nicolas Kourtellis, Nikolay Laptev, and Youssef Billawala. Stock trade volume prediction with yahoo finance user browsing behavior. In 2014 IEEE 30th International Conference on Data Engineering, pages 1168–1173. IEEE, 2014.
- [Brennan et al.2015] Michael J. Brennan, Jegadeesh Narasimhan, and Swaminathan Bhaskaran. Investment analysis and the adjustment of stock prices to common information. Review of Financial Studies, (4):4, 2015.
- [Chen et al.2018] Yingmei Chen, Zhongyu Wei, and Xuanjing Huang. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1655–1658. ACM, 2018.
- [Chen et al.2019] Chi Chen, Li Zhao, Jiang Bian, Chunxiao Xing, and Tie-Yan Liu. Investment behaviors can tell what inside: Exploring stock intrinsic properties for stock trend prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2376–2384, 2019.
- [Ding et al.2015] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. Deep learning for event-driven stock prediction. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
- [Feng et al.2019] Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, and Tat-Seng Chua. Enhancing stock movement prediction with adversarial training. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 5843–5849. AAAI Press, 2019.
- [Guo et al.2019] Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, and Huaiyu Wan. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 922–929, 2019.
- [Hou and Kewei2007] Hou and Kewei. Industry information diffusion and the lead-lag effect in stock returns. Review of Financial Studies, 20(4):1113–1138, 2007.
- [Jin et al.2017] Fang Jin, Wei Wang, Prithwish Chakraborty, Nathan Self, Feng Chen, and Naren Ramakrishnan. Tracking multiple social media for stock market event prediction. In Industrial Conference on Data Mining, pages 16–30. Springer, 2017.
- [Kipf and Welling2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR2017), April 2017, pages 1–12, 2017.
- [Lee et al.2018] Ming-Hsuan Lee, Tou-Chin Tsai, Jau-er Chen, and Mon-Chi Lio. Can information and communication technology improve stock market efficiency? a cross-country study. Bulletin of Economic Research, 2018.
- [Li et al.2015] Qing Li, LiLing Jiang, Ping Li, and Hsinchun Chen. Tensor-based learning for predicting stock movements. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
- [Li et al.2018] Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [Li et al.2019] Chang Li, Dongjin Song, and Dacheng Tao. Multi-task recurrent neural networks and higher-order markov random fields for stock price movement prediction: Multi-task rnn and higer-order mrfs for stock price classification. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1141–1151, 2019.
- [Liu et al.2018] Qikai Liu, Xiang Cheng, Sen Su, and Shuguang Zhu. Hierarchical complementary attention network for predicting stock price movements with news. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1603–1606. ACM, 2018.
- [Nguyen and Shirai2015] Thien Hai Nguyen and Kiyoaki Shirai. Topic modeling based sentiment analysis on social media for stock market prediction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1354–1364, 2015.
- [Qin and Yang2019] Yu Qin and Yi Yang. What you say and how you say it matters: Predicting financial risk using verbal and vocal cues. In 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), page 390, 2019.
- [Rekabsaz et al.2017] Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Dür, and Linda Anderson. Volatility prediction using financial disclosures sentiments with word embedding-based ir models. arXiv preprint arXiv:1702.01978, 2017.
- [Wu et al.2018] Huizhe Wu, Wei Zhang, Weiwei Shen, and Jun Wang. Hybrid deep sequential modeling for social text-driven stock prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1627–1630. ACM, 2018.
- [Xu and Cohen2018] Yumo Xu and Shay B Cohen. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970–1979, 2018.
- [Zhang et al.2017] Liheng Zhang, Charu Aggarwal, and Guo-Jun Qi. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 2141–2149. ACM, 2017.