HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction

08/07/2019
by Raehyun Kim et al.
Korea University

Many researchers in both academia and industry have long been interested in the stock market, and numerous approaches have been developed to accurately predict future trends in stock prices. Recently, there has been growing interest in utilizing graph-structured data in computer science research communities. Methods that use relational data for stock market prediction have recently been proposed, but they are still in their infancy. First, the quality of information collected from different types of relations can vary considerably. No existing work has focused on the effect of using different types of relations on stock market prediction or on finding an effective way to selectively aggregate information across relation types. Furthermore, existing works have focused only on individual stock prediction, which is similar to the node classification task. To address this, we propose a hierarchical attention network for stock prediction (HATS), which uses relational data for stock market prediction. HATS selectively aggregates information on different relation types and adds the information to the representation of each company. Specifically, node representations are initialized with features extracted from a feature extraction module. HATS is then used as a relational modeling module on the initialized node representations, and the node representations with the added information are fed into a task-specific layer. Our method is used to predict not only individual stock prices but also market index movements, which is similar to the graph classification task. The experimental results show that performance can change depending on the relational data used. HATS, which can automatically select information, outperformed all the existing methods.


1 Introduction

Stock markets are a symbol of market capitalism, and billions of shares of stock are traded every day. In 2018, stocks worth more than 65 trillion U.S. dollars were traded worldwide, and the market capitalization of domestic companies listed in the U.S. exceeded the country's GDP (https://data.worldbank.org/). Although stock movement prediction is a difficult problem, its solutions have direct applications in industry. Many researchers in both industry and academia have long shown interest in predicting future trends in the stock market. Researchers who focus on finding profitable patterns in historical data are known as quants in the financial industry and as data scientists more generally. Regardless of which term is used, such researchers increasingly rely on systematic trading algorithms to make trading decisions automatically.

Even though there is still room for debate [21], numerous studies have shown that the stock market is predictable to some extent [5], [23]. Existing methods are based on the ideas of either fundamentalists or technicians, who hold different perspectives on the market.

Fundamentalists believe that the price of a company's securities corresponds to the intrinsic value of the company or entity [8]. If the current price of a company's stock is lower than its intrinsic value, investors should buy the stock, since its price is expected to rise until it matches the fundamental value. Fundamental analysis of a company involves an in-depth analysis of its performance and profitability. The intrinsic value of the company is based on its product sales, employees, infrastructure, and the profitability of its investments [2].

Technicians, on the other hand, do not consider real-world events when predicting future trends in the stock market. For technicians, stock prices are simply time series data with complex patterns. With appropriate preprocessing and modeling, these patterns can be analyzed and profitable ones extracted. The information used for technical analysis consists mainly of closing prices, returns, and volumes. The movement of stock prices is known to be stochastic and non-linear, so technical analysis studies focus on reducing stochasticity and capturing consistent patterns.

Some technical analysis works have focused on how to extract meaningful features from raw price data. In the finance industry, features extracted from such data are called technical indicators; examples include the adaptive moving average, relative strength index, stochastics, momentum oscillator, and commodity channel index. Creating meaningful technical indicators is similar to manual feature engineering in general machine learning tasks, and, as in any machine learning task, the extracted features contain important information for models. Hence, some works have used indicators to predict the movement of stock prices more accurately [9], [22].

Technicians are also interested in finding meaningful patterns in raw price data, and numerous works have analyzed the effectiveness of different models. Although the majority of researchers agree that the stock market moves in a non-linear way, many empirical studies show that non-linear models do not outperform linear models [1], [2]. These results suggest that, even though deep neural network based models have been successfully applied to many challenging domains, careful consideration is needed when designing profitable models for stock market prediction. Lately, however, more studies are finding that non-linear models outperform linear models [24], and many studies have shown that recurrent neural network based models are effective in stock movement prediction [3].

As the amount of information on the web continues to grow rapidly, it is becoming easier to obtain information on securities from different sources. Data scientists with a computer science background have begun to pay attention to unstructured data such as text from news and Twitter [4], [10]. Models that use text data reflecting real-world events of companies can be categorized as fundamental analysis models. However, text-based stock prediction approaches also try to capture investors' opinions about an event. Based on the assumption that the price of a company's stock reflects the aggregate of investors' opinions about the company, some works have focused on reading investors' opinions about companies [20]. Other studies focus on understanding the impact of events on stock prices [11].

More recently, computer science research communities have become highly interested in utilizing graph-structured data [16], [28]. Stock market prediction methods using corporate relational data have also been proposed [7], [13]. Chen et al. created a network of companies based on financial investment information [7]. Using the constructed adjacency matrix, they trained a GCN model and compared its prediction performance with that of more conventional network embedding models. Feng et al. developed a more general framework [13] that uses many different types of relations from a publicly available knowledge database. They also proposed a GNN model that can capture the temporal features of stocks. Although these models were the first to integrate relational data into stock market prediction, they can still be improved. The quality of information varies considerably depending on the type of relation, yet no existing work has thoroughly investigated which types of relational data are more beneficial to stock movement prediction or focused on finding an effective way to selectively aggregate information from different relation types.

Furthermore, previous works have focused mainly on node classification. Node classification and graph classification are the two main tasks in graph-based learning. In a stock market network, individual nodes typically represent companies. Predicting future trends in individual stock prices is similar to the node classification task. We argue that previously proposed models can be used as a node representation updating function in the graph classification task which we propose in this work.

To address the limitations mentioned above, in this paper we study how to effectively utilize graph-based learning methods and relational data in stock market prediction. We use different types of relations and investigate their effect on performance in predicting the stock price movements of individual companies. In our experiments, we found that only relevant relations are useful for stock prediction; information from some irrelevant relations even degraded prediction performance. We propose HATS, a new hierarchical graph attention network method that uses relational data for stock market prediction. HATS selectively aggregates information from different relations and adds the information to the representations of companies. Specifically, node features are initialized with features extracted by the feature extraction module. HATS is then used as the relational modeling module to selectively gather information from neighboring nodes. The representations with the added information are finally fed into a task-specific prediction layer.

We applied our method to the following two graph-related tasks: predicting the movement of individual stocks, which is the node classification task performed in previous works, and predicting the movement of the market index, which is similar to the graph classification task. The latter is a new way of applying graph-based learning to stock market prediction: since a market index consists of individual stocks, its movement can be predicted with a graph classification based approach. The experimental results on both tasks demonstrate the effectiveness of our proposed method.

The main contributions of this work can be summarized as follows.

  • We thoroughly investigate the effect of using relations in stock market prediction and find characteristics of more meaningful relation types.

  • We propose a new method HATS which can selectively aggregate information on different relation types and add the information to each representation.

  • We propose graph classification based stock prediction methods. Considering the market index as an entire graph and constituent companies as individual nodes, we predict the movement of a market index using a graph pooling method.

  • We perform extensive experiments on stocks listed in the S&P 500 Index. Our experimental results demonstrate that the performance of our method in terms of Sharpe ratio and F1 score was 19.8% and 3% higher than the existing baselines, respectively.

The remainder of this paper is organized as follows. In Section 2, we provide short preliminaries that can be helpful in understanding our work. Detailed descriptions of our proposed framework are provided in Section 3. In Section 4, we explain how we collected the data used in our experiments. We discuss our experimental results in Section 5 and conclude in Section 6.

2 Preliminaries

Graph Theory

A graph is a powerful data structure for dealing with relational data, and various methods learn meaningful node representations in graphs. In this section, we provide brief preliminaries on graph-based methods. A graph $G = (V, E)$ consists of a set of vertices (nodes) $V$ and edges $E$, with $n = |V|$. If a node is denoted as $v_i \in V$ and $e_{ij} = (v_i, v_j) \in E$ is an edge connecting nodes $v_i$ and $v_j$, the adjacency matrix $A$ is an $n \times n$ matrix with $A_{ij} = 1$ if $e_{ij} \in E$ and $A_{ij} = 0$ otherwise. The degree of a node is the number of edges connected to it, and the degree matrix is the diagonal matrix $D$ with $D_{ii} = \sum_j A_{ij}$. Each node can have node features (attributes), collected in a feature matrix $X \in \mathbb{R}^{n \times f}$.
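As a concrete illustration of the adjacency and degree matrices, the following minimal NumPy sketch builds both from an edge list. It assumes an undirected, unweighted graph; the function name `adjacency_and_degree` and the 4-company example graph are hypothetical.

```python
import numpy as np


def adjacency_and_degree(n, edges):
    """Build the n x n adjacency matrix A and the diagonal degree matrix D
    of an undirected, unweighted graph from a list of (i, j) node pairs."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        A[j, i] = 1  # undirected graph: A is symmetric
    D = np.diag(A.sum(axis=1))  # D_ii = number of edges touching node i
    return A, D


# Hypothetical 4-company graph with edges 0-1, 1-2 and 1-3
A, D = adjacency_and_degree(4, [(0, 1), (1, 2), (1, 3)])
print(np.diag(D))  # [1 3 1 1]
```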

In a spatial-temporal graph, the node features change over time; such a graph can be described by a feature tensor $X \in \mathbb{R}^{T \times n \times f}$, where $T$ is the number of time steps.

Graph Neural Networks

With a growing interest in utilizing graph-structured data, a large amount of research has been conducted for learning meaningful representations in graphs. Most graph neural networks (GNNs) can be categorized as spectral or non-spectral.

Spectral graph theory based methods such as GCN [17] utilize the idea of convolutional neural networks (CNN [18]) to capture local patterns in graph-structured data. GCN applies a spectral convolution filter to extract information in the Fourier domain:

$$ g_\theta \star x = U M U^{\top} x \tag{2.1} $$

Equation (2.1) describes a spectral convolution filter applied to graph data $x \in \mathbb{R}^{n}$ (n companies), where the filter is parameterized by a diagonal matrix $M = \mathrm{diag}(\theta)$ and $U$ is the eigenvector matrix of the normalized graph Laplacian $L = I_n - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$.

However, for large graphs, computing the eigendecomposition of the graph Laplacian is too computationally expensive. To address this problem, Kipf and Welling approximated spectral filters in terms of Chebyshev polynomials $T_k$ up to order $K$ with Chebyshev coefficients $\theta'_k$, as follows:

$$ g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k \, T_k(\tilde{L}) \, x \tag{2.2} $$

where $\tilde{L} = \frac{2}{\lambda_{\max}} L - I_n$, with $\lambda_{\max}$ denoting the largest eigenvalue of the graph Laplacian $L$.

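The Chebyshev polynomials are defined recursively as $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$. In [17], GCN was shown to be effective with $K = 1$. With this setting (and further assuming $\lambda_{\max} \approx 2$), Kipf and Welling simplified Equation (2.2) into a fully connected layer with a built-in convolution filter:

$$ H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big) \tag{2.3} $$

where $\tilde{A} = A + I_n$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $H^{(l)}$ is the matrix of node representations at layer $l$ (with $H^{(0)} = X$), $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a non-linear activation.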
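The simplified first-order GCN layer of Kipf and Welling [17] can be sketched in a few lines of NumPy. This is an illustration, not the authors' code: it assumes a dense adjacency matrix, ReLU as the activation, and random (untrained) weights; the names `normalized_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Renormalized adjacency D~^{-1/2} (A + I) D~^{-1/2} (Kipf and Welling)."""
    A_tilde = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # element (i, j) is scaled by 1 / sqrt(d_i * d_j)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """One GCN layer with ReLU: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))   # 3 companies, 4 input features each
W = rng.normal(size=(4, 2))   # learnable weights (random here)
H_next = gcn_layer(normalized_adjacency(A), H, W)
print(H_next.shape)  # (3, 2)
```

Precomputing the normalized adjacency once and reusing it across layers avoids any eigendecomposition, which is the practical point of the Chebyshev simplification.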

On the other hand, non-spectral approaches define convolution operations directly on the graph, using spatially close neighbors. For example, Hamilton et al. proposed a general framework for sampling and aggregating features from the local neighborhood of a node to generate embeddings. Specifically, features of neighboring nodes are aggregated iteratively using a learnable aggregation function, as follows:

$$ a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\big(\{ h_u^{(k-1)} : u \in N(v) \}\big) \tag{2.4} $$
$$ h_v^{(k)} = \mathrm{COMBINE}^{(k)}\big( h_v^{(k-1)}, a_v^{(k)} \big) \tag{2.5} $$

where $h_v^{(k)}$ denotes the representation of node $v$ at the $k$-th iteration, $N(v)$ is the set of neighbors of $v$, and $\mathrm{AGGREGATE}^{(k)}$ is a learnable aggregation function. Many proposed methods can be considered special types of aggregation functions. For example, Veličković et al. assigned different weights to neighboring nodes using an attention mechanism when aggregating their features [25].
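A single iteration of the aggregate-then-combine scheme can be sketched as follows. This minimal NumPy example assumes a mean aggregator and a "concatenate, project, ReLU" combine step (one of the variants Hamilton et al. describe); the function name `sage_mean_layer` and the star-graph example are hypothetical.

```python
import numpy as np


def sage_mean_layer(H, neighbors, W_self, W_neigh):
    """One GraphSAGE-style iteration.
    AGGREGATE: mean of neighbour representations.
    COMBINE:   ReLU([h_v ; a_v] @ [W_self ; W_neigh]) = ReLU(h_v W_self + a_v W_neigh)."""
    out = []
    for v, h_v in enumerate(H):
        nbrs = neighbors.get(v, [])
        # isolated nodes receive a zero message
        a_v = H[nbrs].mean(axis=0) if nbrs else np.zeros_like(h_v)
        out.append(np.maximum(h_v @ W_self + a_v @ W_neigh, 0.0))
    return np.stack(out)


H = np.eye(3)                              # one-hot features for 3 nodes
neighbors = {0: [1, 2], 1: [0], 2: [0]}    # star graph centred on node 0
H1 = sage_mean_layer(H, neighbors, np.eye(3), np.eye(3))
print(H1)
```

Swapping the mean for attention-weighted averaging gives the graph attention variant mentioned above; HATS applies that idea at two levels.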

Updated node representations can be used in both node classification and graph classification tasks. For a graph classification task, additional layers are needed to aggregate individual node representations into a graph representation. Graph pooling is a technique for building such graph representations, and numerous works that effectively aggregate node features have been proposed [19], [28].

GNN methods have achieved state-of-the-art performance in various tasks such as link prediction [29], social network community structure prediction [6], and recommendation [27].

3 Methodology

Figure 1: General framework of stock prediction using relational data.

In this section, we first explain our entire framework, which is shared by many stock market prediction methods that use corporate relational data. Understanding how the general framework functions helps clarify the importance of using relational data in stock prediction. The overall framework is shown in Fig. 1. After describing the general framework, we elaborate on the structure of our method, HATS, which is a new type of relational modeling module.

3.1 General framework

Feature Extraction Module

A stock market graph is a typical spatial-temporal graph. If we regard individual stocks (companies) as nodes, each node feature can represent the current state of a company with respect to its price movement, and node features evolve over time. As mentioned in Section 1, numerous types of data (e.g., historical prices, text, or fundamental analysis based sources) can be used as indicators of stock price movement. Because raw data such as text or prices are not informative enough on their own, we need a feature extraction module to obtain meaningful representations of individual companies. In this study, we use only historical price data.

A feature extraction module represents the current state of a company based on its historical movement patterns. Any of the numerous tools that predict future stock market trends from raw price data can serve as a feature extraction module. In this study, we use LSTM and GRU as our feature extraction modules. LSTM is the most widely used framework in time series modeling, and [7] and [13] also used LSTM as their feature extraction module. For a more detailed description of how node feature vectors are extracted from raw price data, we refer readers to [14]. We also use GRU as a feature extraction module because it is known to be more efficient than LSTM in time series tasks while obtaining similar performance with appropriate tuning. In our experiments, LSTM performed slightly better than GRU on average, but it was more difficult to train, especially when the model had more layers. For this reason, we use LSTM for the individual stock prediction task and GRU for the index movement prediction task, where an additional graph pooling layer is needed.
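To make the role of the feature extraction module concrete, here is a minimal NumPy sketch of a single-layer GRU encoder that turns a 50-day window of price change rates into a company feature vector. It is an illustration under assumptions, not the authors' TensorFlow implementation: the weights are random rather than trained, the hidden size of 16 is arbitrary, the gate equations follow one common GRU convention, and the names `gru_encode` and `init_params` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_encode(returns, params):
    """Run a single-layer GRU over a 1-D sequence of daily price change rates
    and return the final hidden state as the company's feature vector."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(Uz.shape[0])
    for r_t in returns:
        x = np.array([r_t])                          # one input feature per day
        z = sigmoid(Wz @ x + Uz @ h + bz)            # update gate
        g = sigmoid(Wr @ x + Ur @ h + br)            # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (g * h) + bh) # candidate state
        h = (1.0 - z) * h + z * h_cand               # interpolate old and new state
    return h


def init_params(hidden, rng):
    """Random weights for (update, reset, candidate): W (hidden x 1), U (hidden x hidden), b."""
    shapes = [(hidden, 1), (hidden, hidden), (hidden,)] * 3
    return [rng.normal(scale=0.1, size=s) for s in shapes]


rng = np.random.default_rng(0)
window = rng.normal(scale=0.01, size=50)   # hypothetical 50-day lookback window
e_i = gru_encode(window, init_params(hidden=16, rng=rng))
print(e_i.shape)  # (16,)
```

An LSTM encoder would have the same interface (sequence in, final hidden state out) with an additional cell state, which is one reason it is harder to train when stacked.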

Relational Modeling Module

A relational modeling module is a node updating function. Gilmer et al. viewed graph-based learning as information exchange between related nodes [15]: the main function of graph neural networks is to exchange information between neighboring nodes, aggregating information from neighbors and adding it to each node representation. Information collected from different nodes and relation types needs to be combined effectively. To this end, we propose a new GNN-based Hierarchical graph Attention Network for Stock market prediction (HATS). Each layer is designed to capture the importance of neighboring nodes and relation types. A detailed description of HATS is provided in Section 3.2.

Task-Specific Module

After node representations are updated by relational modeling, they are fed into the task-specific module. Since node representations can be used in various tasks with appropriate modeling, this layer is called "task-specific." In this study, we performed experiments on two graph-based learning tasks: individual stock prediction and market index prediction. Individual stock prediction is similar to the node classification task performed in previous research [7], [13]. As market indices consist of multiple related stocks, information on the current state of individual companies can be used to predict the movement of their index. Since recently proposed graph pooling methods aggregate information from individual nodes to represent an entire graph, they can also be used for the index prediction task. The experimental results in Section 5 demonstrate that the graph pooling methods outperform all baseline methods.

In the next subsections, we describe HATS in more detail: first how it aggregates information and adds it to node representations, and then how the updated node representations are used in the two tasks.

Figure 2: Hierarchical Attention Network for Stock Prediction

3.2 Hierarchical Attention Network

Let us denote the f-dimensional feature vector of company $i$ at time $t$ produced by the feature extraction module as $e_i^t \in \mathbb{R}^f$. In Fig. 2, we omit the superscript $t$ for simplicity, assuming that all node representation vectors are calculated at time step $t$. Edges can be defined for each type of relation, so for the graph neural network operation we need the set of neighboring nodes of target node $i$ under each relation type. Let us denote the set of neighboring nodes of $i$ for relation type $m$ as $N_i^m$ and the embedding vector of relation type $m$ as $e_{r_m} \in \mathbb{R}^d$, where $d$ is the dimension of a relation type embedding vector. Our goal is to selectively gather information on different relations from neighboring nodes, so that the model filters out information that is not useful for future trend prediction. This is important because companies have many different types of relationships, and some of this information is unrelated to price movement.

Attention mechanisms are widely used to assign different weights for information selection. With a hierarchically designed attention mechanism, our Hierarchical Attention network for Stock prediction (HATS) selects only meaningful information at each level; this hierarchical attention is key to improving performance. The architecture of HATS is shown in Fig. 2.

In the first layer, the state attention layer, HATS selects important information from the set of neighboring nodes connected by the same type of relation. The attention mechanism calculates different weights based on the current state (representation) of each neighboring node. To calculate the state attention scores, we concatenate the relation type embedding and the node representations of $i$ and $j$ into a vector $x_{ij}^m = [e_{r_m}; e_i; e_j] \in \mathbb{R}^{2f+d}$, where $j \in N_i^m$. The state attention score is then calculated as follows:

$$ v_{ij} = W_s x_{ij}^m + b_s \tag{3.1} $$
$$ \alpha_{ij}^m = \frac{\exp(v_{ij})}{\sum_{k \in N_i^m} \exp(v_{ik})} \tag{3.2} $$

where $W_s \in \mathbb{R}^{1 \times (2f+d)}$ and $b_s \in \mathbb{R}$ are learnable parameters used to calculate the state attention scores. With the attention weights $\alpha_{ij}^m$ from Eq. (3.2), we combine the weighted node representations into a vector representation $s_i^m$ of relation $m$ for company $i$:

$$ s_i^m = \sum_{j \in N_i^m} \alpha_{ij}^m e_j \tag{3.3} $$

With the above equation, we obtain a representation $s_i^m$ for each type of relation, selectively gathering information on a specific relation from neighboring nodes. The vector $s_i^m$ can be considered summarized information of relation $m$. For example, the representation of the industry relation summarizes the general state of the target company's industry. Like human investors, our model should weigh the summarized information of each relation when making trading decisions, so the second layer of HATS assigns importance weights to this information using another attention mechanism.
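The state attention layer of Eqs. (3.1)-(3.3) can be sketched for a single target node and a single relation type as below. This is a minimal NumPy illustration, not the authors' implementation: it assumes a linear scoring function followed by a softmax over the neighbor set, and the names `state_attention`, `w_s`, `b_s` (standing in for the learnable scoring parameters) and the random example data are hypothetical.

```python
import numpy as np


def softmax(v):
    v = v - v.max()          # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()


def state_attention(e, i, neighbors_m, rel_emb_m, w_s, b_s):
    """State attention for node i and one relation type m.
    e:           (n, f) node representations from the feature extractor
    neighbors_m: list of neighbour indices N_i^m
    rel_emb_m:   (d,) relation type embedding
    Returns the relation summary s_i^m (f,) and the weights alpha (|N_i^m|,)."""
    # one row [e_rm ; e_i ; e_j] per neighbour j
    X = np.stack([np.concatenate([rel_emb_m, e[i], e[j]]) for j in neighbors_m])
    v = X @ w_s + b_s                  # one scalar score per neighbour
    alpha = softmax(v)                 # normalized over the neighbour set
    return alpha @ e[neighbors_m], alpha


rng = np.random.default_rng(0)
f, d = 4, 3
e = rng.normal(size=(5, f))        # hypothetical representations of 5 companies
rel_emb = rng.normal(size=d)       # hypothetical embedding of relation type m
w_s = rng.normal(size=2 * f + d)
s_im, alpha = state_attention(e, 0, [1, 2, 4], rel_emb, w_s, 0.0)
```

With all-zero scoring weights the attention is uniform and the summary reduces to the mean of the neighbors, which is a convenient sanity check.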

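We concatenate the summarized relation information vector $s_i^m$, the representation of the current state of company $i$, $e_i$, and the relation type embedding vector $e_{r_m}$ into $\tilde{x}_i^m = [s_i^m; e_i; e_{r_m}] \in \mathbb{R}^{2f+d}$, which serves as input to the relation attention layer:

$$ \tilde{v}_i^m = W_r \tilde{x}_i^m + b_r \tag{3.4} $$
$$ \beta_i^m = \frac{\exp(\tilde{v}_i^m)}{\sum_{k} \exp(\tilde{v}_i^k)} \tag{3.5} $$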

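where $W_r \in \mathbb{R}^{1 \times (2f+d)}$ and $b_r \in \mathbb{R}$ are learnable parameters. The weighted vectors of each relation type are summed to form the aggregated relation representation $e_i^r$, as in Eq. (3.6), which is analogous to Eq. (3.3):

$$ e_i^r = \sum_{m} \beta_i^m s_i^m \tag{3.6} $$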

Finally, the aggregated relation representation is added to the node's own representation:

$$ \bar{e}_i = e_i^r + e_i \tag{3.7} $$
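The second (relation-level) attention layer and the final additive update can be sketched as follows. This minimal NumPy example takes the per-relation summaries produced by the state attention layer as a matrix `S_i` (one row per relation type) and assumes a linear scoring function with a softmax over relation types; the names `relation_attention`, `w_r`, `b_r` and the random data are hypothetical.

```python
import numpy as np


def softmax(v):
    v = v - v.max()          # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()


def relation_attention(e_i, S_i, rel_embs, w_r, b_r):
    """Relation attention and node update for one company i.
    e_i:      (f,) the company's own representation
    S_i:      (M, f) row m is the state-attention summary s_i^m
    rel_embs: (M, d) relation type embeddings e_rm
    Returns the updated representation (f,) and the relation weights beta (M,)."""
    M = S_i.shape[0]
    X = np.hstack([S_i, np.tile(e_i, (M, 1)), rel_embs])  # rows: [s_i^m ; e_i ; e_rm]
    beta = softmax(X @ w_r + b_r)                          # weights over relation types
    e_rel = beta @ S_i                                     # aggregated relation representation
    return e_i + e_rel, beta                               # add to the node's own representation


rng = np.random.default_rng(1)
f, d, M = 4, 3, 2
e_i = rng.normal(size=f)
S_i = rng.normal(size=(M, f))        # hypothetical summaries for 2 relation types
rel_embs = rng.normal(size=(M, d))
w_r = rng.normal(size=2 * f + d)
e_bar, beta = relation_attention(e_i, S_i, rel_embs, w_r, 0.0)
```

The additive update keeps the company's own price-based state intact even when the relational signal is weak, since the relation term can only shift it.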

In the next two subsections, we describe how updated node representations can be used in different tasks.

3.3 Individual Stock Prediction Layer

Like previous works such as [7] and [13], our model can be applied to the individual stock prediction task. We performed classification with three labels: [up, neutral, down]. A detailed description of the task setting is provided in Section 5. For the individual stock prediction task, we add only a simple linear transformation layer:

$$ \hat{y}_i = \mathrm{softmax}(W_p \bar{e}_i + b_p) \tag{3.8} $$

where $W_p \in \mathbb{R}^{l \times f}$, $b_p \in \mathbb{R}^{l}$, and $l$ is the number of movement classes. We trained models on all the corporate relational data using the cross-entropy loss:

$$ \mathcal{L} = -\sum_{i \in Z} \sum_{c=1}^{l} Y_{ic} \ln \hat{y}_{ic} \tag{3.9} $$

where $Y_i$ is the one-hot ground truth movement class of company $i$ and $Z$ denotes all the companies in our dataset.
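The prediction layer and loss for individual stocks can be sketched for all companies at once. This is a minimal NumPy illustration, assuming integer class labels (a hypothetical encoding 0 = up, 1 = neutral, 2 = down) instead of explicit one-hot vectors; the names `predict_proba` and `cross_entropy` are hypothetical.

```python
import numpy as np


def predict_proba(E_bar, W_p, b_p):
    """Row-wise softmax(W_p e_i + b_p) for all companies.
    E_bar: (n, f) updated representations; W_p: (l, f); b_p: (l,)."""
    logits = E_bar @ W_p.T + b_p
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Sum over companies of -ln(probability assigned to the true class)."""
    return -np.log(probs[np.arange(len(labels)), labels]).sum()


rng = np.random.default_rng(0)
n, f, l = 4, 8, 3
E_bar = rng.normal(size=(n, f))
labels = np.array([0, 1, 2, 1])   # hypothetical encoding: 0=up, 1=neutral, 2=down
probs = predict_proba(E_bar, rng.normal(size=(l, f)), np.zeros(l))
loss = cross_entropy(probs, labels)
```

Indexing the true-class probability directly is equivalent to the one-hot sum in the loss, because all other terms are multiplied by zero.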

3.4 Graph Pooling for Index Prediction

A market index consists of multiple stocks chosen based on specific criteria. Let us denote the graph of a specific market index $g$ as $G_g$, where $C_g$ is the set of constituent companies of index $g$ and $\bar{e}_i$ ($i \in C_g$) are their updated node representations. To obtain the representation of the entire graph, the features of individual nodes need to be aggregated, and numerous graph pooling methods have recently been proposed for this purpose [19], [28]. A stock market index also has its own historical price patterns, which can be used as features. Therefore, we combine features obtained by pooling individual nodes with features extracted directly from the index's historical price data.

We used mean pooling in our experiments to calculate graph representations as follows:

$$ h_g^{pool} = \frac{1}{|C_g|} \sum_{i \in C_g} \bar{e}_i \tag{3.9} $$

where $\bar{e}_i$ is the updated representation of company $i$. Denoting the target index's own feature vector extracted by the feature extraction module as $e_g$, the final representation of the entire graph is obtained by combining the graph's original representation and the representation obtained by graph pooling:

$$ h_g = h_g^{pool} + e_g \tag{3.10} $$

We also tried concatenating the two representations, but this did not have a significant impact on performance. As in the individual stock prediction task, we make predictions using a simple linear transformation with $W_g \in \mathbb{R}^{l \times f}$ and $b_g \in \mathbb{R}^{l}$, and train models using the cross-entropy loss:

$$ \hat{y}_g = \mathrm{softmax}(W_g h_g + b_g) \tag{3.11} $$
$$ \mathcal{L}_g = -\sum_{c=1}^{l} Y_{gc} \ln \hat{y}_{gc} \tag{3.12} $$

where $Y_g$ is the one-hot ground truth movement class of the index.

Note that we use the most basic pooling method, as this is the first work to apply graph pooling to the stock prediction task. There is much room for improvement, which we leave for future work.
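The index prediction path (mean pooling over constituents, addition of the index's own features, then a linear softmax layer) can be sketched as below. This minimal NumPy example uses tiny hand-picked numbers so the result is easy to check; the function name `index_probs` and all values are hypothetical.

```python
import numpy as np


def softmax(v):
    v = v - v.max()          # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()


def index_probs(E_bar, members, e_g, W_g, b_g):
    """Predict index movement probabilities.
    E_bar:   (n, f) updated representations of all companies
    members: indices of the index constituents C_g
    e_g:     (f,) the index's own feature vector from the feature extractor"""
    h_pool = E_bar[members].mean(axis=0)   # mean pooling over constituents
    h_g = h_pool + e_g                     # combine by addition
    return softmax(W_g @ h_g + b_g)        # linear layer + softmax over l classes


E_bar = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
W_g = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # l = 3 classes, f = 2
probs = index_probs(E_bar, [0, 1], np.array([0.0, 1.0]), W_g, np.zeros(3))
```

Here the pooled vector is [2, 1], adding the index features gives [2, 2], and the logits are [2, 2, 4]; company 2 is not a constituent, so it does not affect the result.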

Figure 3: Dataset arrangement for the experiment. A 50-day evaluation period immediately precedes each test period. The black line shows the actual closing price of the S&P 500 Index.

4 Data

4.1 Price-related data

In this study, we focused on the U.S. stock market, one of the largest markets in the world based on market capitalization. We gathered corporate relational data from a public database which contains information on most of the S&P 500 companies. Among the S&P listed companies, there exist some companies without any type of relation with other companies in the database. After removing such companies, the remaining 431 companies were used as our target companies.

We sampled price data for our study from 2013/02/08 to 2017/10/05 (1174 trading days in total). Figure 3 shows the closing price of the S&P 500 Index, which represents the overall market condition. As shown in Figure 3, although the index price tends to rise, there are several crashes in our sample period. We used different experimental settings with varying degrees of volatility to evaluate performance; a more detailed description of the task settings is provided in Section 5.

As described in Section 3, raw features of price-related data are fed to the feature extraction module. Many different types of raw features, such as open price, close price, and volume, can be used. In this study, following [14], we use historical price change rates as our input. Let $p_t^i$ and $p_{t-1}^i$ be the closing prices of company $i$ at times $t$ and $t-1$, respectively. The price change rate at time $t$ is calculated as $r_t^i = (p_t^i - p_{t-1}^i)/p_{t-1}^i$. Since our model predicts price movement, it can equivalently be seen as predicting the next day's price change rate $r_{t+1}^i$ given the sequence of historical price change rates of company $i$, $[r_{t-T+1}^i, \ldots, r_t^i]$.
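The input construction described above (price change rates, then fixed-length lookback windows paired with the next day's rate) can be sketched as follows. This minimal NumPy example assumes a single price series and a toy lookback of 2 days instead of 50; the names `price_change_rates` and `make_windows` are hypothetical.

```python
import numpy as np


def price_change_rates(close):
    """r_t = (p_t - p_{t-1}) / p_{t-1} for a 1-D array of closing prices."""
    close = np.asarray(close, dtype=float)
    return close[1:] / close[:-1] - 1.0


def make_windows(rates, lookback):
    """Pair each window of `lookback` past rates with the next day's rate.
    Requires len(rates) > lookback."""
    X = np.stack([rates[t - lookback:t] for t in range(lookback, len(rates))])
    y = rates[lookback:]
    return X, y


close = [100.0, 110.0, 99.0, 99.0, 108.9]
rates = price_change_rates(close)          # approx. [0.1, -0.1, 0.0, 0.1]
X, y = make_windows(rates, lookback=2)     # 2 samples of length 2
```

With a 50-day lookback, each row of `X` is the 50-dimensional input vector mentioned in Section 5, and `y` would be thresholded into movement labels.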

4.2 Corporate Relation data

The second type of data we used is corporate relational data. Following Feng et al. [13], we collected corporate relational data from Wikidata [26]. Wikidata is a free collaborative knowledge base that contains relations between various types of entities (e.g., person, organization, country). Each entity has an identifier, and a relationship between two entities is represented as a property. For example, the sentence "Steve Jobs founded Apple" is expressed as the triplet [Apple, Founded by, Steve Jobs]. In graph terms, each entity in Wikidata is a node and each property is an edge, so Wikidata can be understood as a heterogeneous graph with many different types of nodes and edges.

Here, companies are the only node type of interest. However, there are only a few types of direct edges between companies, and their connections are very sparse. To address this problem, we use meta-paths, which are commonly used to deal with heterogeneous graphs [12]. If Steve Jobs were a board member of Google, we could form the triplet [Google, Board member, Steve Jobs]. Combined with the relation [Apple, Founded by, Steve Jobs] mentioned above, the two companies Apple and Google are then connected by the meta-path [Founded by, Board member] through the shared node Steve Jobs. In this way, we found 75 types of relations, including direct relations between companies and meta-paths. The complete lists of individual relations and meta-paths used in this study are provided in the appendix.
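Connecting companies through a shared intermediate entity can be sketched in plain Python. This is an illustration, not the authors' pipeline: it assumes triples are stored as (company, property, entity) tuples and handles a single 2-hop meta-path; the function name `metapath_edges` is hypothetical, and the board-member triple mirrors the hypothetical example in the text.

```python
from collections import defaultdict


def metapath_edges(triples, companies, first, second):
    """Connect company c1 to company c2 when (c1, first, x) and (c2, second, x)
    both exist for some shared entity x, i.e. a 2-hop meta-path [first, second].
    Triples are assumed to be stored as (company, property, entity)."""
    via_first = defaultdict(set)    # entity x -> companies linked to x by `first`
    via_second = defaultdict(set)   # entity x -> companies linked to x by `second`
    for head, prop, tail in triples:
        if head not in companies:
            continue
        if prop == first:
            via_first[tail].add(head)
        if prop == second:
            via_second[tail].add(head)
    edges = set()
    for x, c1s in via_first.items():
        for c1 in c1s:
            for c2 in via_second.get(x, set()):
                if c1 != c2:
                    edges.add((c1, c2))
    return edges


triples = [
    ("Apple", "Founded by", "Steve Jobs"),
    ("Google", "Board member", "Steve Jobs"),  # hypothetical, as in the text
]
print(metapath_edges(triples, {"Apple", "Google"}, "Founded by", "Board member"))
# {('Apple', 'Google')}
```

Running this once per meta-path yields one company-only edge set per relation type, which is exactly the per-relation neighbor structure $N_i^m$ that HATS consumes.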

One of our main goals is to study the effect of using corporate relational data on stock market prediction performance. There are many ways to define a set of neighboring nodes. We used a meta-path with only 2 hops at maximum to convert an originally heterogeneous graph into a homogeneous graph with only company nodes. Still, methods for building a corporate relational network from a large knowledge base can be much improved, which we leave for future work.

5 Experiments

5.1 Experiment design

General settings

As mentioned in Section 3.3, we divided the data into three classes based on the price change ratio: [up, neutral, down]. Specifically, two threshold values were used to divide the training data into the three classes, and the same thresholds were used to assign labels to the evaluation and test data. This labeling strategy labels small movements as neutral and uses only significant movements as directional labels.
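The two-threshold labeling rule can be sketched as below. The text does not state how the thresholds were chosen, so `fit_thresholds` uses a hypothetical choice (the 1/3 and 2/3 quantiles of the training rates, giving roughly balanced classes); the class encoding and all names are likewise assumptions.

```python
import numpy as np

UP, NEUTRAL, DOWN = 0, 1, 2  # hypothetical encoding following [up, neutral, down]


def fit_thresholds(train_rates):
    """Hypothetical threshold choice: tertiles of the training price change rates."""
    lo, hi = np.quantile(train_rates, [1 / 3, 2 / 3])
    return lo, hi


def movement_labels(rates, lo, hi):
    """Label rates below `lo` as down, above `hi` as up, and the rest as neutral."""
    rates = np.asarray(rates, dtype=float)
    labels = np.full(len(rates), NEUTRAL)
    labels[rates < lo] = DOWN
    labels[rates > hi] = UP
    return labels


lo, hi = fit_thresholds(np.array([-0.03, -0.01, 0.0, 0.005, 0.01, 0.04]))
print(movement_labels([-0.02, 0.0, 0.03], -0.01, 0.01))  # [2 1 0]
```

Fitting the thresholds on training data only and reusing them for evaluation and test data avoids leaking future return statistics into the labels.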

As shown in Figure 3, although the price tends to go up eventually, stock market crashes occur frequently. For a strategy to be profitable, it is important to keep drawdowns to a minimum, so we need to check whether models remain effective even in highly volatile periods. For this purpose, following [3], we divided the entire dataset into 8 smaller datasets covering different market phases. Each phase consists of 250 days of training, 50 days of evaluation, and 100 days of testing.
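One way to lay out the 8 phases of 250/50/100 days is a rolling split. The text does not say how far consecutive phases are shifted, so this sketch assumes a shift of 100 samples (so test periods are contiguous and non-overlapping); the function name `phase_splits` and the example sample count of 1100 are hypothetical.

```python
def phase_splits(n_samples, n_phases=8, train=250, val=50, test=100, step=100):
    """Return (train, val, test) index ranges for rolling phases.
    Hypothetical layout: each phase starts `step` samples after the previous one,
    so with step == test the test periods are contiguous and non-overlapping."""
    splits = []
    for k in range(n_phases):
        start = k * step
        tr = range(start, start + train)
        va = range(tr.stop, tr.stop + val)
        te = range(va.stop, va.stop + test)
        if te.stop > n_samples:
            raise ValueError(f"phase {k} needs {te.stop} samples, only {n_samples} available")
        splits.append((tr, va, te))
    return splits


splits = phase_splits(n_samples=1100)
for tr, va, te in splits[:2]:
    print(tr, va, te)
```

Because the evaluation window sits directly before each test window, early stopping always uses the most recent market regime available at that point.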

For all the models in our experiments, we used a 50-day lookback period. As we used only the price change ratio as the input feature, the input vector has length 50. We used LSTM as the feature extraction module for individual stock prediction and GRU for the index movement prediction task. We optimized all models with the Adam optimizer and tuned the hyper-parameters of each model within the following ranges: learning rate between 1e-3 and 1e-5, weight decay between 1e-4 and 1e-5, and dropout between 0.1 and 0.9. ReLU was used as the activation function. We measured model performance on the evaluation set of each period and performed early stopping based on the F1 score. As stock prediction results tend to vary widely, all experiments in this work were repeated five times, and the reported numbers are averages over the five runs.

To measure the profitability of the models, we used a trading strategy based on movement prediction. Following Fischer and Krauss, we made a neutralized portfolio based on the prediction value obtained by models [14]

.Since there are three classes, the prediction vectors from all the models are three dimensional. Values of each dimension represent the predicted probability of each class. We selected 15 companies with the highest up class probability and the long position was taken. For the 15 companies with the highest down class probability, the short position was taken. This method is widely used when creating simple trading strategies for prediction models. We implemented our model in TensorFlow. Our source code and data are available at

https://github.com/to-be-done.
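The long/short selection rule can be sketched as follows. The order of the three probabilities (down, neutral, up) and the function name are assumptions; the text does not say how a company appearing in both top-15 lists is handled, so this sketch leaves such overlap as is.

```python
def neutralized_portfolio(probs, k=15):
    """Pick long and short positions from three-class predictions.

    probs maps a ticker to (p_down, p_neutral, p_up); this class order is an
    assumption for illustration.
    """
    by_up = sorted(probs, key=lambda t: probs[t][2], reverse=True)
    by_down = sorted(probs, key=lambda t: probs[t][0], reverse=True)
    return by_up[:k], by_down[:k]


# Hypothetical predictions for three companies, with k=1 for brevity.
preds = {"A": (0.1, 0.2, 0.7), "B": (0.6, 0.3, 0.1), "C": (0.3, 0.4, 0.3)}
longs, shorts = neutralized_portfolio(preds, k=1)
print(longs, shorts)  # ['A'] ['B']
```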

Measurement

We evaluated our models based on profitability and classification. In general, creating profitable trading strategies is the ultimate goal of stock movement prediction. Using the trading strategy mentioned above, we used two metrics to calculate profitability.

  • Return We calculated the return of our portfolio as follows.

    $$\mathrm{Return}_t = \sum_{i \in F_{t-1}} \frac{p^i_t - p^i_{t-1}}{p^i_{t-1}} \, (-1)^{\mathrm{Action}^i_{t-1}} \qquad (5.1)$$

    where $F_{t-1}$ denotes the set of companies included in the portfolio at time $t-1$, and $p^i_t$ denotes the price of stock $i$ at time $t$. $\mathrm{Action}^i_t$ is a binary value (0 or 1): it is 0 if a long position is taken in stock $i$ at time $t$, and 1 otherwise.

  • Sharpe Ratio The Sharpe ratio measures the performance of an investment relative to its risk. It is the return earned in excess of the risk-free rate per unit of volatility (risk):

    $$S_a = \frac{E[R_a - R_f]}{\sqrt{\mathrm{var}[R_a - R_f]}} \qquad (5.2)$$

    where $R_a$ denotes the asset return and $R_f$ the risk-free rate. We used the 13-week Treasury bill to calculate the risk-free rate.
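Eqs. (5.1) and (5.2) can be sketched in plain Python. The annualization factor of 252 trading days, the use of the sample standard deviation, and the assumption that the risk-free series is already a per-day rate (converted from the 13-week Treasury bill yield) are our choices for illustration, not details stated in the text.

```python
import math
import statistics


def portfolio_return(prev_prices, prices, actions):
    """Eq. (5.1): summed price change ratios of the held stocks.

    actions[i] is 0 for a long position in stock i and 1 for a short one,
    so (-1) ** actions[i] flips the sign of short positions.
    """
    return sum(
        (prices[i] - prev_prices[i]) / prev_prices[i] * (-1) ** actions[i]
        for i in actions
    )


def annualized_sharpe(returns, risk_free, periods_per_year=252):
    """Eq. (5.2) on daily excess returns, scaled by sqrt(periods_per_year)."""
    excess = [r - rf for r, rf in zip(returns, risk_free)]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


# Long A (+10%) and short B (-10%): both positions gain.
r = portfolio_return({"A": 100.0, "B": 50.0}, {"A": 110.0, "B": 45.0}, {"A": 0, "B": 1})
print(round(r, 4))  # 0.2
```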

As price movement prediction is a special type of classification task, we used metrics widely used in classification tasks.

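  • Accuracy and F1-score These two metrics are the most widely used for measuring classification performance.

    Each prediction can be labeled as a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN). Accuracy and the F1-score are calculated as follows.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5.3)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5.4)$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5.5)$$

After calculating the F1-score of each class, we averaged the scores over the three classes to obtain the macro F1-score.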
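For the three-class setting, the per-class F1 of Eq. (5.5) is computed one-vs-rest and then averaged. A plain-Python sketch follows; note that for multi-class data the accuracy below is the overall fraction of correct predictions, and scoring a class that is never predicted or never present as zero is our convention.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single class, using Eqs. (5.4)-(5.5)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("down", "neutral", "up")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


y_true = ["up", "up", "down", "neutral"]
y_pred = ["up", "down", "down", "up"]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))  # 0.5 0.3889
```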

Methods

We conducted experiments on the following baseline models, whose architectures are described below. We tried different combinations of architectures and found that deeper structures generally suffer from overfitting.

Baselines without the relational modeling module.

  • MLP

    Basic Multi Layer Perceptron model. We used an MLP consisting of 2 hidden layers with 16 and 8 hidden dimensions, respectively, and 1 prediction layer.

  • CNN We used a convolutional neural network (CNN), as it is known to be fast and as effective as RNN-based models in time series modeling. Our CNN has 4 layers: 2 convolutional layers and 2 pooling layers. The two convolutional layers have 32 and 8 filters, respectively, each with a kernel size of 5.

  • LSTM Long Short-Term Memory (LSTM) is one of the most powerful deep learning models for time series forecasting, and many previous works have demonstrated its effectiveness. We used an LSTM network with 2 layers and a hidden size of 128. To train the LSTM, we used the RMSProp optimizer, which is known to be suitable for RNN-based models.

Baselines with the relational modeling module.

  • GCN [7] A basic graph convolutional network model. Following [7], we used a GCN model with two convolution layers and one prediction layer, as stated in Eq. (5.6). All types of relations are used to create the adjacency matrix.

  • GCN-TOP20 We used the same GCN model, but built the adjacency matrix from the edges of only the top 20 relation types identified in the experiment described in Section 5.2. Thus, only relations that were manually selected for stock market prediction are included in the adjacency matrix. By comparing GCN-Top20 with vanilla GCN, we analyzed the effect of the choice of relations on stock market prediction performance.

  • TGC [13] Temporal Graph Convolution, a module for relational modeling. Feng et al. proposed it as a general module for stock prediction. The module weights the neighboring nodes of a target company based on the current state of the company and the relations between the neighbors and the company. TGC aggregates all the information of a target company from its neighboring nodes, whereas our HATS model summarizes information on different relation types.

As mentioned in Section 3.1, for all models with a relational modeling module, we used LSTM as the feature extraction module in the individual stock prediction task and GRU in the index movement prediction task. We chose GRU for the latter because its simpler design makes it easier to train and helps obtain consistent results with deeper model architectures.

Best 10
Relation Type F1
Industry-Legal form 0.3276
Industry-Product or material produced 0.3251
Parent organization-Owner of 0.325
Owned by-Subsidiary 0.3247
Parent organization 0.3247
Founded by-Founded by 0.3245
Follows 0.3244
Complies with-Complies with 0.3242
Owner of-Parent organization 0.3241
Subsidiary-Owner of 0.3241
Worst 10
Legal form-Instance of 0.311
Instance of-Legal form 0.3082
Location of formation-Country 0.307
Country-Location of formation 0.3053
Stock Exchange 0.2952
Country of origin-Country 0.2948
Country-Country of origin 0.2886
Country-Country of origin 0.2851
Instance of-Instance of 0.2748
Stock Exchange-Stock Exchange 0.2665
Table 1: Results of using different relations from Phase 4.
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6 Phase 7 Phase 8 Average
F1
MLP 0.2876 0.2862 0.2763 0.2810 0.2873 0.2855 0.3173 0.3231 0.2930
CNN 0.3111 0.3208 0.2938 0.3176 0.3354 0.3265 0.3417 0.3374 0.3230
LSTM 0.3173 0.3228 0.3064 0.3030 0.3333 0.3229 0.3533 0.3391 0.3248
GCN 0.2874 0.3068 0.2692 0.2940 0.3116 0.2914 0.3015 0.3241 0.2983
GCN-TOP20 0.3161 0.3339 0.3113 0.3240 0.3450 0.3140 0.3187 0.3429 0.3257
TGC 0.3110 0.3088 0.2237 0.2970 0.3329 0.2798 0.3168 0.3378 0.3010
HATS 0.3314 0.3347 0.3085 0.3243 0.3407 0.3394 0.3436 0.3556 0.3349
Accuracy
MLP 0.3455 0.3342 0.3547 0.3647 0.3208 0.3300 0.3553 0.3537 0.3449
CNN 0.3540 0.3626 0.3571 0.3855 0.3834 0.3803 0.4309 0.3901 0.3805
LSTM 0.3597 0.3604 0.3771 0.3816 0.3684 0.3841 0.4252 0.3891 0.3807
GCN 0.3752 0.3735 0.3860 0.3992 0.4191 0.3627 0.4510 0.3993 0.3958
GCN-TOP20 0.3700 0.3726 0.3834 0.3897 0.4164 0.3699 0.4488 0.4040 0.3944
TGC 0.3811 0.3701 0.3831 0.4059 0.4239 0.3716 0.4477 0.4022 0.3982
HATS 0.3735 0.3693 0.3774 0.3891 0.3880 0.3869 0.4418 0.3921 0.3898
Table 2: Classification accuracy scores on the individual stock prediction task

5.2 Analysis of the effect of using relation data

We first conducted experiments to investigate the impact of using different types of relations for stock market prediction. The experiments were performed on the individual stock prediction task. To measure the effect of different relations, we used a basic GCN model that cannot distinguish the types of relations. Following [7], we used a GCN with two convolution layers and one prediction layer, which is defined as follows:

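$$Z = \mathrm{softmax}\Big(\mathrm{ReLU}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\,W^{(1)}\big)\,W^{(2)}\Big) \qquad (5.6)$$

where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$. Here, $\tilde{A} = A + I$ is the adjacency matrix with added self-connections, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $X$ holds the node representations produced by the feature extraction module, and $W^{(0)}, W^{(1)}, W^{(2)}$ are trainable weight matrices. Therefore, changing the relation type changes the adjacency matrix that is fed into the GCN. We list the 10 best and 10 worst relations and their F1 scores on the test set of Phase 4 in Table 1.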
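A dependency-free sketch of the GCN forward pass described above: two graph convolutions, each computing ReLU(Â H W), followed by a softmax prediction layer, where Â = D̃^{-1/2}(A + I)D̃^{-1/2}. The toy graph, the random weights, and the helper names are hypothetical; a real model would use a tensor library and learned weights.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def normalized_adjacency(adj):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_forward(adj, x, w0, w1, w2):
    """Two graph convolutions followed by a softmax prediction layer."""
    a_hat = normalized_adjacency(adj)
    h1 = relu(matmul(a_hat, matmul(x, w0)))
    h2 = relu(matmul(a_hat, matmul(h1, w1)))
    return softmax_rows(matmul(h2, w2))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy graph: 3 companies linked in a chain (0-1, 1-2), 4 input features.
rng = random.Random(0)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = random_matrix(3, 4, rng)
z = gcn_forward(adj, x, random_matrix(4, 8, rng), random_matrix(8, 8, rng), random_matrix(8, 3, rng))
print([round(sum(row), 6) for row in z])  # [1.0, 1.0, 1.0]
```

Because only the adjacency matrix depends on the relation type, swapping in the edges of a different relation changes `adj` and nothing else, which is what the experiment in Table 1 exploits.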

Our key findings are as follows.

  • Using relation data does not always yield good results in stock market prediction. In our worst cases, using relation data significantly decreased performance, whereas some relation information proved helpful in prediction. The best F1 score (0.3276) is about 6 percentage points higher than the worst (0.2665).

  • Densely connected networks tend to be noisy. We confirmed this while analyzing the characteristics of the best and worst relations. Although the number of connections is not the most decisive factor for performance, less semantically meaningful relations, such as country and stock exchange, form very dense networks. Intuitively, densely connected networks carry a considerable amount of noise, which adds irrelevant information to the representations of target nodes.

  • Manually finding optimal relations is laborious. Although semantically meaningful relations generally help improve performance, selecting such relations requires much work and expertise.
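To make the density argument concrete, the sketch below compares a relation that links every pair of companies (as a shared stock exchange would) with a sparse, semantically specific one. The six companies and their edges are hypothetical, not taken from the dataset.

```python
from itertools import combinations


def edge_density(edges, n_nodes):
    """Fraction of all possible undirected company pairs that share an edge."""
    possible = n_nodes * (n_nodes - 1) / 2
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(unique) / possible


companies = ["A", "B", "C", "D", "E", "F"]
# Every company listed on the same exchange: the relation forms a clique.
same_exchange = list(combinations(companies, 2))
# A specific relation such as parent-subsidiary links only a few pairs.
parent_subsidiary = [("A", "B"), ("C", "D")]

print(edge_density(same_exchange, 6), round(edge_density(parent_subsidiary, 6), 3))  # 1.0 0.133
```

In the clique, each company's neighborhood is the entire universe, so aggregating over neighbors mixes in every other company regardless of relevance.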

Based on the above findings, we can conclude that relational information should be selectively chosen when using it for stock market prediction. Furthermore, the framework should be designed to automatically select useful information to minimize the need for manual feature engineering. We conducted experiments on two different tasks to verify the effectiveness of different relational modeling approaches.

5.3 Individual Stock Prediction

Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6 Phase 7 Phase 8 Average
Average Daily Return (%)
MLP 0.0672 -0.0195 0.0029 0.0945 -0.0623 -0.0002 -0.0081 0.0319 0.0133
CNN -0.0506 0.0929 -0.0623 -0.0578 -0.0673 0.0140 0.0272 -0.0122 -0.0145
LSTM 0.0904 0.1005 0.0454 0.1429 -0.0159 0.0400 0.0246 0.0742 0.0627
GCN -0.0264 0.1057 -0.0189 0.0028 -0.0427 0.0748 0.0201 0.0837 0.0249
GCN-TOP20 -0.0103 0.2435 0.0246 0.0385 0.0415 0.0828 -0.0389 0.2356 0.0772
TGC -0.0517 0.1247 -0.0100 0.0113 -0.0202 0.0581 -0.0143 0.2175 0.0394
HATS 0.1231 0.1759 0.0703 0.1819 0.0183 0.0726 0.0860 0.0662 0.0993
Sharpe Ratio (Annualized)
MLP 2.4410 -1.0063 0.1070 2.1602 -1.6039 -0.0095 -0.4010 1.0398 0.3409
CNN -1.4459 3.2835 -1.7872 -0.6064 -1.7851 0.3565 0.9306 -0.3917 -0.1807
LSTM 2.3553 4.0651 1.0642 2.2014 -0.4455 1.1960 0.8354 2.1975 1.6837
GCN -0.2802 2.4700 -0.2477 0.0289 -0.7090 1.8324 0.4078 1.3746 0.6096
GCN-TOP20 -0.1013 4.9007 0.2994 0.3173 0.6222 2.0390 -0.9107 3.1870 1.2942
TGC -0.5029 3.0525 -0.1796 0.1085 -0.4131 2.9435 -0.6618 4.1305 1.0597
HATS 2.4796 4.3903 1.2503 2.3961 0.4087 1.6945 2.0334 1.4830 2.0170
Table 3: Profitability results on the individual stock prediction task
Figure 4: Comparison of different prediction models and their changes in asset value. The asset value is assumed to start at 100.

Classification Accuracy

The classification results of the experiments on individual stock prediction are summarized in Table 2. Among the baselines without a relational modeling module, LSTM generally performs the best, so we compare the models with a relational modeling module against LSTM. In terms of accuracy, all models with a relational modeling module performed better than LSTM. However, not all of them outperformed LSTM in terms of F1 score: as shown in Table 2, only GCN-Top20 and HATS achieved higher F1 scores. Interestingly, GCN and TGC, both of which obtained lower F1 scores than LSTM, achieved the highest accuracy. GCN and TGC tend to make biased predictions toward a specific class; by doing so, they obtained higher accuracy but lower F1 scores. In contrast, GCN-Top20 and HATS obtained slightly lower accuracy than the other two relational baselines but higher F1 scores.
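The gap between accuracy and macro F1 for biased predictors can be illustrated with a toy example. The labels below are hypothetical and imbalanced toward the neutral class: a predictor that always outputs "neutral" scores higher accuracy but far lower macro F1 than one that spreads its predictions across the classes.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("down", "neutral", "up")):
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: 5 neutral, 3 up, 2 down.
y_true = ["neutral"] * 5 + ["up"] * 3 + ["down"] * 2
biased = ["neutral"] * 10
spread = ["neutral", "neutral", "up", "down", "up",
          "up", "neutral", "down", "down", "up"]

for name, pred in [("biased", biased), ("spread", spread)]:
    print(name, accuracy(y_true, pred), round(macro_f1(y_true, pred), 3))
# biased 0.5 0.222
# spread 0.4 0.395
```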

Selectively aggregating information from different relations can help improve F1 scores. Although TGC performed better than vanilla GCN, TGC was outperformed by GCN-Top20 which was trained on manually selected relational data. In contrast, our proposed model HATS generally outperformed all the baselines in terms of F1 score. These results are consistent with the profitability test results which are provided in the following subsection.

Profitability test

The individual stock prediction results on the profitability test are summarized in Table 3. We calculated the daily returns of the neutralized portfolio constructed using the strategy discussed in Section 5.1 and averaged them for each phase. On average, GCN-Top20 and HATS obtained the highest average daily returns. As mentioned above, GCN-Top20 and HATS also outperformed GCN and TGC in terms of F1 score. TGC performed better than vanilla GCN but worse than LSTM. Surprisingly, the Sharpe ratio of GCN-Top20 was lower than that of LSTM. Even without calculating the Sharpe ratio, Table 3 shows that the average daily returns of GCN-Top20 vary widely across phases (from -0.0389% in Phase 7 to 0.2435% in Phase 2), which may be attributed to GCN-Top20 using relational data statically. Although the relations used for GCN-Top20 were manually selected and expected to improve stock prediction, fixed relations may be useful only under specific market conditions. Because GCN cannot assign importance to neighboring nodes based on the market condition and the current state of a given node, its results vary widely. By selecting useful information according to the market situation, our HATS model achieves good performance in terms of both expected return and Sharpe ratio.
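The spread of the per-phase average daily returns can be checked directly from Table 3. The sketch recomputes the eight-phase averages and the population standard deviation for GCN-TOP20 and HATS; the values are copied from the table, and using the standard deviation as the spread measure is our choice.

```python
from statistics import mean, pstdev

# Average daily return (%) per phase, copied from Table 3.
table3_returns = {
    "GCN-TOP20": [-0.0103, 0.2435, 0.0246, 0.0385, 0.0415, 0.0828, -0.0389, 0.2356],
    "HATS": [0.1231, 0.1759, 0.0703, 0.1819, 0.0183, 0.0726, 0.0860, 0.0662],
}

for model, returns in table3_returns.items():
    print(model, round(mean(returns), 4), round(pstdev(returns), 4))
```

The recomputed means match the Average column of Table 3 (0.0772 and 0.0993), and the standard deviation for GCN-TOP20 comes out at nearly twice that of HATS.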

S5CONS S5FINL S5INFT S5ENRS S5UTIL Average
F1
MLP 0.2986 0.3002 0.2867 0.2785 0.2928 0.2913
CNN 0.3013 0.3157 0.3036 0.3011 0.3025 0.3049
LSTM 0.3405 0.2859 0.3454 0.3109 0.2942 0.3154
GCN 0.3410 0.3040 0.3423 0.2848 0.3111 0.3166
TGC 0.3322 0.3051 0.3391 0.2736 0.2911 0.3082
HATS 0.3758 0.3148 0.3518 0.2967 0.3256 0.3329
Accuracy
MLP 0.3290 0.3463 0.3318 0.3210 0.3282 0.3313
CNN 0.3429 0.3392 0.3434 0.4126 0.3235 0.3537
LSTM 0.3625 0.3506 0.3808 0.4550 0.3100 0.3718
GCN 0.3722 0.3637 0.3591 0.4531 0.3373 0.3771
TGC 0.4021 0.3600 0.3754 0.4468 0.3250 0.3819
HATS 0.4095 0.3662 0.3834 0.4620 0.3531 0.3948
Table 4: Classification accuracy scores on the market index prediction task

5.4 Market Index Prediction

As mentioned in Section 4, we gathered price and relational data for 431 companies listed in the S&P 500. There are 9 different market indices, each representing an industrial sector. We removed the four indices with fewer than 20 constituent companies, leaving five market indices: S5CONS (S&P 500 Consumer Staples Index), S5FINL (S&P 500 Financials Index), S5INFT (S&P 500 Information Technology Index), S5ENRS (S&P 500 Energy Index), and S5UTIL (S&P 500 Utilities Index). As the graph of constituent companies is already sparse, we do not use GCN-Top20 as a baseline. The results are summarized in Table 4.

Due to space constraints, we provide only the averaged results for each index in Table 4; the experimental results of each phase are provided in the appendix. Furthermore, we did not measure the profitability of a neutralized portfolio on the market index prediction task, as it is not reasonable to construct a neutralized portfolio from a selection universe of only five assets.

On average, models with a relational modeling module achieved higher accuracy than LSTM on the market index prediction task. However, HATS is the only model that performed clearly better than LSTM in terms of both F1 score and accuracy: GCN performed only slightly better than LSTM, and TGC performed worse than LSTM in terms of F1 score. As we used the same pooling operation for all the models, the differences in performance can be mainly attributed to their relational modeling modules. This again indicates that HATS is effective in learning node representations for a given task; on average, HATS outperforms all the baselines in terms of both F1 score and accuracy on the market index prediction task.

Unexpectedly, the other baselines with a relational modeling module did not perform substantially better than LSTM. These baselines cannot easily select information from different relation types, and they use a naive structure to obtain graph representations. Many graph pooling methods, such as [19] and [28], have already been proposed for learning graph representations and have proven effective in many different tasks. We expect that more advanced pooling methods will further improve performance on the market index prediction task.
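As an illustration of the kind of simple readout meant here, the sketch below forms an index representation by averaging the node representations of its constituent companies. This section does not name the pooling operation used, so mean pooling, the function name, and the toy vectors are assumptions; learned pooling methods such as [19] and [28] replace this fixed average with trainable node selection or clustering.

```python
def mean_pool(node_reprs, members):
    """Graph-level readout: element-wise mean over the constituents' vectors."""
    vectors = [node_reprs[c] for c in members]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical 2-dimensional node representations for three companies.
node_reprs = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [10.0, 10.0]}
print(mean_pool(node_reprs, ["A", "B"]))  # [2.0, 3.0]
```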

5.5 Case Study

Relation attention scores

In this section, we conduct two case studies to further analyze the decision-making mechanism of HATS. As previously mentioned, HATS is designed to gather information from only useful relations. For our first case study, we calculated the attention score of each relation. By analyzing the relation types with the highest and lowest attention scores, we can understand which types of relations the model considers important. Fig. 5 visualizes the attention scores of the relations: we averaged the attention scores on the test sets of all the phases and selected the 20 relations with the highest and the 10 relations with the lowest average scores. As shown in Fig. 5, the relations with the highest attention scores are mostly dominant-subordinate relationships, such as parent organization-subsidiary relationships, and some of them represent industrial dependencies. In contrast, most of the relations with the lowest attention scores describe geographical features.
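The ranking procedure behind Fig. 5 (average each relation's attention score over the test phases, then take the highest and lowest) can be sketched as follows. The relation names and scores are hypothetical placeholders, and we assume every relation has a score in every phase.

```python
from statistics import mean


def rank_relations(scores_by_phase, top=20, bottom=10):
    """Return the `top` and `bottom` relations by attention score averaged over phases.

    scores_by_phase is a list with one dict per test phase, mapping a relation
    type to its average attention score in that phase.
    """
    relations = scores_by_phase[0].keys()
    averaged = {r: mean(phase[r] for phase in scores_by_phase) for r in relations}
    ordered = sorted(averaged, key=averaged.get, reverse=True)
    return ordered[:top], ordered[-bottom:]


phases = [
    {"rel_a": 0.5, "rel_b": 0.3, "rel_c": 0.2},
    {"rel_a": 0.4, "rel_b": 0.4, "rel_c": 0.2},
]
print(rank_relations(phases, top=1, bottom=1))  # (['rel_a'], ['rel_c'])
```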

Figure 5: Visualization of attention scores of different relations. 20 relations with highest attention scores on average and 10 relations with the lowest scores on average.
Figure 6: Visualization of node representations using t-SNE.

Node representation

In studies on graph neural network methods, researchers are often interested in the representations learned by the GNN. Fig. 6 visualizes the node representations obtained by HATS. We take the representations of all companies on a specific day and use the t-SNE algorithm to map each representation to a two-dimensional space. In Figure 6(a), each company is colored according to its movement label (up, neutral, or down) on that day. In Figure 6(b), companies are colored by industry. In Figure 6(a), a rough boundary separates companies with up labels from companies with down labels, and, interestingly, the representations of neutral movements are widely spread. In Figure 6(b), companies in the same industry form clusters, and such clusters appear in every phase. Although the prices of two stocks in the same industry do not always move in the same direction, the clusters in Figure 6(b) show that HATS learned meaningful representations.

6 Conclusion

In this work, we proposed HATS, a model that uses relational data for stock market prediction. HATS is designed to selectively aggregate information on different relation types to learn useful node representations. We applied HATS to two graph-related tasks: predicting individual stock movements and predicting market index movements. The experimental results demonstrate the importance of using proper relational data and show that prediction performance can change dramatically depending on the relation type. The results also show that HATS, which automatically selects the information to use, outperformed all the existing models.

There are many possibilities for future research. First, finding a more effective way to construct a corporate network is an important objective for future studies. In this study, we define the neighborhood of a company as the set of companies connected to it by direct edges or by meta-paths of at most 2 hops; this definition could be improved. Furthermore, we used a single database (Wikidata) to create the company network. Future work could use other data sources or even create knowledge graphs from unstructured text from various sources. Finally, applying more advanced pooling methods to obtain graph representations could improve the overall performance of GNN methods on the market index prediction task.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF-2017R1A2A1A17069645, NRF-2017M3C4A7065887)

References

  • Adebiyi et al. [2014] Adebiyi, A.A., Adewumi, A.O., Ayo, C.K., 2014. Comparison of arima and artificial neural networks models for stock price prediction. Journal of Applied Mathematics 2014.
  • Agrawal et al. [2013] Agrawal, J., Chourasia, V., Mittra, A., 2013. State-of-the-art in stock prediction techniques. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering 2, 1360–1366.
  • Bao et al. [2017] Bao, W., Yue, J., Rao, Y., 2017. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PloS one 12, e0180944.
  • Bollen et al. [2011] Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. Journal of computational science 2, 1–8.
  • Bollerslev et al. [2014] Bollerslev, T., Marrone, J., Xu, L., Zhou, H., 2014. Stock return predictability and variance risk premia: statistical inference and international evidence. Journal of Financial and Quantitative Analysis 49, 633–661.
  • Chen et al. [2018a] Chen, J., Ma, T., Xiao, C., 2018a. FastGCN: Fast learning with graph convolutional networks via importance sampling, in: International Conference on Learning Representations.
  • Chen et al. [2018b] Chen, Y., Wei, Z., Huang, X., 2018b. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, ACM. pp. 1655–1658.
  • Dechow et al. [2001] Dechow, P.M., Hutton, A.P., Meulbroek, L., Sloan, R.G., 2001. Short-sellers, fundamental analysis, and stock returns. Journal of Financial Economics 61, 77–106.
  • Dempster et al. [2001] Dempster, M.A., Payne, T.W., Romahi, Y., Thompson, G.W., 2001. Computational learning techniques for intraday fx trading using popular technical indicators. IEEE Transactions on neural networks 12, 744–754.
  • Ding et al. [2015] Ding, X., Zhang, Y., Liu, T., Duan, J., 2015. Deep learning for event-driven stock prediction, in: Twenty-Fourth International Joint Conference on Artificial Intelligence.
  • Ding et al. [2016] Ding, X., Zhang, Y., Liu, T., Duan, J., 2016. Knowledge-driven event embedding for stock prediction, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2133–2142.
  • Dong et al. [2017] Dong, Y., Chawla, N.V., Swami, A., 2017. metapath2vec: Scalable representation learning for heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, ACM. pp. 135–144.
  • Feng et al. [2019] Feng, F., He, X., Wang, X., Luo, C., Liu, Y., Chua, T.S., 2019. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems (TOIS) 37, 27.
  • Fischer and Krauss [2018] Fischer, T., Krauss, C., 2018. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 270, 654–669.
  • Gilmer et al. [2017] Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E., 2017. Neural message passing for quantum chemistry, in: Proceedings of the 34th International Conference on Machine Learning-Volume 70, JMLR. org. pp. 1263–1272.
  • Hamilton et al. [2017] Hamilton, W., Ying, Z., Leskovec, J., 2017. Inductive representation learning on large graphs, in: Advances in Neural Information Processing Systems, pp. 1024–1034.
  • Kipf and Welling [2016] Kipf, T.N., Welling, M., 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 .
  • Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
  • Lee et al. [2019] Lee, J., Lee, I., Kang, J., 2019. Self-attention graph pooling. arXiv preprint arXiv:1904.08082 .
  • Li et al. [2014] Li, X., Xie, H., Chen, L., Wang, J., Deng, X., 2014. News impact on stock price return via sentiment analysis. Knowledge-Based Systems 69, 14–23.
  • Malkiel [2003] Malkiel, B.G., 2003. The efficient market hypothesis and its critics. Journal of economic perspectives 17, 59–82.
  • Patel et al. [2015] Patel, J., Shah, S., Thakkar, P., Kotecha, K., 2015. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications 42, 259–268.
  • Phan et al. [2015] Phan, D.H.B., Sharma, S.S., Narayan, P.K., 2015. Stock return forecasting: some new evidence. International Review of Financial Analysis 40, 38–51.
  • Rather et al. [2015] Rather, A.M., Agarwal, A., Sastry, V., 2015. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications 42, 3234–3241.
  • Veličković et al. [2017] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y., 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 .
  • Vrandečić and Krötzsch [2014] Vrandečić, D., Krötzsch, M., 2014. Wikidata: a free collaborative knowledge base .
  • Ying et al. [2018a] Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W.L., Leskovec, J., 2018a. Graph convolutional neural networks for web-scale recommender systems, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983.
  • Ying et al. [2018b] Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., Leskovec, J., 2018b. Hierarchical graph representation learning with differentiable pooling, in: Advances in Neural Information Processing Systems, pp. 4800–4810.
  • Zhang and Chen [2018] Zhang, M., Chen, Y., 2018. Link prediction based on graph neural networks, in: Advances in Neural Information Processing Systems, pp. 5165–5175.

Appendix A Appendix

Code Relation Name Description
P17 Country sovereign state of this item; don’t use on humans
P31 Instance of that class of which this subject is a particular example and member (subject typically an individual member with a proper name label)
P112 Founded by founder or co-founder of this organization, religion or place
P121 Item operated equipment, installation or service operated by the subject
P127 Owned by owner of the subject
P131 Located in the administrative territorial entity the item is located on the territory of the following administrative entity.
P138 Named after entity or event that inspired the subject’s name, or namesake (in at least one language)
P155 Follows immediately prior item in a series of which the subject is a part [if the subject has replaced the preceding item, e.g. political offices, use "replaces"]
P156 Followed by the immediately following item in some series of which the subject is part. Use P1366 if the item is replaced e.g. political offices, states
P159 Headquarters location specific location where an organization’s headquarters is or has been situated.
P166 Award received award or recognition received by a person, organisation or creative work
P169 Chief executive officer highest-ranking corporate officer appointed as the CEO within an organization
P176 Manufacturer manufacturer or producer of this product
P355 Subsidiary subsidiary of a company or organization, opposite of parent organization
P361 Part of object of which the subject is a part
P400 Platform platform for which a work was developed or released, or the specific platform version of a software product
P414 Stock Exchange exchange on which this company is traded
P452 Industry industry of company or organization
P463 Member of organization or club to which the subject belongs
P488 Chairperson presiding member of an organization, group or body
P495 Country of origin country of origin of this item (creative work, food, phrase, product, etc.)
P625 Coordinate location geocoordinates of the subject.
P740 Location of formation location where a group or organization was formed
P749 Parent organization parent organization of an organisation, opposite of subsidiaries (P355)
P793 Significant event significant or notable events associated with the subject
P1056 Product or material produced material or product produced by a government agency, business, industry, facility, or process
P1343 Described by source dictionary, encyclopaedia, etc. where this item is described
P1344 Participant of event a person or an organization was/is a participant in,
P1454 Legal form legal form of an organization
P1552 Has quality the entity has an inherent or distinguishing non-material characteristic
P1830 Owner of entities owned by the subject
P1889 Different from item that is different from another item, with which it is often confused
P3320 Board member member(s) of the board for the organization
P5009 Complies with the product or work complies with a certain norm or passes a test
P6379 Has works in the collection collection that have works of this artist
Table 5: List of relation types and their definitions used to make meta-paths. Combinations of relations below are used in our study.
Relation Index Relation Combination (Code) Relation Index Relation Combination (Code)
1 P1454-P1454 37 P452-P176
2 P159-P159 38 P6379-P6379
3 P1454-P31 39 P155-P155
4 P159-P740 40 P495-P17
5 P17-P495 41 P749-P127
6 P17-P740 42 P749-P749
7 P414-P361 43 P4950-P495
8 P414 44 P495-P740
9 P452-P452 45 P355
10 P361-P361 46 P355-P155
11 P127-P749 47 P1830-P1830
12 P1344-P1344 48 P112-P749
13 P127-P127 49 P1056-P452
14 P740-P159 50 P1454-P452
15 P740-P740 51 P355-P127
16 P112-P112 52 P176-P452
17 P155 53 P159-P131
18 P1056-P1056 54 P1830
19 P1056-P31 55 P1830-P127
20 P127-P355 56 P1830-P749
21 P127-P1830 57 P355-P355
22 P361-P414 58 P131-P159
23 P127 59 P5009-P5009
24 P156 60 P31-P1056
25 P31-P1454 61 P1830-P355
26 P452-P31 62 P740-P17
27 P452-P1056 63 P740-P495
28 P463-P463 64 P1889
29 P625-P625 65 P749-P112
30 P361 66 P361-P463
31 P159-P17 67 P155-P355
32 P17-P159 68 P463-P361
33 P793-P793 69 P355-P1830
34 P166-P166 70 P121-P121
35 P31-P452 71 P749-P1830
36 P452-P1454 72 P749
Table 6: List of the relations used in our study. Both direct edges and meta-paths are included in the list.
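One way to turn Wikidata statements into the company-company edges listed in Table 6 is sketched below. A single code (e.g., P355) is read as a direct edge between two companies, and a pair such as P452-P452 is read as a two-hop meta-path through a shared intermediate entity (here, two companies with the same industry). This reading, the helper names, and the toy statements (with made-up entity IDs) are assumptions for illustration, not the authors' construction code.

```python
from collections import defaultdict


def direct_edges(statements, companies, prop):
    """Edges (a, b) from statements 'a prop b' where both ends are companies."""
    return {
        (s, o)
        for s, p, o in statements
        if p == prop and s in companies and o in companies and s != o
    }


def meta_path_edges(statements, companies, p1, p2):
    """Edges (a, b) such that 'a p1 x' and 'b p2 x' hold for some entity x."""
    via_p1 = defaultdict(set)  # entity x -> companies reaching x through p1
    via_p2 = defaultdict(set)  # entity x -> companies reaching x through p2
    for s, p, o in statements:
        if s not in companies:
            continue
        if p == p1:
            via_p1[o].add(s)
        if p == p2:
            via_p2[o].add(s)
    return {
        (a, b)
        for x, sources in via_p1.items()
        for a in sources
        for b in via_p2.get(x, ())
        if a != b
    }


# Toy (subject, property, object) statements with hypothetical entity IDs.
statements = [
    ("CoA", "P452", "Q_retail"),
    ("CoB", "P452", "Q_retail"),
    ("CoC", "P452", "Q_energy"),
    ("CoA", "P355", "CoC"),
]
companies = {"CoA", "CoB", "CoC"}
print(sorted(meta_path_edges(statements, companies, "P452", "P452")))
# [('CoA', 'CoB'), ('CoB', 'CoA')]
print(direct_edges(statements, companies, "P355"))  # {('CoA', 'CoC')}
```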
F1
S5CONS
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6 Phase 7 Average
MLP 0.3155 0.3006 0.2673 0.3239 0.2917 0.2769 0.3141 0.2986
CNN 0.2658 0.3048 0.3185 0.2828 0.2866 0.3328 0.3181 0.3013
GRU 0.3233 0.3134 0.3782 0.3344 0.3337 0.3403 0.3603 0.3405
GCN-GRU 0.3163 0.3532 0.3756 0.3537 0.3734 0.2594 0.3554 0.3410
TGC-GRU 0.3707 0.3392 0.3585 0.2917 0.2984 0.2699 0.3970 0.3322
HATS-GRU 0.3480 0.3320 0.3822 0.4248 0.3755 0.3626 0.4057 0.3758
S5FINL
MLP 0.2762 0.2561 0.3334 0.3062 0.3346 0.3106 0.2840 0.3002
CNN 0.3035 0.3499 0.3348 0.3524 0.3204 0.3022 0.2467 0.3157
GRU 0.2913 0.2999 0.2999 0.3234 0.2482 0.2603 0.2785 0.2859
GCN-GRU 0.3166 0.3070 0.3713 0.3204 0.2759 0.2624 0.2742 0.3040
TGC-GRU 0.3131 0.3075 0.3418 0.3239 0.3014 0.2797 0.2680 0.3051
HATS-GRU 0.3084 0.3111 0.3741 0.3461 0.3130 0.2852 0.2658 0.3148
Accuracy
S5CONS
MLP 0.3535 0.3293 0.2970 0.3434 0.3232 0.3091 0.3475 0.3290
CNN 0.3374 0.3253 0.3394 0.3091 0.3758 0.3596 0.3535 0.3429
GRU 0.3576 0.3273 0.3879 0.3455 0.3919 0.3455 0.3818 0.3625
GCN-GRU 0.3394 0.3818 0.3960 0.3737 0.4571 0.2889 0.3687 0.3722
TGC-GRU 0.3980 0.3879 0.3838 0.3973 0.5051 0.3258 0.4167 0.4021
HATS-GRU 0.3636 0.3717 0.4101 0.4444 0.4889 0.3697 0.4182 0.4095
S5FINL
MLP 0.3232 0.3111 0.3596 0.3253 0.3859 0.3657 0.3535 0.3463
CNN 0.3183 0.3664 0.3636 0.2906 0.3520 0.3289 0.3549 0.3392
GRU 0.3010 0.3111 0.3111 0.4222 0.4000 0.3717 0.3374 0.3506
GCN-GRU 0.3333 0.3293 0.3778 0.4424 0.3636 0.3359 0.3636 0.3637
TGC-GRU 0.3010 0.3253 0.3939 0.4141 0.4040 0.3359 0.3460 0.3600
HATS-GRU 0.3131 0.3374 0.3899 0.4444 0.4040 0.3212 0.3535 0.3662
Table 7: Experimental results of the market indices with 7 phases.
F1
S5INFT
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6 Phase 7 Phase 8 Phase 9 Average
MLP 0.2898 0.2844 0.3107 0.3184 0.2765 0.2940 0.2464 0.2906 0.2727 0.2867
CNN 0.3030 0.3065 0.2943 0.3359 0.3040 0.2884 0.2856 0.3241 0.2909 0.3036
GRU 0.3212 0.3343 0.3873 0.3752 0.3244 0.3975 0.3460 0.3306 0.2918 0.3454
GCN-GRU 0.3267 0.3578 0.3659 0.3817 0.3455 0.3582 0.3178 0.2977 0.3294 0.3423
TGC-GRU 0.3350 0.3032 0.3854 0.3877 0.2952 0.3684 0.3448 0.3022 0.3296 0.3391
HATS-GRU 0.3645 0.3755 0.3787 0.3588 0.3599 0.3709 0.3172 0.3249 0.3159 0.3518
S5ENRS
MLP 0.2629 0.3081 0.2656 0.3038 0.2310 0.2268 0.2902 0.3268 0.2755 0.2785
CNN 0.2899 0.2890 0.3077 0.3036 0.2766 0.3181 0.3245 0.3211 0.2790 0.3011
GRU 0.3536 0.3446 0.3066 0.3306 0.3036 0.2793 0.2614 0.2880 0.3300 0.3109
GCN-GRU 0.3479 0.3022 0.2854 0.3096 0.2563 0.2533 0.2613 0.2795 0.2673 0.2848
TGC-GRU 0.3360 0.3148 0.2652 0.2590 0.1970 0.2622 0.2510 0.2670 0.3102 0.2736
HATS-GRU 0.3373 0.3221 0.3048 0.3191 0.2820 0.2530 0.2652 0.2840 0.3025 0.2967
S5UTIL
MLP 0.2793 0.3370 0.2956 0.2358 0.2827 0.2766 0.2712 0.2953 0.3481 0.2928
CNN 0.3227 0.2948 0.3197 0.3281 0.2905 0.3268 0.2809 0.2927 0.2665 0.3025
GRU 0.3582 0.2921 0.3015 0.2233 0.3143 0.2737 0.2910 0.2969 0.2966 0.2942
GCN-GRU 0.3305 0.3123 0.3269 0.3152 0.3159 0.3088 0.3086 0.2975 0.2840 0.3111
TGC-GRU 0.3185 0.3025 0.2988 0.2784 0.2744 0.2760 0.2731 0.3451 0.2529 0.2911
HATS-GRU 0.3710 0.2914 0.3209 0.3067 0.3474 0.3000 0.3378 0.3548 0.3003 0.3256
Accuracy
S5INFT
MLP 0.3232 0.3333 0.3515 0.3636 0.3232 0.3253 0.2808 0.3737 0.3030 0.3318
CNN 0.3374 0.3434 0.3212 0.3616 0.3152 0.3030 0.3172 0.4141 0.3778 0.3434
GRU 0.3010 0.3980 0.3980 0.3495 0.4263 0.4040 0.4505 0.3515 0.3485 0.3808
GCN-GRU 0.3293 0.3778 0.3657 0.3939 0.3535 0.3864 0.3333 0.3182 0.3737 0.3591
TGC-GRU 0.3515 0.3313 0.3906 0.3990 0.3333 0.3872 0.3813 0.4283 0.3763 0.3754
HATS-GRU 0.3737 0.3919 0.3838 0.3737 0.3778 0.3960 0.3636 0.4222 0.3677 0.3834
S5ENRS
MLP 0.3293 0.3313 0.3030 0.3293 0.2606 0.2747 0.3354 0.3879 0.3374 0.3210
CNN 0.3051 0.2990 0.3455 0.4364 0.2869 0.5636 0.4323 0.5677 0.4768 0.4126
GRU 0.3636 0.3758 0.3838 0.4101 0.3818 0.5616 0.4909 0.5879 0.5394 0.4550
GCN-GRU 0.3697 0.3374 0.3569 0.4606 0.3872 0.5732 0.5118 0.5859 0.4949 0.4531
TGC-GRU 0.3556 0.3636 0.3266 0.4646 0.3939 0.5682 0.4899 0.5717 0.4874 0.4468
HATS-GRU 0.3515 0.3879 0.3778 0.4768 0.3758 0.5616 0.5051 0.5859 0.5354 0.4620
S5UTIL
MLP 0.2949 0.3636 0.3495 0.2667 0.2949 0.3596 0.3111 0.3414 0.3717 0.3282
CNN 0.3475 0.3253 0.3434 0.3394 0.3313 0.3697 0.2990 0.2828 0.2727 0.3235
GRU 0.3657 0.2990 0.3212 0.2465 0.3354 0.2970 0.3030 0.3192 0.3030 0.3100
GCN-GRU 0.3354 0.3192 0.3460 0.3872 0.3359 0.3662 0.3165 0.3367 0.2929 0.3373
TGC-GRU 0.3313 0.3131 0.3131 0.3586 0.2980 0.3030 0.3283 0.3889 0.2904 0.3250
HATS-GRU 0.3818 0.2970 0.3535 0.3737 0.3758 0.3172 0.3510 0.4175 0.3106 0.3531
Table 8: Experimental results of the market indices with 9 phases.
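The Average column appears to be the unweighted arithmetic mean of the per-phase scores. The tables do not state this, so treat it as an assumption. The minimal Python sketch below recomputes that mean for the S5UTIL F1 rows of Table 8 and picks the model with the highest printed Average. The names `S5UTIL_F1`, `check_averages` and `best_model` are hypothetical helpers for illustration only; they are not part of the HATS code. Because every printed value is rounded to four decimals, a recomputed mean can differ from the printed Average only by about 1e-4 from rounding alone. Rows that deviate by more than that are flagged, not silently corrected.

```python
from statistics import mean

# Table 8, F1 on S5UTIL: (per-phase scores for Phase 1..9, printed Average).
# Values are copied from the table above; nothing here is re-estimated.
S5UTIL_F1 = {
    "MLP":      ([0.2793, 0.3370, 0.2956, 0.2358, 0.2827, 0.2766, 0.2712, 0.2953, 0.3481], 0.2928),
    "CNN":      ([0.3227, 0.2948, 0.3197, 0.3281, 0.2905, 0.3268, 0.2809, 0.2927, 0.2665], 0.3025),
    "GRU":      ([0.3582, 0.2921, 0.3015, 0.2233, 0.3143, 0.2737, 0.2910, 0.2969, 0.2966], 0.2942),
    "GCN-GRU":  ([0.3305, 0.3123, 0.3269, 0.3152, 0.3159, 0.3088, 0.3086, 0.2975, 0.2840], 0.3111),
    "TGC-GRU":  ([0.3185, 0.3025, 0.2988, 0.2784, 0.2744, 0.2760, 0.2731, 0.3451, 0.2529], 0.2911),
    "HATS-GRU": ([0.3710, 0.2914, 0.3209, 0.3067, 0.3474, 0.3000, 0.3378, 0.3548, 0.3003], 0.3256),
}

# Rounding each phase value to 4 decimals shifts their mean by at most 5e-5,
# and rounding the printed Average adds at most another 5e-5.
TOLERANCE = 1e-4


def check_averages(table, tol=TOLERANCE):
    """Return {model: (recomputed mean, printed average, within tolerance?)}.

    Assumes (hypothetically) that Average is the unweighted mean over phases.
    """
    report = {}
    for model, (phases, printed) in table.items():
        recomputed = mean(phases)
        report[model] = (recomputed, printed, abs(recomputed - printed) <= tol)
    return report


def best_model(table):
    """Return the model with the highest printed Average."""
    return max(table, key=lambda model: table[model][1])


if __name__ == "__main__":
    for model, (rec, printed, ok) in check_averages(S5UTIL_F1).items():
        status = "ok" if ok else "differs beyond rounding"
        print(f"{model:9s} recomputed={rec:.4f} printed={printed:.4f} {status}")
    print("best by printed Average:", best_model(S5UTIL_F1))
```

The same check applies unchanged to any other index and metric block in Tables 7 and 8. Only the dictionary needs to be swapped, because the helpers do not depend on the number of phases.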