
CNNPred: CNN-based stock market prediction using several data sources

Feature extraction from financial data is one of the most important problems in the market prediction domain, for which many approaches have been suggested. Among other modern tools, convolutional neural networks (CNN) have recently been applied for automatic feature selection and market prediction. However, in experiments reported so far, less attention has been paid to the correlation among different markets as a possible source of information for extracting features. In this paper, we suggest a CNN-based framework with specially designed CNNs that can be applied to a collection of data from a variety of sources, including different markets, in order to extract features for predicting the future of those markets. The suggested framework has been applied for predicting the next day's direction of movement for the indices of S&P 500, NASDAQ, DJI, NYSE, and RUSSELL markets based on various sets of initial features. The evaluations show a significant improvement in prediction performance compared to the state-of-the-art baseline algorithms.



1 Introduction

Financial markets are considered as the heart of the world’s economy in which billions of dollars are traded every day. Clearly, a good prediction of future behavior of markets would be extremely valuable for the traders. However, due to the dynamic and noisy behavior of those markets, making such a prediction is also a very challenging task that has been the subject of research for many years. In addition to the stock market index prediction, forecasting the exchange rate of currencies, price of commodities and cryptocurrencies like bitcoin are examples of prediction problems in this domain (Shah & Zhang, 2014; Zhao et al., 2017; Nassirtoussi et al., 2015; Lee et al., 2017).

Existing approaches for financial market analysis fall into two main groups: fundamental analysis and technical analysis. In technical analysis, historical data of the target market and some other technical indicators are regarded as important factors for prediction. According to the efficient market hypothesis, the price of stocks reflects all the information about them (Fama, 1970), while technical analysts believe that the future behavior of prices in a market can be predicted by analyzing previous price data. On the other hand, fundamental analysts examine a security's intrinsic value for investment. They look at balance sheets, income statements, cash flow statements and so on to gain insight into the future of a company.

In addition to financial market experts, machine learning techniques have proved to be useful for making such predictions. Artificial neural networks and support vector machines are the most common algorithms that have been utilized for this purpose (Guresen et al., 2011; Kara et al., 2011; Wang & Wang, 2015). Statistical methods, random forests (Khaidem et al., 2016), linear discriminant analysis, quadratic discriminant analysis, logistic regression and evolutionary computing algorithms, especially genetic algorithms (Hu et al., 2015b; Brown et al., 2013; Hu et al., 2015a; Atsalakis & Valavanis, 2009), are among other tools and techniques that have been applied for feature extraction from raw financial data and/or making predictions based on a set of features (Ou & Wang, 2009; Ballings et al., 2015).

Deep learning (DL) is a class of modern tools that is suitable for automatic feature extraction and prediction (LeCun et al., 2015). In many domains, such as machine vision and natural language processing, DL methods have been shown to be able to gradually construct useful complex features from raw data or simpler features (He et al., 2016; LeCun et al., 2015). Since the behavior of stock markets is complex, nonlinear and noisy, it seems that extracting features that are informative enough for making predictions is a core challenge, and DL seems to be a promising approach to that. Algorithms like Deep Multilayer Perceptron (MLP) (Yong et al., 2017), Restricted Boltzmann Machine (RBM) (Cai et al., 2012; Zhu et al., 2014), Long Short-Term Memory (LSTM) (Chen et al., 2015; Fischer & Krauss, 2018), Auto-Encoder (AE) (Bao et al., 2017) and Convolutional Neural Network (CNN) (Gunduz et al., 2017; Di Persio & Honchar, 2016) are well-known deep learning algorithms that have been utilized to predict stock markets.

It is important to pay attention to the diversity of the features that can be used for making predictions. The raw price data, technical indicators which come out of historical data, other markets with connection to the target market, exchange rates of currencies, oil price and many other information sources can be useful for a market prediction task. Unfortunately, it is usually not a straightforward task to aggregate such a diverse set of information in a way that an automatic market prediction algorithm can use them. So, most of the existing works in this field have limited themselves to a set of technical indicators representing a single market’s recent history (Kim, 2003; Zhang & Wu, 2009).

Another important subject in the field is automatic feature extraction. Since the initial features are defined to be used by human experts, they are simple, and even if they are chosen by a finance expert who has enough knowledge and experience in this domain, they may not be the best possible choices for making predictions by machines. In other words, an automatic approach to stock market prediction ideally is one that can extract useful features from different sources of information that seem beneficial for market prediction, train a prediction model based on those extracted features and finally make predictions using the resulting model. The focus of this paper is on the first phase of this process, that is, to design a model for extracting features from several data sources that contain information from historical records of relevant markets. This data includes initial basic features such as raw historical prices, technical indicators or fluctuations of those features in the past days. Given the diversity of the input space and the possible complexity of the feature space that may be required for a good prediction, a deep learning algorithm like CNN seems to be a promising approach for such a feature extraction problem.

To the best of our knowledge, convolutional neural networks (CNNs) have been applied in only a few studies for stock market prediction (Gunduz et al., 2017; Di Persio & Honchar, 2016). Di Persio & Honchar (2016) used a CNN which took a one-dimensional input for making predictions based only on the history of closing prices, while ignoring other possible sources of information like technical indicators. Gunduz et al. (2017) took advantage of a CNN which was capable of using technical indicators as well for each sample. However, it was unable to consider the correlation which could exist between stock markets as another possible source of information. In addition, the structure of the CNN used was inspired by previous works in Computer Vision, while there is a fundamental difference between Computer Vision and stock market prediction. Since in stock market prediction the interactions among features are radically different from the interactions among pixels, using $3 \times 3$ or $5 \times 5$ filters in the convolutional layer may not be the best option. It seems wiser to design the filters of a CNN based on financial facts instead of papers in Computer Vision.

We develop our framework based on CNN due to its proven capabilities in other domains as well as the successful past experiments reported in the market prediction domain. As a test case, we will show how CNN can be applied in our suggested framework, which we call CNNpred, to capture the possible correlations among different sources of information for extracting combined features from a diverse set of input data from five major U.S. stock market indices: S&P 500, NASDAQ, Dow Jones Industrial Average, NYSE and RUSSELL, as well as other sources of information including economic data, exchange rates of currencies, futures contracts, prices of commodities, important indices of markets around the world and prices of major companies in the U.S. market. Furthermore, the filters are designed in a way that is compatible with the financial characteristics of the features.

The main contributions of this work can be summarized as follows:

  • Aggregating several sources of information in a CNN-based framework for feature extraction and market prediction. Since financial markets behavior is affected by many factors, it is important to gather related information as much as possible. Our initial feature set covers different aspects of stock related sources of data pretty well and basically, it can be easily extended to cover other possible sources of information.

  • To our knowledge, this is the first work suggesting a CNN which takes a 3-dimensional tensor aggregating and aligning a diverse set of features as input and is trained to extract features useful for predicting each of the pertinent stock markets afterward.

The rest of this paper is organized as follows: In section 2, related work is presented. Then, in section 3, we give a brief background on related techniques in the domain. In section 4, the proposed method is presented in detail, followed by an introduction of the various utilized features in section 5. Our experimental setting and results are reported in section 6. In section 7 we discuss the results, and section 8 concludes the paper.

2 Related works

Different methods in the stock prediction domain can be categorized into two groups. The first class includes algorithms that try to improve the performance of prediction by enhancing the prediction models, while the second class of algorithms focuses on improving the features based on which the prediction is made.

In the first class of algorithms, which focus on the prediction models, a variety of tools have been used, including Artificial Neural Networks (ANN), naive Bayes, SVM and random forests. The most popular tool for financial prediction seems to be ANN (Krollner et al., 2010). In (Kara et al., 2011), the performance of ANN and SVM was compared. Ten technical indicators were passed to these two classifiers in order to forecast the directional movement of the Istanbul Stock Exchange (ISE) National 100 Index. The authors found that ANN's prediction ability is significantly better than that of SVM.

Feedforward ANNs are popular types of ANNs that are capable of predicting both price movement direction and price value. Usually shallow ANNs are trained by the back-propagation algorithm (Hecht-Nielsen, 1992; Hagan & Menhaj, 1994). Since obstacles like the noisy behavior of stock markets and the complexity of the feature space can make an ANN's learning process converge to suboptimal solutions, search algorithms like genetic algorithms (GA) or simulated annealing (SA) are sometimes used to find initial or final weights for neural networks that are then used for prediction (Kim & Han, 2000; Qiu et al., 2016; Qiu & Song, 2016). In (Qiu et al., 2016), the authors used a genetic algorithm and simulated annealing to find the initial weights of an ANN, and then the back-propagation algorithm was used to train the network. This hybrid approach outperformed the standard ANN-based methods in prediction of the Nikkei 225 index return. With slight modifications, in (Qiu & Song, 2016) a genetic algorithm was successfully utilized to find optimized weights of an ANN in which technical indicators were utilized to predict the direction of Nikkei 225 index movement.

Authors of (Zhong & Enke, 2017) have applied PCA and two variations of it in order to extract better features. A collection of different features was used as input data while an ANN was used for prediction of S&P 500. The results showed an improvement of the prediction using the features generated by PCA compared to the other two variations of that. The reported accuracy of predictions varies from 56% to 59% for different number of components used in PCA. Another study on the effect of features on the performance of prediction models has been reported in (Patel et al., 2015). This research uses common tools including ANN, SVM, random forest and naive Bayes for predicting directional movement of famous indices and stocks in Indian stock market. This research showed that mapping the data from a space of ten technical features to another feature space that represents trends of those features can lead to an improvement in the performance of the prediction.

According to the mentioned research and similar works, when a shallow model is used for prediction, the quality of the features by which the input data is represented has a critical role in the performance of the prediction. The simplicity of shallow models can prevent them from achieving effective mappings from the input space to successful predictions. So, given the availability of large amounts of data and the emergence of effective learning methods for training deep models, researchers have recently turned to such approaches for market prediction. An important aspect of deep models is that they are usually able to extract rich sets of features from the raw data and make predictions based on them. So, from this point of view, deep models usually combine both phases of feature extraction and prediction in a single phase.

Deep ANNs, which are basically neural networks with more than one hidden layer, are among the first deep methods used in the domain. In (Moghaddam et al., 2016), the authors predicted NASDAQ prices based on the historical prices of four and nine days ago. ANNs with different structures, including both deep and shallow ones, were examined in order to find an appropriate number of hidden layers and neurons inside them. The experiments showed the superiority of deep ANNs over shallow ones. In (Arévalo et al., 2016), the authors used a deep ANN with five hidden layers to forecast Apple Inc's stock price during the financial crisis. For each minute of trading, three features were extracted from the fluctuation of price inside that time period. Outputs showed up to about 65% directional accuracy. In (Yong et al., 2017), an ANN with three hidden layers was utilized to predict the index price of Singapore's stock market. Historical prices of the last ten days were fed to a deep ANN in order to predict the future price of the next one to five days. This experiment reported that the highest performance was achieved for one-day-ahead prediction, with a MAPE of 0.75.

In (Chong et al., 2017), the authors compared different data representation methods, including RBM, Auto-encoder and PCA, applied to raw data with 380 features. The resulting representations were then fed to a deep ANN for prediction. The results showed that none of the data representation methods is superior to the others in all of the tested experiments.

Recurrent Neural Networks are a kind of neural networks that are specially designed to have internal memory that enables them to extract historical features and make predictions based on them. So, they seem fit for the domains like market prediction in which historical behavior of markets has an important role in prediction. LSTM is one of the most popular kinds of RNNs. In (Nelson et al., 2017), technical indicators were fed to an LSTM in order to predict the direction of stock prices in the Brazilian stock market. According to the reported results, LSTM outperformed MLP, by achieving an accuracy of 55.9%.

Convolutional Neural Network is another deep learning algorithm applied in stock market prediction after MLP and LSTM while its ability to extract efficient features has been proven in many other domains as well. In (Di Persio & Honchar, 2016), CNN, LSTM and MLP were applied to the historical data of close prices of S&P 500 index. Results showed that CNN outperformed LSTM and MLP with accuracy of 53.6% while LSTM and MLP had accuracy of 52.2% and 52.1% respectively.

Based on some reported experiments, the way the input data is designed to be fed and processed by CNN has an important role in the quality of the extracted feature set and the final prediction. For example, CNN was used in (Gunduz et al., 2017) in which data of 10 days of 100 companies in Borsa Istanbul were utilized to produce technical indicators and time-lagged features. Then, a CNN was applied to improve the feature set. The reported comparison between CNN and logistic regression shows almost no difference between two methods. In another attempt to improve the prediction, features were clustered into different groups and similar features were put beside each other. The experiments showed that this preprocessing step has improved the performance of CNN to achieve F-measure of 56%.

Table 1 summarizes explained papers in terms of initial feature set, feature extraction algorithm and prediction method. As it can be seen there is a tendency toward deep learning models in recent publications, due to the capability of these algorithms in automatic feature extraction from raw data. However, most of the researchers have used only technical indicators or historical price data of one market for prediction while there are various sources of data which could enhance accuracy of prediction of stock market. In this paper, we are going to introduce a novel CNN-based framework that is designed to aggregate several sources of information in order to automatically extract features to predict direction of stock markets.

Author/year | Target data | Feature set | Feature extraction | Prediction method
(Kara et al., 2011) | Borsa Istanbul BIST 100 Index | technical indicator | |
(Patel et al., 2015) | 4 Indian stocks & indices | technical indicator | |
(Qiu et al., 2016) | Nikkei 225 | financial indicator, macroeconomic data | |
(Qiu & Song, 2016) | Nikkei 225 | technical indicator | ANN | GA+ANN
(Nelson et al., 2017) | Brazil Bovespa, 5 stocks | technical indicator | LSTM | LSTM
(Di Persio & Honchar, 2016) | S&P 500 index | price data | |
(Moghaddam et al., 2016) | | price data | ANN-DNN | ANN-DNN
(Arévalo et al., 2016) | | 3 extracted features | DNN | DNN
(Zhong & Enke, 2017) | S&P 500 index | various sources of data | |
(Yong et al., 2017) | Singapore STI | price data | DNN | DNN
(Chong et al., 2017) | 38 stock returns | price data | |
(Gunduz et al., 2017) | Borsa Istanbul BIST 100 stocks | technical indicator, temporal feature | |
Our method | U.S. 5 major indices | various sources of data | 3D representation of data+CNN |

Table 1: Summary of explained papers

3 Background

Before presenting our suggested approach, in this section, we review the convolutional neural network that is the main element of our framework.

3.1 Convolutional Neural Network

LeCun and his colleagues introduced convolutional neural networks in 1995 (LeCun et al., 1995; Gardner & Dorling, 1998). CNN has many layers which could be categorized into input layer, convolutional layers, pooling layers, fully connected layers and output layer.

3.1.1 Convolutional layer

The convolutional layer performs the convolution operation on the data. The input can be considered as a function and the filter applied to it as another function; the convolution operation measures the changes caused by applying the filter to the input. The size of a filter determines the coverage of that filter. Each filter utilizes a shared set of weights to perform the convolution operation. The weights are updated during the process of training.

Let's posit that the input of layer $l-1$ is an $N \times N$ matrix and that $F \times F$ convolutional filters are used. Then, the input of layer $l$ is calculated according to Eq 1. Fig 1 shows applying a filter to the input data in order to get one value in the next layer. Usually, the output of each filter is passed through an activation function before entering the next layer. ReLU (Eq 2) is a commonly used nonlinear activation function.

$$v_{i,j}^{l} = \delta\left(\sum_{k=0}^{F-1}\sum_{m=0}^{F-1} w_{k,m}\, v_{i+k,\,j+m}^{l-1}\right) \tag{1}$$

Figure 1: Applying a filter to the input data in order to get a value in the next layer

In Eq 1, $v_{i,j}^{l}$ is the value at row $i$, column $j$ of layer $l$, $w_{k,m}$ is the weight at row $k$, column $m$ of the filter and $\delta$ is the activation function.

$$f(x) = \max(0, x) \tag{2}$$
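The convolution and activation described above can be illustrated with a small NumPy sketch. It assumes no padding and a stride of 1, which the text does not specify; the function names and the toy input are hypothetical and chosen only for illustration.

```python
import numpy as np

def relu(x):
    """ReLU activation: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def conv2d_valid(v_prev, w, activation=relu):
    """Slide one filter over a 2D input (assumed: no padding, stride 1)."""
    rows, cols = v_prev.shape
    f_rows, f_cols = w.shape
    out = np.empty((rows - f_rows + 1, cols - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the window whose top-left corner is (i, j).
            out[i, j] = np.sum(w * v_prev[i:i + f_rows, j:j + f_cols])
    return activation(out)

# Toy 3 x 3 input and 2 x 2 filter (illustrative values only).
v_prev = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
w = np.array([[1.0, 0.0],
              [0.0, 1.0]])
v_next = conv2d_valid(v_prev, w)  # 2 x 2 output
```

With this filter each output entry is the sum of a window's main diagonal, so the top-left output is 1 + 5 = 6. Negating the filter makes every weighted sum negative, and ReLU then maps all outputs to zero.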


3.1.2 Pooling layer

The pooling layer is responsible for subsampling the data. This operation not only reduces the computational cost of the learning process, but is also a way of handling the overfitting problem in CNNs. Overfitting is a situation that arises when a trained model fits the training data too closely, such that it cannot generalize to future unseen data. It is connected to the number of parameters that are learned and the amount of data that the prediction model is learned from. Deep models, including CNNs, usually have many parameters, so they are more prone to overfitting than shallow models. In a pooling layer, all the values inside a pooling window are converted to only one value. This transformation reduces the size of the input of the following layers, and hence reduces the number of parameters that must be learned by the model, which in turn lowers the risk of overfitting. Max pooling is the most common type of pooling, in which the maximum value in a certain window is chosen.
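As a minimal sketch of max pooling, the following NumPy function pools along the first axis with non-overlapping windows. Dropping trailing rows that do not fill a complete window is an assumption (it mirrors a common framework default), and the function name and data are hypothetical.

```python
import numpy as np

def max_pool(x, window):
    """Non-overlapping max pooling along axis 0.

    Assumption: trailing rows that do not fill a whole window are dropped.
    """
    n_windows = x.shape[0] // window
    trimmed = x[:n_windows * window]
    # Group rows into (n_windows, window, ...) and take the max of each group.
    return trimmed.reshape(n_windows, window, *x.shape[1:]).max(axis=1)

x = np.array([[1.0, 8.0],
              [5.0, 2.0],
              [3.0, 3.0],
              [4.0, 9.0],
              [7.0, 0.0]])
pooled = max_pool(x, window=2)  # 5 rows -> 2 rows; the last row is dropped
```

Each pair of rows collapses into one row holding the column-wise maxima, so the layers that follow see roughly half as many values.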

3.1.3 Fully connected layer

At the final layer of a CNN, there is an MLP network which is called its fully connected layer. It is responsible for converting the features extracted in the previous layers to the final output. The relation between two successive layers is defined by Eq 3.

$$v_{i}^{l} = \delta\left(\sum_{k} v_{k}^{l-1}\, w_{k,i}^{l-1}\right) \tag{3}$$

In Eq 3, $v_{i}^{l}$ is the value of neuron $i$ at layer $l$, $\delta$ is the activation function and the weight of the connection between neuron $k$ from layer $l-1$ and neuron $i$ from layer $l$ is shown by $w_{k,i}^{l-1}$.
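The relation between two successive fully connected layers amounts to a matrix-vector product followed by an activation. The sketch below follows the description above (a weighted sum over the previous layer's neurons, with no bias term); the function name, the weights and the use of ReLU as the activation are assumptions for illustration.

```python
import numpy as np

def dense(v_prev, w, activation):
    """One fully connected layer: v[i] = activation(sum_k v_prev[k] * w[k, i])."""
    return activation(v_prev @ w)

v_prev = np.array([1.0, 2.0, 3.0])   # three neurons in the previous layer
w = np.array([[0.5, -1.0],           # w[k, i]: weight from neuron k to neuron i
              [0.0,  1.0],
              [1.0, -2.0]])
v_next = dense(v_prev, w, activation=lambda z: np.maximum(0.0, z))
```

Here the first output neuron receives 0.5 + 0 + 3 = 3.5, while the second receives -1 + 2 - 6 = -5, which the assumed ReLU activation clips to 0.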

3.2 Dropout

In addition to pooling, we have also used another technique called dropout, which was first developed for training deep neural networks. The idea behind the dropout technique is to prevent the model from learning too much from the training data. So, in each learning cycle during the training phase, each neuron has a chance, equal to some dropout rate, of not being trained in that cycle. This prevents the model from being too flexible, and so helps the learning algorithm converge to a model that is not fit too closely to the training data and instead generalizes well when predicting unlabeled future data (Hinton et al., 2012; Srivastava et al., 2014).
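A minimal sketch of dropout is given below. It uses the "inverted" formulation, in which surviving activations are rescaled by 1 / (1 - rate) during training so that nothing needs to change at prediction time; this rescaling is a common convention and an assumption here, as are the function name and the rate of 0.1.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        return activations  # at prediction time the layer is the identity
    keep = rng.random(activations.shape) >= rate
    # Rescale the kept units so the expected activation is unchanged.
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(10_000)
dropped = dropout(a, rate=0.1, rng=rng)   # hypothetical dropout rate
fraction_zeroed = float((dropped == 0).mean())
```

With 10,000 units and a rate of 0.1, roughly a tenth of the activations are zeroed in each call, and a different random subset is silenced in every training cycle.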

4 Proposed CNN: CNNpred

CNN has many parameters, including the number of layers, the number of filters in each layer, the dropout rate, the size of the filters in each layer, the initial representation of the input data and so on, which should be chosen wisely to get the desired outcomes. Although $3 \times 3$ and $5 \times 5$ filters are quite common in the image processing domain, we think that the size of each filter should be determined according to the financial interpretation of the features and their characteristics rather than just following previous works in image processing. Here we introduce the architecture of CNNpred, a general CNN-based framework for stock market prediction. CNNpred has two variations that are referred to as 2D-CNNpred and 3D-CNNpred. We explain the framework in four major steps: representation of input data, daily feature extraction, durational feature extraction and final prediction.

Representation of input data: CNNpred takes information from different markets and uses it to predict the future of those markets. As we mentioned, 2D-CNNpred and 3D-CNNpred take different approaches for constructing prediction models. The goal of the first approach is to find a general model for mapping the history of a market to its future fluctuations, and by "general model" we mean a model that is valid for several markets. In other words, we assume that the true mapping function from the history to the future is one that is correct for many markets. For this goal, we need to design a single model that is able to predict the future of a market based on its own history; however, to extract the desired mapping function, that model needs to be trained on samples from different markets. 2D-CNNpred follows this general approach, but in addition to modeling the history of a market as the input data, it also uses a variety of other sources of information. In 2D-CNNpred all this information is aggregated and fed to a specially designed CNN as a two-dimensional tensor, which is why it is called 2D-CNNpred.

On the other hand, the second approach, 3D-CNNpred, assumes that different models are needed for making predictions in different markets, but each prediction model can use information from the history of many markets. In other words, 3D-CNNpred, unlike 2D-CNNpred, does not train a single prediction model that can predict the future of each market given its own historical data; instead, it extracts features from the historical information of many markets and uses them to train a separate prediction model for each market. The intuition behind this approach is that the mechanisms that dictate the future behavior of each market differ, at least slightly, from those of other markets. However, what happens in the future in a market may depend on what happens inside and outside that market. Based on this intuition, 3D-CNNpred uses a tensor with three dimensions to aggregate historical information from various markets and feed it to a specially designed CNN to train a prediction model for each market. Although the structure of the model is the same for all the markets, the data that is used for training is different for each market. In other words, in 3D-CNNpred, each prediction model can see all the available information as input, but is trained to predict the future of a certain market based on that input. One can expect that 3D-CNNpred, unlike 2D-CNNpred, will be able to combine information from different markets into high-level features before making predictions. Fig 2 shows a schema of how data is represented and used in CNNpred's variations.

Figure 2: The structure of input data in two variations of CNNpred
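The two input representations can be sketched with NumPy on synthetic data. Everything below is an illustrative assumption rather than the authors' pipeline: the random placeholder data, the helper `windows`, the 100-day record length and the axis order (days, markets, features) for the 3D samples. Only the 60-day window, 82 features and 5 markets come from the text.

```python
import numpy as np

def windows(series, n_days):
    """Stack every run of n_days consecutive rows of a (T, ...) array."""
    return np.stack([series[s:s + n_days]
                     for s in range(len(series) - n_days + 1)])

rng = np.random.default_rng(0)
n_markets, n_total_days, n_features, n_days = 5, 100, 82, 60
# Synthetic placeholder: data[m, t, f] is feature f of market m on day t.
data = rng.normal(size=(n_markets, n_total_days, n_features))

# 2D variant: a sample is one market's own (days x features) history;
# samples from all markets are pooled to train a single shared model.
samples_2d = np.concatenate([windows(data[m], n_days)
                             for m in range(n_markets)])

# 3D variant: a sample holds the same days for all markets at once,
# with the features along the last (depth) axis.
samples_3d = windows(data.transpose(1, 0, 2), n_days)
```

Under these assumptions `samples_2d` has shape (205, 60, 82), that is 41 windows for each of 5 markets, while `samples_3d` has shape (41, 60, 5, 82). In the 3D case the same inputs would be paired with a different target market for each trained model.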

Daily feature extraction: Each day in the historical data is represented by a series of features like opening and closing prices. The traditional approach to market prediction is to analyze these features for example in the form of candlesticks, probably by constructing higher level features based on them, in order to predict the future behavior of the market. The idea behind the design of first layer of CNNpred comes from this observation. In the first step of both variations of CNNpred, there is a convolutional layer whose task is to combine the daily features into higher level features for representing each single day of the history.

Durational feature extraction: Some other useful information for predicting the future behavior of a market comes from studying the behavior of the market over time. Such a study can give us information about the trends that appear in the market's behavior, and reveal patterns that can be used to predict the future. So it is important to combine features of consecutive days of data to gather high-level features representing trends or reflecting the market's behavior in certain time intervals. Both 2D-CNNpred and 3D-CNNpred have layers that are supposed to combine the features extracted in the first layer and produce even more sophisticated features summarizing the data in a certain time interval.

Final prediction: At the final step, the features that are generated in previous layers are converted to a one-dimensional vector using a flattening operation and this vector is fed to a fully connected layer that maps the features to a prediction.

In the next two sections, we will explain the general design of 2D-CNNpred and 3D-CNNpred as well as how they have been adapted for the data set that we have used in the specific experiments performed in this paper. In our experiments, we have used data from 5 different indices. Each index has 82 features, which means that each day of the history of a market is represented by 82 features. The 82 gathered features are selected so as to form a complete feature set, and consist of economic data, technical indicators, big U.S. companies, commodities, exchange rates of currencies, futures contracts and the world's stock indices. The length of the history is 60 days; that is, for each prediction, the model can use information from the last 60 days.

4.1 2D-CNNpred

Representation of input data: As we mentioned before, the input to 2D-CNNpred is a two-dimensional matrix. The size of the matrix depends on the number of features that represent each day, as well as the number of days back into the history that are used for making a prediction. If the input used for prediction consists of $d$ days, each represented by $f$ features, then the size of the input tensor will be $d \times f$.

Daily feature extraction: To extract daily features in 2D-CNNpred, filters of size $1 \times$ (number of initial features) are utilized. Each of those filters covers all the daily features and can combine them into a single higher level feature, so using this layer, 2D-CNNpred can construct different combinations of primary features. It is also possible for the network to drop useless features by setting their corresponding weights in the filters equal to zero. So this layer works as an initial feature extraction/feature selection module. Fig 3 represents the application of a simple filter to the input data.

Figure 3: Applying a $1 \times$ (number of features) filter to the 2D input tensor.
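Because each first-layer filter covers all the features of a single day, applying a bank of such filters to one sample is equivalent to a matrix product between the (days x features) input and a (features x filters) weight matrix. The sketch below illustrates this and the feature-selection effect of zero weights; the random weights, the choice of eight filters and the use of ReLU are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, f, n_filters = 60, 82, 8
x = rng.normal(size=(d, f))          # one sample: d days, f features per day
w = rng.normal(size=(f, n_filters))  # column c holds the f weights of filter c

# Each filter turns the f features of a day into one higher level feature.
daily = np.maximum(0.0, x @ w)       # shape (d, n_filters); ReLU is assumed

# Zero weights act as feature selection: here every filter ignores feature 0.
w_selective = w.copy()
w_selective[0, :] = 0.0
x_changed = x.copy()
x_changed[:, 0] += 100.0             # perturb only the ignored feature
same_output = np.allclose(x @ w_selective, x_changed @ w_selective)
```

The output keeps one row per day, so the time axis is untouched at this stage, and `same_output` is true because a feature whose weights are all zero cannot influence any extracted feature.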

Durational feature extraction: While the first layer of 2D-CNNpred extracts features out of primary daily features, the following layers combine extracted features of different days to construct higher level features for aggregating the available information in certain durations. Like the first layer, these succeeding layers use filters for combining lower level features from their input into higher level ones. 2D-CNNpred uses $3 \times 1$ filters in the second layer. Each of those filters covers three consecutive days, a setting that is inspired by the observation that most of the famous candlestick patterns, like Three Line Strike and Three Black Crows, try to find meaningful patterns in three consecutive days (Nison, 1994; Bulkowski, 2012; Achelis, 2001). We take this as a sign of the potentially useful information that can be extracted from a time window of three consecutive time units in the historical data. The third layer is a pooling layer that performs $2 \times 1$ max pooling, which is a very common setting for pooling layers. After this pooling layer, and in order to aggregate the information in longer time intervals and construct even more complex features, 2D-CNNpred uses another convolutional layer with $3 \times 1$ filters followed by a second pooling layer just like the first one.

Final prediction: The features generated by the last pooling layer are flattened into a final feature vector. This feature vector is then converted to a final prediction through a fully connected layer. Sigmoid (Eq 4) is the activation function that we choose for this layer. Since the output of the sigmoid is a number in the interval $[0, 1]$, the prediction that is made by 2D-CNNpred for a market can be interpreted as the probability of an increase in the price of that market for the next day, which is a valuable piece of information. Clearly, it is rational to put more money on a stock that has a higher probability of going up. On the other hand, stocks with a low probability of going up are good candidates for short selling. However, in our experiments, we discretize the output to either 0 or 1, whichever is closer to the prediction.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$
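The sigmoid output and its discretization can be sketched as follows. The function names are hypothetical, and mapping a probability of exactly 0.5 to the "up" class is an assumption, since the text only says the output is rounded to the closer of 0 and 1.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)

def predict_direction(pre_activation):
    """Return (probability of an increase, discretized 0/1 prediction)."""
    p_up = sigmoid(pre_activation)
    # Assumption: a tie at exactly 0.5 is assigned to class 1 (up).
    return p_up, int(p_up >= 0.5)

p, label = predict_direction(1.2)  # hypothetical output of the last layer
```

Keeping the probability alongside the label preserves the information that, as noted above, could guide how much to invest or which stocks to short.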


A sample configuration of 2D-CNNpred: As mentioned before, the input used for each prediction consists of 60 days, each represented by 82 features. So, the input to 2D-CNNpred is a 60 by 82 matrix. The first convolutional layer uses eight filters, after which there are two convolutional layers with eight filters each, each followed by a max-pooling layer. The final flattened feature vector contains 104 features that are fed to the fully connected layer to produce the final output. Fig 4 shows a graphical visualization of the described process.

Figure 4: Graphical Visualization of 2D-CNNpred
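
The count of 104 flattened features can be checked with simple shape arithmetic. The sketch below assumes unpadded, stride-1 convolutions, a first-layer filter that spans all 82 daily features of one day, filters covering three days in the later convolutional layers, and max pooling over windows of 2 days. The pooling size and padding are assumptions; they are chosen because they reproduce the reported 104 features.

```python
def conv_out(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1


def pool_out(size, window):
    """Output length of non-overlapping pooling along one axis."""
    return size // window


days, features, filters = 60, 82, 8

# Layer 1: each filter combines the 82 daily features of a single day.
height = conv_out(days, 1)             # 60 days remain
width = conv_out(features, features)   # feature axis collapses to 1

# Layers 2-5: convolution over 3 days, pooling (assumed window 2), twice.
height = conv_out(height, 3)   # 58
height = pool_out(height, 2)   # 29
height = conv_out(height, 3)   # 27
height = pool_out(height, 2)   # 13

flattened = height * width * filters
print(flattened)  # 104
```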

4.2 3D-CNNpred

Representation of input data: 3D-CNNpred, unlike 2D-CNNpred, uses a three-dimensional tensor to represent data. The reason is that each sample fed to 3D-CNNpred contains information from several markets. So, the initial daily features, the days of the historical record and the markets from which the data is gathered form the three dimensions of the input tensor. Suppose our dataset consists of several markets, with a set of features for each of these markets, and our goal is to predict a given day based on a fixed number of past days. Fig 5 shows how one sample of the data would be represented.

Figure 5: Representation of input data in 3D-CNNpred based on primary features, related markets and days before the day of prediction
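
A minimal sketch of how one such sample could be assembled is given below. The layout `sample[day][market][feature]`, the helper name `build_sample` and the toy data are assumptions made for illustration; they are not taken from the original implementation.

```python
def build_sample(histories, end, window):
    """Build one 3D sample from per-market daily feature vectors.

    histories: dict mapping a market name to a list of daily feature vectors.
    Returns sample[day][market][feature] for the `window` days before day `end`.
    """
    if end - window < 0:
        raise ValueError("not enough history before day `end`")
    markets = sorted(histories)  # fixed market order along the second axis
    return [
        [histories[market][day] for market in markets]
        for day in range(end - window, end)
    ]


# Toy data: 3 hypothetical markets, 10 days, 4 features per day.
histories = {
    name: [[float(day + offset)] * 4 for day in range(10)]
    for offset, name in enumerate(["market_a", "market_b", "market_c"])
}

sample = build_sample(histories, end=8, window=5)
shape = (len(sample), len(sample[0]), len(sample[0][0]))
print(shape)  # (days, markets, features)
```

With the configuration used in the experiments, the same layout would hold 60 days, 5 markets and 82 features.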

Daily feature extraction: The first layer of filters in 3D-CNNpred is defined as a set of convolutional filters, while the primary features are represented along the depth of the tensor. Fig 6 shows how a filter works. This layer of filters is responsible for combining subsets of the basic features that are available along the depth of the input tensor into a set of higher-level features. The input tensor is transformed by this layer into another tensor whose width and height are the same, but whose depth is equal to the number of convolutional filters in layer one. As in 2D-CNNpred, the network has the capability to act as a feature selection/extraction algorithm.

Figure 6: Applying a filter to the first part of the 3D input tensor.
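
Because this layer leaves the width and height of the tensor unchanged and only changes its depth, it can be pictured as a filter that, at every (day, market) position, takes a weighted sum over the depth axis. The sketch below illustrates that reading in pure Python; the weights are hypothetical, biases are omitted, and the ReLU activation follows the network parameters reported later in the paper.

```python
def depthwise_combine(tensor, filters):
    """Apply each filter to the feature vector at every (day, market) position.

    tensor[d][m] is a feature vector; filters[k] is a weight vector of the
    same length. Returns out[d][m][k], so only the depth changes.
    """
    return [
        [
            [max(0.0, sum(w * x for w, x in zip(weights, cell)))  # ReLU
             for weights in filters]
            for cell in row
        ]
        for row in tensor
    ]


# Toy tensor: 2 days x 2 markets x 3 features (made-up values).
tensor = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 2.0, 2.0], [4.0, 0.0, 1.0]],
]
# Two hypothetical filters over the 3 features.
filters = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

out = depthwise_combine(tensor, filters)
print(len(out), len(out[0]), len(out[0][0]))
```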

Durational feature extraction: In addition to daily features, 3D-CNNpred’s input data provides information about other markets. As in 2D-CNNpred, the next four layers are dedicated to extracting higher-level features that summarize the fluctuation patterns of the data over time. However, in 3D-CNNpred, this is done over a series of markets instead of one. So, the width of the filters in the second convolutional layer is defined so that it covers all the pertinent markets. As in 2D-CNNpred, and for the same reason, the height of the filters is set to 3 so as to cover three consecutive time units. With this setting, the size of the filters in the second convolutional layer is 3 × number of markets. The next three layers, like those of 2D-CNNpred, are a max pooling layer, another convolutional layer and a final max pooling layer.

Final prediction: Same as 2D-CNNpred, here in 3D-CNNpred the output of the durational feature extraction phase is flattened and used to produce the final results.

A sample configuration of 3D-CNNpred: In our experiments, the input to 3D-CNNpred is a 60 by 5 matrix with a depth of 82. The first convolutional layer uses eight filters to perform the convolution operation, after which there is one convolutional layer with eight filters followed by a max-pooling layer. Then, another convolutional layer with eight filters, again followed by a max-pooling layer, generates the final 104 features. In the end, a fully connected layer converts the 104 neurons to 1 neuron and produces the final output. Fig 7 shows a graphical visualization of the process.

Figure 7: Graphical Visualization of 3D-CNNpred
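
The shapes in this configuration can be traced layer by layer. The sketch below assumes unpadded, stride-1 convolutions, a first layer that changes only the depth, a second-layer filter of height 3 whose width covers all 5 markets, and max pooling over windows of 2 days. The pooling size and padding are assumptions chosen because they reproduce the reported 104 features.

```python
def conv_out(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1


def pool_out(size, window):
    """Output length of non-overlapping pooling along one axis."""
    return size // window


days, markets, features, filters = 60, 5, 82, 8
trace = [("input", days, markets, features)]

# Layer 1: width and height unchanged, depth 82 -> 8.
h, w, d = days, markets, filters
trace.append(("conv 1", h, w, d))

# Layer 2: filter height 3, filter width equal to the number of markets.
h, w = conv_out(h, 3), conv_out(w, markets)
trace.append(("conv 2", h, w, d))

h = pool_out(h, 2)   # assumed pooling window of 2 days
trace.append(("pool 1", h, w, d))

h = conv_out(h, 3)
trace.append(("conv 3", h, w, d))

h = pool_out(h, 2)
trace.append(("pool 2", h, w, d))

flattened = h * w * d
for name, height, width, depth in trace:
    print(f"{name}: {height} x {width} x {depth}")
print("flattened:", flattened)  # 104
```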

5 Initial feature set for each market

As mentioned before, our goal is to develop a model for predicting the direction of movement of stock market prices or indices. We applied our approach to predict the movement of the S&P 500, NASDAQ, Dow Jones Industrial Average, NYSE and RUSSELL indices. For this prediction task, we use 82 features to represent each day of each market. Some of these features are market-specific, while the rest are general economic features that are replicated for every market in the data set. This rich set of features can be categorized into eight groups: primitive features, technical indicators, economic data, world stock market indices, the exchange rate of the U.S. dollar to other currencies, commodities, data from big companies of the U.S. market and futures contracts. We briefly explain the different groups of our feature set here; more details can be found in Appendix I.

  • Primitive features: The close price and the day of the week for which the prediction is made are the primitive features used in this work.

  • Technical indicators: Technical analysts use technical indicators which come out of historical data of stocks like price, volume and so on to analyze short-term movement of prices. They are quite common in stock market research. The moving averages are examples of this type of features.

  • Economic data: Economic data reflects whether the economy of a country is doing well or not. In addition to the other effective factors, investors usually take a look at these indicators so as to gain insight into future of stock market. Information coming from Treasury bill belongs to this category.

  • World stock markets: Stock markets all over the world usually interact with each other because of the globalization of the economy. This connection becomes more valuable when we consider the time difference between countries, which makes it possible to gain information about the future of one country's market by monitoring other countries' markets. An example is the effect of the stock markets of countries like China, Japan and South Korea on the U.S. market.

  • The exchange rate of U.S. dollar: There are companies that import their needs from other countries or export their products to other countries. In these cases, the value of the U.S. dollar against other currencies, like the Canadian dollar and the Euro, plays an important role in the fluctuation of stock prices and, by extension, of the whole market.

  • Commodities: Another source of information that affects stock market is price of commodities like gold, silver, oil and so on. This kind of information can reflect a view of the global market. This means that the information about the prices of commodities can be useful in prediction of the fluctuations of stock prices.

  • Big U.S. Companies: Stock market indices are calculated based on different stocks. Each stock carries a weight in this calculation that matches its share in the market. In other words, big companies are more important than small ones in the prediction of stock market indices. Examples are Exxon Mobil Corporation and Apple Inc.

  • Futures contracts: Futures contracts are contracts in which one side of agreement is supposed to deliver stock, commodities and so on in the future. These contracts show expected value of the merchandise in the future. Investors tend to buy stocks that have higher expected value than their current value. For instance, S&P 500 Futures, DJI Futures and NASDAQ Futures prices could affect current price of S&P 500 and other indices.
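
Many of the features in these groups are returns, relative changes or moving averages of a raw price series. A minimal hand-rolled sketch of two such transformations follows. It is for illustration only: the features in this work were computed with the tools listed in Appendix I, and library implementations (for example, how an exponential moving average is seeded) may differ from the simple choices assumed here.

```python
def relative_change(series):
    """(x_t - x_{t-1}) / x_{t-1} for every day after the first."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


def exponential_moving_average(series, span):
    """EMA with smoothing factor 2 / (span + 1).

    Assumption: the average is seeded with the first observation.
    """
    alpha = 2.0 / (span + 1)
    out = [series[0]]
    for value in series[1:]:
        out.append(alpha * value + (1.0 - alpha) * out[-1])
    return out


# Made-up closing prices.
closes = [100.0, 110.0, 99.0, 104.0]
print(relative_change(closes))
print(exponential_moving_average(closes, span=3))
```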

6 Experimental settings and results

In this section, we describe the settings that are used to evaluate the models, including datasets, parameters of the networks, evaluation methodology and baseline algorithms. Then, the evaluation results are reported.

6.1 Data gathering and preparation

The datasets used in this work include daily direction of close of S&P 500 index, NASDAQ Composite, Dow Jones Industrial Average, NYSE Composite and RUSSELL 2000. Table 2 shows more information about them. Each sample has 82 features that already have been explained and its assigned label is determined according to the Eq 5. It is worth mentioning that for each index only technical indicators and primitive features are unique and the other features, like big U.S. companies or price of commodities, are common between different indices.

| Name | Description |
|---|---|
| S&P 500 | Index of 505 companies in the S&P stock market |
| Dow Jones Industrial Average | Index of 30 major U.S. companies |
| NASDAQ Composite | Index of common companies in the NASDAQ stock market |
| NYSE Composite | Index of common companies in the New York Stock Exchange |
| RUSSELL 2000 | Index of 2000 small companies in the U.S. |

Table 2: Description of used indices

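$$target_t = \begin{cases} 1 & \text{if } Close_{t+1} > Close_t \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

where $Close_t$ refers to the closing price at day $t$.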
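
The labeling of samples can be sketched as follows, assuming that a day is labeled 1 when the next day's closing price is higher than the current one and 0 otherwise. Treating an unchanged price as 0 is an assumption, and the function name is hypothetical.

```python
def direction_labels(closes):
    """Label day t with 1 if close[t + 1] > close[t], else 0.

    The last day has no following day and therefore gets no label.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


# Made-up closing prices.
closes = [10.0, 11.0, 11.0, 9.0, 9.5]
print(direction_labels(closes))
```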

The data cover the period from Jan 2010 to Nov 2017. The first 60% of the data is used for training the models, the next 20% forms the validation data and the last 20% is the test data.
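
A chronological 60/20/20 split of this kind can be sketched as below. The function name and the use of integer percentages are illustrative choices; the samples are assumed to be already sorted by date, and no shuffling is performed so that the test data always follows the training data in time.

```python
def chronological_split(samples, train_pct=60, val_pct=20):
    """Split time-ordered samples into train, validation and test parts."""
    n = len(samples)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


# Stand-in for 10 time-ordered samples.
train, val, test = chronological_split(list(range(10)))
print(len(train), len(val), len(test))
```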

Different features can have very different ranges, which usually makes learning harder for the algorithms. The goal of data normalization here is to bring the values of all features to a common scale, which usually improves the performance of the prediction model. We use Eq 6 for normalizing the input data:

$$x_{new} = \frac{x_{old} - \bar{x}}{\sigma} \qquad (6)$$

where $x_{new}$ is the normalized feature vector, $x_{old}$ is the original feature vector, and $\bar{x}$ and $\sigma$ are the mean and the standard deviation of the original feature.
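
This normalization subtracts the mean of a feature and divides by its standard deviation. A minimal sketch for a single feature column is given below. Using the population standard deviation and estimating the statistics on one portion of the data before applying them to another are assumptions of this sketch, not details stated above.

```python
import statistics


def fit_scaler(column):
    """Return the mean and (population) standard deviation of a feature column."""
    return statistics.mean(column), statistics.pstdev(column)


def normalize(column, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    if std == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(x - mean) / std for x in column]


# Made-up values of one feature.
train_column = [1.0, 3.0]
mean, std = fit_scaler(train_column)
print(normalize(train_column, mean, std))
print(normalize([5.0], mean, std))  # later data scaled with the same statistics
```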


6.2 Evaluation methodology

Evaluation metrics are needed to compare the results of our method with those of the other methods. Accuracy is one of the common metrics used in this area. However, on an imbalanced dataset, it may be biased toward models that tend to predict the more frequent class. To address this issue, we report the Macro-Averaged-F-Measure, which is the mean of the F-measures calculated for each of the two classes (Gunduz et al., 2017; Özgür et al., 2005).
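
The metric can be sketched in a few lines. The example uses made-up labels in which a model always predicts the frequent class: its accuracy is 0.75, while its macro-averaged F-measure is much lower. Returning an F-measure of 0 for a class with no true positives is a convention assumed here.

```python
def f_measure(y_true, y_pred, positive):
    """F-measure of one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(y_true, y_pred):
    """Mean of the F-measures of the two classes (1 = up, 0 = down)."""
    return (f_measure(y_true, y_pred, 1) + f_measure(y_true, y_pred, 0)) / 2


# Made-up example: the model always predicts class 1.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f = macro_f_measure(y_true, y_pred)
print(accuracy, round(macro_f, 4))
```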

6.3 Parameters of network

Numerous deep learning packages and software have been developed. In this work, Keras (Chollet et al., 2015) was utilized to implement the CNN. The activation function of all the layers except the last one is RELU. Complete descriptions of the parameters of the CNN are listed in Table 3.

| Parameter | Value |
|---|---|
| Filter size | {8, 8, 8} |
| Activation function | RELU-Sigmoid |
| Optimizer | Adam |
| Dropout rate | 0.1 |
| Batch size | 128 |

Table 3: Parameters of CNN

6.4 Baseline algorithms

We compare the performance of the suggested methods with that of the algorithms applied in the following studies. For all the baseline algorithms, the same settings reported in the original papers were used.

  • The first baseline algorithm is the one reported in (Zhong & Enke, 2017). In this algorithm, the initial data is mapped to a new feature space using PCA and then the resulting representation of the data is used for training a shallow ANN for making predictions.

  • The second baseline is based on the method suggested in (Kara et al., 2011), in which the technical indicators reported in Table 4 are used to train a shallow ANN for prediction.

  • The third baseline algorithm is a CNN with two-dimensional input (Gunduz et al., 2017). First, the features are clustered and reordered accordingly. The resulting representation of the data is then used by a CNN with a certain structure for prediction.

| Indicator | Description |
|---|---|
| MA | Simple Moving Average |
| EMA | Exponential Moving Average |
| MOM | Momentum |
| %K | Stochastic %K |
| %D | Stochastic %D |
| RSI | Relative Strength Index |
| MACD | Moving Average Convergence Divergence |
| %R | Larry Williams' %R |
| A/D | Accumulation/Distribution Oscillator |
| CCI | Commodity Channel Index |

Table 4: Technical Indicators

6.5 Results

In this section, the results of five different experiments are explained. Since one of the baseline algorithms uses PCA for dimension reduction, the performance of that algorithm is tested with different numbers of principal components. To put the other algorithms on an equal footing, they are also tested several times under the same conditions. Then, the average F-measures of the algorithms are compared. More details about the notation used are given in Table 5.

| Algorithm | Explanation |
|---|---|
| 3D-CNNpred | Our method |
| 2D-CNNpred | Our method |
| PCA+ANN (Zhong & Enke, 2017) | PCA as dimension reduction and ANN as classifier |
| Technical (Kara et al., 2011) | Technical indicators and ANN as classifier |
| CNN-cor (Gunduz et al., 2017) | A CNN with the structure described in the paper |

Table 5: Description of used algorithms

Tables 6 to 10 summarize the results for the baseline algorithms as well as our suggested models on S&P 500 index, Dow Jones Industrial Average, NASDAQ Composite, NYSE Composite and RUSSELL 2000 historical data. Each table contains statistical information about a specific market. The results include the average F-measure, the best F-measure and the standard deviation of the F-measures for the predictions in different runs. The standard deviation of the F-measures shows how much the results of a model fluctuate around their mean, so models with a lower standard deviation are more robust. P-values against 2D-CNNpred and 3D-CNNpred are also reported to show whether the differences are significant or not.
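
The per-model summary statistics reported in the tables can be computed from the F-measures of repeated runs as sketched below. The run values are made up for illustration, and the use of the sample standard deviation is an assumption. The P-values are not reproduced here, since the statistical test behind them is not described in this section.

```python
import statistics


def summarize_runs(f_measures):
    """Mean, best value and standard deviation of F-measures over several runs."""
    return {
        "mean": statistics.mean(f_measures),
        "best": max(f_measures),
        "std": statistics.stdev(f_measures),  # sample standard deviation (assumed)
    }


# Hypothetical F-measures of one model over three runs.
summary = summarize_runs([0.46, 0.48, 0.50])
print({name: round(value, 4) for name, value in summary.items()})
```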

| Measure \ Model | Technical | CNN-cor | PCA+ANN | 2D-CNNpred | 3D-CNNpred |
|---|---|---|---|---|---|
| Mean of F-measure | 0.4469 | 0.3928 | 0.4237 | 0.4799 | 0.4837 |
| Best of F-measure | 0.5627 | 0.5723 | 0.5165 | 0.5504 | 0.5532 |
| Standard deviation of F-measure | 0.0658 | 0.1017 | 0.0596 | 0.0273 | 0.0343 |
| P-value against 2D-CNNpred | 0.0056 | < 0.0001 | < 0.0001 | 1 | 0.5903 |
| P-value against 3D-CNNpred | 0.003 | < 0.0001 | < 0.0001 | 0.5903 | 1 |

Table 6: Statistical information of S&P 500 index using different algorithms

| Measure \ Model | Technical | CNN-cor | PCA+ANN | 2D-CNNpred | 3D-CNNpred |
|---|---|---|---|---|---|
| Mean of F-measure | 0.415 | 0.39 | 0.4283 | 0.4822 | 0.4925 |
| Best of F-measure | 0.5518 | 0.5253 | 0.5392 | 0.5678 | 0.5778 |
| Standard deviation of F-measure | 0.0625 | 0.0939 | 0.064 | 0.0321 | 0.0347 |
| P-value against 2D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 1 | 0.1794 |
| P-value against 3D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 0.1794 | 1 |

Table 7: Statistical information of Dow Jones Industrial Average index using different algorithms

| Measure \ Model | Technical | CNN-cor | PCA+ANN | 2D-CNNpred | 3D-CNNpred |
|---|---|---|---|---|---|
| Mean of F-measure | 0.4199 | 0.3796 | 0.4136 | 0.4779 | 0.4931 |
| Best of F-measure | 0.5487 | 0.5498 | 0.5312 | 0.5219 | 0.5576 |
| Standard deviation of F-measure | 0.0719 | 0.1114 | 0.0553 | 0.0255 | 0.0405 |
| P-value against 2D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 1 | 0.0509 |
| P-value against 3D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 0.0509 | 1 |

Table 8: Statistical information of NASDAQ Composite index using different algorithms

| Measure \ Model | Technical | CNN-cor | PCA+ANN | 2D-CNNpred | 3D-CNNpred |
|---|---|---|---|---|---|
| Mean of F-measure | 0.4071 | 0.3906 | 0.426 | 0.4757 | 0.4751 |
| Best of F-measure | 0.5251 | 0.5376 | 0.5306 | 0.5316 | 0.5592 |
| Standard deviation of F-measure | 0.0556 | 0.0926 | 0.059 | 0.0314 | 0.0384 |
| P-value against 2D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 1 | 0.9366 |
| P-value against 3D-CNNpred | < 0.0001 | < 0.0001 | < 0.0001 | 0.9366 | 1 |

Table 9: Statistical information of NYSE Composite using different algorithms

| Measure \ Model | Technical | CNN-cor | PCA+ANN | 2D-CNNpred | 3D-CNNpred |
|---|---|---|---|---|---|
| Mean of F-measure | 0.4525 | 0.3924 | 0.4279 | 0.4775 | 0.4846 |
| Best of F-measure | 0.5665 | 0.5602 | 0.5438 | 0.532 | 0.5787 |
| Standard deviation of F-measure | 0.0655 | 0.0977 | 0.066 | 0.0271 | 0.0371 |
| P-value against 2D-CNNpred | 0.0327 | < 0.0001 | 0.0001 | 1 | 0.3364 |
| P-value against 3D-CNNpred | 0.01 | < 0.0001 | < 0.0001 | 0.3364 | 1 |

Table 10: Statistical information of RUSSELL 2000 using different algorithms

To summarize and compare the performance of different algorithms, average results of them in 5 market indices are shown in Fig 8.

Figure 8: Mean F-measure of different algorithms in different markets
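
As a worked check of the numbers behind this comparison, the mean F-measures from Tables 6 to 10 can be averaged over the five indices. The values below are copied from those tables in the order S&P 500, DJI, NASDAQ, NYSE, RUSSELL; the variable names are illustrative.

```python
# Mean F-measure per model, one value per index (from Tables 6 to 10).
mean_f = {
    "Technical":  [0.4469, 0.4150, 0.4199, 0.4071, 0.4525],
    "CNN-cor":    [0.3928, 0.3900, 0.3796, 0.3906, 0.3924],
    "PCA+ANN":    [0.4237, 0.4283, 0.4136, 0.4260, 0.4279],
    "2D-CNNpred": [0.4799, 0.4822, 0.4779, 0.4757, 0.4775],
    "3D-CNNpred": [0.4837, 0.4925, 0.4931, 0.4751, 0.4846],
}

# Average of each model over the five indices.
overall = {name: sum(values) / len(values) for name, values in mean_f.items()}
for name, value in sorted(overall.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.4f}")

# Gap between 2D-CNNpred and CNN-cor, the weakest baseline.
gap = overall["2D-CNNpred"] - overall["CNN-cor"]
print(f"2D-CNNpred minus CNN-cor: {gap:.4f}")
```

The gap of roughly 0.09 between 2D-CNNpred and CNN-cor is consistent with the difference of about 9% mentioned in the discussion.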

7 Discussion

The results show that both 2D-CNNpred and 3D-CNNpred outperformed the baseline algorithms by a statistically significant margin. The gap between the F-measure of our models and that of the baseline that uses only ten technical indicators is clear. A plausible reason is the insufficiency of technical indicators for prediction, together with the use of a shallow ANN instead of our deep prediction model. Even adding more initial features and incorporating PCA, a well-known feature extraction algorithm, did not improve the results as much as expected. A drawback of these two approaches may be that they use a shallow ANN with only one hidden layer, whose ability in feature extraction is limited. This demonstrates that adding more basic features is not enough by itself without improving the feature extraction algorithm. Our framework has two advantages over these two baseline algorithms that have led to its superior performance: first, it uses a rich set of features containing useful information for stock prediction; second, it uses a deep learning algorithm that extracts sophisticated features out of the primary ones.

The next baseline algorithm was CNN-cor, which had the worst results among all the tested algorithms. A CNN’s ability in feature extraction is highly dependent on a wise selection of its parameters, in a way that fits the problem to which it is applied. Given that both 2D-CNNpred and CNN-cor used the same feature set and were trained in almost the same way, the poor results of CNN-cor compared to 2D-CNNpred are possibly the result of the design of its 2D-CNN. In particular, the filter sizes used in CNN-cor seem questionable. The fact that such filters are popular in computer vision does not guarantee that they would work well in stock market prediction. In fact, an F-measure about 9% lower on average than that of 2D-CNNpred suggests that designing the structure of the CNN is the core challenge in applying CNNs to stock market prediction. A poorly designed CNN can adversely influence the results and make the CNN’s performance even worse than that of a shallow ANN.

Finally, an advantage of 3D-CNNpred over 2D-CNNpred, which could be the reason for the slightly better performance of 3D-CNNpred, is that 3D-CNNpred can combine information from different markets into high-level features, while 2D-CNNpred has access to only one market’s initial features in its feature extraction phase.

8 Conclusion

The noisy and nonlinear behavior of prices in financial markets makes prediction in those markets a difficult task. A better prediction can be gained by having better features. In this paper, we tried to use a wide collection of information, including historical data from the target market, general economic data and information from other possibly correlated stock markets. Also, two variations of a deep CNN-based framework were introduced and applied to extract higher-level features from that rich set of initial features.

The suggested framework, CNNpred, was tested on predictions for S&P 500, NASDAQ, DJI, NYSE, and RUSSELL. The final results showed the significant superiority of the two versions of CNNpred over the state-of-the-art baseline algorithms. CNNpred was able to improve the performance of prediction in all five indices over the baseline algorithms by about 3% to 11% in terms of F-measure. In addition to confirming the usefulness of the suggested approach, these observations also suggest that designing the structures of CNNs for stock prediction problems is possibly a core challenge that deserves further study.

Appendix I. Description of features

The list of features from different categories used as initial feature set representing each sample:

# Feature Description Type Source / Calculation
1 Day which day of week Primitive Pandas
2 Close Close price Primitive Yahoo Finance
3 Vol Relative change of volume Technical Indicator TA-Lib
4 MOM-1 Return of 2 days before Technical Indicator TA-Lib
5 MOM-2 Return of 3 days before Technical Indicator TA-Lib
6 MOM-3 Return of 4 days before Technical Indicator TA-Lib
7 ROC-5 5 days Rate of Change Technical Indicator TA-Lib
8 ROC-10 10 days Rate of Change Technical Indicator TA-Lib
9 ROC-15 15 days Rate of Change Technical Indicator TA-Lib
10 ROC-20 20 days Rate of Change Technical Indicator TA-Lib
11 EMA-10 10 days Exponential Moving Average Technical Indicator TA-Lib
12 EMA-20 20 days Exponential Moving Average Technical Indicator TA-Lib
13 EMA-50 50 days Exponential Moving Average Technical Indicator TA-Lib
14 EMA-200 200 days Exponential Moving Average Technical Indicator TA-Lib
15 DTB4WK 4-Week Treasury Bill: Secondary Market Rate Economic FRBSL
16 DTB3 3-Month Treasury Bill: Secondary Market Rate Economic FRBSL
17 DTB6 6-Month Treasury Bill: Secondary Market Rate Economic FRBSL
18 DGS5 5-Year Treasury Constant Maturity Rate Economic FRBSL
19 DGS10 10-Year Treasury Constant Maturity Rate Economic FRBSL
20 DAAA Moody’s Seasoned Aaa Corporate Bond Yield Economic FRBSL
21 DBAA Moody’s Seasoned Baa Corporate Bond Yield Economic FRBSL
22 TE1 DGS10-DTB4WK Economic FRBSL
23 TE2 DGS10-DTB3 Economic FRBSL
24 TE3 DGS10-DTB6 Economic FRBSL
25 TE5 DTB3-DTB4WK Economic FRBSL
26 TE6 DTB6-DTB4WK Economic FRBSL
28 DE2 DBAA-DGS10 Economic FRBSL
29 DE4 DBAA-DTB6 Economic FRBSL
30 DE5 DBAA-DTB3 Economic FRBSL
32 CTB3M Change in the market yield on U.S. Treasury securities at 3-month constant maturity, quoted on investment basis Economic FRBSL
33 CTB6M Change in the market yield on U.S. Treasury securities at 6-month constant maturity, quoted on investment basis Economic FRBSL
34 CTB1Y Change in the market yield on U.S. Treasury securities at 1-year constant maturity, quoted on investment basis Economic FRBSL
35 Oil Relative change of oil price(WTI), Oklahoma Commodity FRBSL
36 Oil Relative change of oil price(Brent) Commodity
37 Oil Relative change of oil price(WTI) Commodity
38 Gold Relative change of gold price (London market) Commodity FRBSL
39 Gold-F Relative change of gold price futures Commodity
40 XAU-USD Relative change of gold spot U.S. dollar Commodity
41 XAG-USD Relative change of silver spot U.S. dollar Commodity
42 Gas Relative change of gas price Commodity
43 Silver Relative change of silver price Commodity
44 Copper Relative change of copper future Commodity
45 IXIC Return of NASDAQ Composite index World Indices Yahoo Finance
46 GSPC Return of S&P 500 index World Indices Yahoo Finance
47 DJI Return of Dow Jones Industrial Average World Indices Yahoo Finance
48 NYSE Return of NY stock exchange index World Indices Yahoo Finance
49 RUSSELL Return of RUSSELL 2000 index World Indices Yahoo Finance
50 HSI Return of Hang Seng index World Indices Yahoo Finance
51 SSE Return of Shang Hai Stock Exchange Composite index World Indices Yahoo Finance
52 FCHI Return of CAC 40 World Indices Yahoo Finance
53 FTSE Return of FTSE 100 World Indices Yahoo Finance
54 GDAXI Return of DAX World Indices Yahoo Finance
55 USD-Y Relative change in US dollar to Japanese yen exchange rate Exchange Rate Yahoo Finance
56 USD-GBP Relative change in US dollar to British pound exchange rate Exchange Rate Yahoo Finance
57 USD-CAD Relative change in US dollar to Canadian dollar exchange rate Exchange Rate Yahoo Finance
58 USD-CNY Relative change in US dollar to Chinese yuan exchange rate Exchange Rate Yahoo Finance
59 USD-AUD Relative change in US dollar to Australian dollar exchange rate Exchange Rate
60 USD-NZD Relative change in US dollar to New Zealand dollar exchange rate Exchange Rate
61 USD-CHF Relative change in US dollar to Swiss franc exchange rate Exchange Rate
62 USD-EUR Relative change in US dollar to Euro exchange rate Exchange Rate
63 USDX Relative change in US dollar index Exchange Rate
64 XOM Return of Exxon Mobil Corporation U.S. Companies Yahoo Finance
65 JPM Return of JPMorgan Chase & Co. U.S. Companies Yahoo Finance
66 AAPL Return of Apple Inc. U.S. Companies Yahoo Finance
67 MSFT Return of Microsoft Corporation U.S. Companies Yahoo Finance
68 GE Return of General Electric Company U.S. Companies Yahoo Finance
69 JNJ Return of Johnson & Johnson U.S. Companies Yahoo Finance
70 WFC Return of Wells Fargo & Company U.S. Companies Yahoo Finance
71 AMZN Return of Amazon.com Inc. U.S. Companies Yahoo Finance
72 FCHI-F Return of CAC 40 Futures Futures
73 FTSE-F Return of FTSE 100 Futures Futures
74 GDAXI-F Return of DAX Futures Futures
75 HSI-F Return of Hang Seng index Futures Futures
76 Nikkei-F Return of Nikkei index Futures Futures
77 KOSPI-F Return of Korean stock exchange Futures Futures
78 IXIC-F Return of NASDAQ Composite index Futures Futures
79 DJI-F Return of Dow Jones Industrial Average Futures Futures
80 S&P-F Return of S&P 500 index Futures Futures
81 RUSSELL-F Return of RUSSELL Futures Futures
82 USDX-F Relative change in US dollar index futures Exchange Rate
Table 11: Description of used indices

References



  • Achelis (2001) Achelis, S. B. (2001). Technical Analysis from A to Z. McGraw Hill New York.
  • Arévalo et al. (2016) Arévalo, A., Niño, J., Hernández, G., & Sandoval, J. (2016). High-frequency trading strategy based on deep neural networks. In International conference on intelligent computing (pp. 424–436). Springer.
  • Atsalakis & Valavanis (2009) Atsalakis, G. S., & Valavanis, K. P. (2009). Surveying stock market forecasting techniques–part ii: Soft computing methods. Expert Systems with Applications, 36, 5932–5941.
  • Ballings et al. (2015) Ballings, M., Van den Poel, D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42, 7046–7056.
  • Bao et al. (2017) Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PloS one, 12, e0180944.
  • Brown et al. (2013) Brown, M. S., Pelosi, M. J., & Dirska, H. (2013). Dynamic-radius species-conserving genetic algorithm for the financial forecasting of dow jones index stocks. In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 27–41).
  • Bulkowski (2012) Bulkowski, T. N. (2012). Encyclopedia of candlestick charts volume 332. John Wiley & Sons.
  • Cai et al. (2012) Cai, X., Hu, S., & Lin, X. (2012). Feature extraction using restricted boltzmann machine for stock price prediction. In Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on (pp. 80–83). IEEE volume 3.
  • Chen et al. (2015) Chen, K., Zhou, Y., & Dai, F. (2015). A lstm-based method for stock returns prediction: A case study of china stock market. In Big Data (Big Data), 2015 IEEE International Conference on (pp. 2823–2824). IEEE.
  • Chollet et al. (2015) Chollet, F. et al. (2015). Keras.
  • Chong et al. (2017) Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187–205.
  • Di Persio & Honchar (2016) Di Persio, L., & Honchar, O. (2016). Artificial neural networks architectures for stock price prediction: Comparisons and applications. International Journal of Circuits, Systems and Signal Processing, 10, 403–413.
  • Fama (1970) Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The journal of Finance, 25, 383–417.
  • Fischer & Krauss (2018) Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270, 654–669.
  • Gardner & Dorling (1998) Gardner, M. W., & Dorling, S. (1998). Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmospheric environment, 32, 2627–2636.
  • Gunduz et al. (2017) Gunduz, H., Yaslan, Y., & Cataltepe, Z. (2017). Intraday prediction of borsa istanbul using convolutional neural networks and feature correlations. Knowledge-Based Systems, 137, 138–148.
  • Guresen et al. (2011) Guresen, E., Kayakutlu, G., & Daim, T. U. (2011). Using artificial neural network models in stock market index prediction. Expert Systems with Applications, 38, 10389–10397.
  • Hagan & Menhaj (1994) Hagan, M. T., & Menhaj, M. B. (1994). Training feedforward networks with the marquardt algorithm. IEEE transactions on Neural Networks, 5, 989–993.
  • He et al. (2016) He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
  • Hecht-Nielsen (1992) Hecht-Nielsen, R. (1992). Theory of the backpropagation neural network. In Neural networks for perception (pp. 65–93). Elsevier.
  • Hinton et al. (2012) Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • Hu et al. (2015a) Hu, Y., Feng, B., Zhang, X., Ngai, E., & Liu, M. (2015a). Stock trading rule discovery with an evolutionary trend following model. Expert Systems with Applications, 42, 212–222.
  • Hu et al. (2015b) Hu, Y., Liu, K., Zhang, X., Su, L., Ngai, E., & Liu, M. (2015b). Application of evolutionary computation for rule discovery in stock algorithmic trading: A literature review. Applied Soft Computing, 36, 534–551.
  • Kara et al. (2011) Kara, Y., Boyacioglu, M. A., & Baykan, Ö. K. (2011). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the istanbul stock exchange. Expert systems with Applications, 38, 5311–5319.
  • Khaidem et al. (2016) Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.
  • Kim (2003) Kim, K.-j. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55, 307–319.
  • Kim & Han (2000) Kim, K.-j., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert systems with Applications, 19, 125–132.
  • Krollner et al. (2010) Krollner, B., Vanstone, B., & Finnie, G. (2010). Financial time series forecasting with machine learning techniques: A survey.
  • LeCun et al. (2015) LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521, 436.
  • LeCun et al. (1995) LeCun, Y., Bengio, Y. et al. (1995). Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361, 1995.
  • Lee et al. (2017) Lee, S., Enke, D., & Kim, Y. (2017). A relative value trading system based on a correlation and rough set analysis for the foreign exchange futures market. Engineering Applications of Artificial Intelligence, 61, 47–56.
  • Moghaddam et al. (2016) Moghaddam, A. H., Moghaddam, M. H., & Esfandyari, M. (2016). Stock market index prediction using artificial neural network. Journal of Economics, Finance and Administrative Science, 21, 89–93.
  • Nassirtoussi et al. (2015) Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2015). Text mining of news-headlines for forex market prediction: A multi-layer dimension reduction algorithm with semantics and sentiment. Expert Systems with Applications, 42, 306–324.
  • Nelson et al. (2017) Nelson, D. M., Pereira, A. C., & de Oliveira, R. A. (2017). Stock market’s price movement prediction with lstm neural networks. In Neural Networks (IJCNN), 2017 International Joint Conference on (pp. 1419–1426). IEEE.
  • Nison (1994) Nison, S. (1994). Beyond candlesticks: New Japanese charting techniques revealed volume 56. John Wiley & Sons.
  • Ou & Wang (2009) Ou, P., & Wang, H. (2009). Prediction of stock market index movement by ten data mining techniques. Modern Applied Science, 3, 28.
  • Özgür et al. (2005) Özgür, A., Özgür, L., & Güngör, T. (2005). Text categorization with class-based and corpus-based keyword selection. In International Symposium on Computer and Information Sciences (pp. 606–615). Springer.
  • Patel et al. (2015) Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42, 259–268.
  • Qiu & Song (2016) Qiu, M., & Song, Y. (2016). Predicting the direction of stock market index movement using an optimized artificial neural network model. PloS one, 11, e0155133.
  • Qiu et al. (2016) Qiu, M., Song, Y., & Akagi, F. (2016). Application of artificial neural network for the prediction of stock market returns: The case of the japanese stock market. Chaos, Solitons & Fractals, 85, 1–7.
  • Shah & Zhang (2014) Shah, D., & Zhang, K. (2014). Bayesian regression and bitcoin. In Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on (pp. 409–414). IEEE.
  • Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15, 1929–1958.
  • Wang & Wang (2015) Wang, J., & Wang, J. (2015). Forecasting stock market indexes using principle component analysis and stochastic time effective neural networks. Neurocomputing, 156, 68–78.
  • Yong et al. (2017) Yong, B. X., Rahim, M. R. A., & Abdullah, A. S. (2017). A stock market trading system using deep neural network. In Asian Simulation Conference (pp. 356–364). Springer.
  • Zhang & Wu (2009) Zhang, Y., & Wu, L. (2009). Stock market prediction of s&p 500 via combination of improved bco approach and bp neural network. Expert systems with applications, 36, 8849–8854.
  • Zhao et al. (2017) Zhao, Y., Li, J., & Yu, L. (2017). A deep learning ensemble approach for crude oil price forecasting. Energy Economics, 66, 9–16.
  • Zhong & Enke (2017) Zhong, X., & Enke, D. (2017). Forecasting daily stock market return using dimensionality reduction. Expert Systems with Applications, 67, 126–139.
  • Zhu et al. (2014) Zhu, C., Yin, J., & Li, Q. (2014). A stock decision support system based on dbns. Journal of Computational Information Systems, 10, 883–893.