1 Introduction
A large number of people buy and sell stocks every day with the aim of maximizing profit. Many mathematical methods and models have been developed to analyse the movement of stock prices. However, it is not certain that future stock prices can actually be predicted, because they depend on many factors and are dynamic in nature. In recent years, machine learning and deep learning have come into use in almost every industry, including finance. Machine learning can be viewed as function approximation (or complex multi-dimensional curve fitting) for a given data set. It can analyse and learn complex multi-dimensional features of the data that humans cannot visualize or learn. Although there are several mathematical models and techniques for stock prediction, this paper focuses on a data-driven machine learning approach that requires minimal knowledge of finance. The future stock price is to be predicted given the past prices. This paper uses and analyses the complex feature extraction ability of deep learning to learn the pattern of stock price movement and predict the future price.
2 Machine Learning
In recent times, machine learning research in finance has been steadily increasing. Supervised machine learning tasks are generally of two types, classification and regression. A supervised regression model is used for this stock prediction task.
2.1 Classical Machine Learning Algorithms
Classical machine learning algorithms are easier to interpret and understand than deep learning models, since the underlying algorithms are well understood. They often work better on smaller data sets and are computationally cheaper than deep learning techniques. Much research has been done on predicting stock prices with classical machine learning algorithms. The authors of [1] used a Support Vector Machine (SVM) for financial forecasting and carried out an experimental analysis of the SVM parameters. Random forest techniques have also been applied to financial data: in [2], random forest, naive Bayes and support vector machine models are used to classify the direction of movement of financial data.
2.2 Deep Learning
Although many machine learning algorithms exist and are successful, the evolution of deep learning marked a major milestone in the field of artificial intelligence. The groundwork for deep learning was laid in the 1940s, but it has become popular only recently, owing to the availability of more data and cheap computing devices. The performance of deep learning models has improved rapidly and is expected to improve further. An image classification task is performed in [3] using an artificial neural network. After neural networks, many new models were invented to improve the performance of deep learning on images, videos and sequential data such as text and voice. A Convolutional Neural Network [4] won the ImageNet competition, as it was good at extracting features from images. Recurrent Neural Networks [5] were then used for sequence data such as text and voice, which require a memory of earlier features. Deep learning also performs very well in unsupervised models such as autoencoders [6] and Generative Adversarial Networks (GAN) [7], and in reinforcement learning.
3 Deep Learning in Finance
3.1 Artificial Neural Networks(ANN)
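ANNs are models composed of densely connected computation nodes (neurons). These networks can learn complex features of the input data and use them to perform a task. An ANN is a series of matrix multiplications, each followed by a nonlinear function that makes the whole network nonlinear so that it can learn more complex features. In the standard feed-forward form,
$$h_1 = \sigma(W_1 x + b_1) \tag{1}$$
$$h_i = \sigma(W_i h_{i-1} + b_i), \quad i = 2, \dots, n-1 \tag{2}$$
$$\hat{y} = W_n h_{n-1} + b_n \tag{3}$$
where $n$ is the number of layers in the network, $h_i$ is the hidden unit of layer $i$, $\hat{y}$ is the prediction in the forward pass through the model, $\sigma$ is the activation function, $x$ is the input, and $W_i$ and $b_i$ are the weight matrix and bias of layer $i$.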
[8] and [9] use Artificial Neural Networks to predict the stock price and the direction of price movement. Dimensionality reduction techniques such as Principal Component Analysis (PCA) are used in [10] for stock prediction. In this work, artificial neural networks are also tested on the task of predicting the close price after 5 time intervals (days, hours or minutes). The data produced by the processing steps explained in Section 4 (Proposed Approach) is used: the input is a tensor of shape (n, 20), where n is the number of samples, and the label is a tensor of shape (n, 1). The model consists of 4 fully connected (dense) layers with dropout and ReLU nonlinearity.
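The dense baseline described above can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' implementation: the class name `StockANN`, the hidden sizes (64, 32, 16) and the dropout rate 0.2 are assumptions chosen for illustration, since the text gives only the input shape (n, 20), the label shape (n, 1) and the number of layers.

```python
import torch
import torch.nn as nn


class StockANN(nn.Module):
    """Four fully connected layers with ReLU and dropout (layer sizes are assumed)."""

    def __init__(self, window=20, hidden=(64, 32, 16), p_drop=0.2):
        super().__init__()
        h1, h2, h3 = hidden
        self.net = nn.Sequential(
            nn.Linear(window, h1), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(h1, h2), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(h2, h3), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(h3, 1),  # fourth layer: one predicted close price per row
        )

    def forward(self, x):
        # x: (n, 20) scaled close prices -> (n, 1) predictions
        return self.net(x)


model = StockANN()
model.eval()  # disable dropout so the shape check is deterministic
with torch.no_grad():
    out = model(torch.rand(8, 20))  # random placeholder input, not real prices
print(out.shape)
```

The final layer is left without an activation here; whether the original baseline applies one is not stated in the text.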
3.2 Convolutional Neural Network
Convolutional Neural Networks (CNNs) are stacks of convolution operations between the input passed through the network and filters (kernels) that extract features of the input. Each layer is followed by an activation function such as ReLU for nonlinearity. Pooling layers reduce the dimension of the layers, which lowers computation and can also be viewed as concentrating the features. [11] shows the potential of convolutional neural networks for stock prediction. A 1D convolutional network [12] is also used to predict stock movement as a classification model, with 1-day close, open, high, low and volume data. For this experiment, since the data is 1-dimensional, PyTorch's Conv1d (1-dimensional convolutional layer) is used, with 3 convolutional layers, max pooling and ReLU activation. The output of the convolutional layers is then flattened into a tensor of shape (n, 1, l), where n is the number of samples in the batch and l is the length of the last convolutional layer multiplied by its number of channels. This is followed by 3 dense (fully connected) layers with ReLU activation and dropout to avoid overfitting.
3.3 Recurrent Neural Networks
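A Recurrent Neural Network (RNN) predicts an output from an input, but does so sequentially: the inputs and outputs are sequences, such as text or audio. In the standard formulation, a hidden state $h_t$ carries information from earlier steps:
$$h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \tag{4}$$
$$y_t = W_{hy} h_t + b_y \tag{5}$$
where $x_t$ and $y_t$ are the input and output at step $t$, $\sigma$ is the activation function, and $W_{xh}$, $W_{hh}$, $W_{hy}$, $b_h$ and $b_y$ are learned parameters.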
3.4 Neural Arithmetic Logic Units
Although neural networks can perform several tasks at close to human-level accuracy, they tend to fail when they encounter quantities outside the range of the training data, that is, when they must extrapolate. This suggests that the models fit the training data rather than learning a general rule. [15] proposed two new modules, the Neural Accumulator and the Neural Arithmetic Logic Unit, which can be added to any neural network architecture and help the network represent numerical quantities in a way that generalizes to tasks such as extrapolation.
Stock prediction can also be seen as an extrapolation task, since the future stock price to be predicted can lie above or below the range of our training data. In this paper we propose to apply the ability of Neural Arithmetic Logic Units to generalize and extrapolate to the task of stock prediction.
4 Proposed Approach
4.1 Data
Historical Indian stock price data from February 2015 to August 2018 was used for this research. The data contains the columns Date, Close, High, Low, Open and Volume. Prices are recorded at 1-hour intervals, giving a total of around 6200 data points. The data set is checked for missing data, and rows with missing values are removed. Only the Close prices are used; all other columns (Date, High, Low, Open and Volume) are dropped. The goal is to predict the closing price 5 intervals ahead from the closing prices of the past 20 intervals. This is a regression task to predict the exact closing price. For computational reasons and faster convergence, the data is scaled to the range 0 to 1. The stock values are scaled with
$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum close prices in the data set.
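TABLE 1: Sample close prices and their scaled values

| Close Price | 411.15 | 414.05 | 410.20 | 410.25 | 410.00 |
|---|---|---|---|---|---|
| Scaled Close | 0.1840 | 0.1874 | 0.1828 | 0.1829 | 0.1826 |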
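Scaling to the range 0 to 1, and mapping predictions back to prices afterwards, can be sketched as below. The helper names `minmax_scale` and `minmax_inverse` are hypothetical, and min-max scaling is assumed as the scaling rule. The example reuses the five close prices from the table, but takes the minimum and maximum from those five values only, so the scaled numbers it produces differ from the table, where the bounds come from the full series.

```python
import numpy as np


def minmax_scale(prices):
    """Scale prices to [0, 1]; also return (min, max) for later inversion."""
    prices = np.asarray(prices, dtype=np.float64)
    lo, hi = prices.min(), prices.max()
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return (prices - lo) / (hi - lo), (lo, hi)


def minmax_inverse(scaled, bounds):
    """Map scaled values back to the original price range."""
    lo, hi = bounds
    return np.asarray(scaled, dtype=np.float64) * (hi - lo) + lo


close = [411.15, 414.05, 410.20, 410.25, 410.00]  # five prices from the table
scaled, bounds = minmax_scale(close)
restored = minmax_inverse(scaled, bounds)  # recovers the original prices
```

Keeping the bounds alongside the scaled series matters because model outputs are in scaled units and must be inverted with the same bounds before they can be compared with actual prices.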
After scaling, the data is split into inputs and labels. Each input contains the past 20 scaled close prices, and its label is the scaled close price 5 intervals later.
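The input and label construction can be illustrated with a sliding window. This is a sketch under one stated assumption: "5 intervals later" is counted from the last price in the 20-price window. The function name `make_windows` and the toy series are hypothetical.

```python
import numpy as np


def make_windows(scaled_close, window=20, horizon=5):
    """Pair each run of `window` prices with the price `horizon` steps after it."""
    scaled_close = np.asarray(scaled_close, dtype=np.float32)
    n = len(scaled_close) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window and horizon")
    # Row i holds prices at positions i .. i + window - 1
    X = np.stack([scaled_close[i:i + window] for i in range(n)])
    # The label sits `horizon` positions after the last price of the window
    first = window + horizon - 1
    y = scaled_close[first:first + n].reshape(-1, 1)
    return X, y


series = np.arange(30, dtype=np.float32) / 100.0  # toy data, not real prices
X, y = make_windows(series)
print(X.shape, y.shape)  # (6, 20) (6, 1)
```

With 30 toy values, the first window covers positions 0 to 19 and its label is the value at position 24.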
Facebook’s PyTorch framework was used to design the computation graph and to train the model. The arrays of data are converted into tensors and split into batches for faster computation, taking advantage of matrix operations. A single input X is therefore a vector of shape (20, 1) and its label has shape (1, 1).
The data was split into training and testing sets in the ratio 8:2. A batch size of 1232 was used to split the data into 5 equal batches: 4 batches of 1232 samples for the training set and 1 batch for the test set. Each batch is a tensor of shape (1232, 20) for the Artificial Neural Network models and of shape (1232, 1, 20) for the Convolutional Neural Network models.
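The batch shapes above can be reproduced with a few tensor operations. The sketch assumes the split is chronological (first four batches for training, last one for testing), which the text does not state explicitly, and uses random placeholder tensors in place of the real windows.

```python
import torch

batch_size = 1232
X = torch.rand(5 * batch_size, 20)  # placeholder inputs, not real prices
y = torch.rand(5 * batch_size, 1)   # placeholder labels

# torch.split cuts along dim 0 into chunks of `batch_size` rows
X_batches = torch.split(X, batch_size)
y_batches = torch.split(y, batch_size)

# Assumed chronological 8:2 split: four training batches, one test batch
train = list(zip(X_batches[:4], y_batches[:4]))
test = (X_batches[4], y_batches[4])

ann_input = train[0][0]               # (1232, 20) for the dense models
cnn_input = train[0][0].unsqueeze(1)  # (1232, 1, 20): one input channel for Conv1d
```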
4.2 Neural Arithmetic Logic Units(NALU) based model for Stock Prediction
Instead of PyTorch’s nn.Linear layers, a self-defined NALU module is used, which is defined by
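Neural Accumulator (NAC):
$$W = \tanh(\hat{W}) \odot \sigma(\hat{M}) \tag{7}$$
$$a = W x \tag{8}$$
Neural Arithmetic Logic Unit (NALU):
$$g = \sigma(G x) \tag{9}$$
$$m = \sigma\big(W \log(|x| + \epsilon)\big) \tag{10}$$
$$y = g \odot a + (1 - g) \odot m \tag{11}$$
where $x$ is the input, $\hat{W}$, $\hat{M}$ and $G$ are learned weight matrices, $\sigma$ is the sigmoid function, $\odot$ is the element-wise product and $\epsilon$ is a small constant that keeps the logarithm finite.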
A sigmoid function was used in the calculation of m instead of the exponential function used originally in the Neural Arithmetic Logic Units paper [15]. Four NALU layers are stacked like fully connected layers using the self-defined PyTorch NALU module. Dropout is added between the layers as a regularization technique to avoid overfitting. The ReLU activation function is applied between the NALU layers.
$$\mathrm{ReLU}(x) = \max(0, x) \tag{12}$$
Finally, a sigmoid activation is used to keep the prediction in the desired range of 0 to 1 (as the data is scaled to that range).
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{13}$$
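A self-defined NALU module of this kind might look like the sketch below. It follows the NAC and NALU definitions of [15], with the sigmoid substituted for the exponential in the multiplicative path as described above. The class names, hidden sizes (32, 16, 8), dropout rate, initialization and `eps` value are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NAC(nn.Module):
    """Neural Accumulator: weights are pushed towards -1, 0 or 1."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.M_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.W_hat)
        nn.init.xavier_uniform_(self.M_hat)

    def weight(self):
        return torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)

    def forward(self, x):
        return F.linear(x, self.weight())  # a = W x


class NALU(nn.Module):
    """NALU with a sigmoid (instead of exp) in the multiplicative path."""

    def __init__(self, in_dim, out_dim, eps=1e-10):
        super().__init__()
        self.nac = NAC(in_dim, out_dim)
        self.G = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.G)
        self.eps = eps

    def forward(self, x):
        a = self.nac(x)                                # additive path
        g = torch.sigmoid(F.linear(x, self.G))         # learned gate
        log_x = torch.log(torch.abs(x) + self.eps)
        m = torch.sigmoid(F.linear(log_x, self.nac.weight()))  # modified path
        return g * a + (1 - g) * m


class NALUNet(nn.Module):
    """Four stacked NALU layers with ReLU, dropout and a sigmoid output."""

    def __init__(self, window=20, hidden=(32, 16, 8), p_drop=0.2):
        super().__init__()
        dims = (window, *hidden, 1)
        self.layers = nn.ModuleList(
            [NALU(i, o) for i, o in zip(dims[:-1], dims[1:])]
        )
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = self.drop(torch.relu(layer(x)))
        return torch.sigmoid(self.layers[-1](x))  # keep output in [0, 1]


model = NALUNet()
model.eval()
with torch.no_grad():
    out = model(torch.rand(4, 20))  # placeholder input, not real prices
```

Sharing the NAC weight matrix between the additive and multiplicative paths follows [15]; an implementation could also use separate weights for the two paths.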
The output of the network is compared with the true value using the squared L2 norm (mean squared error) loss function.
$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{14}$$
where $y_i$ is the true label value, $\hat{y}_i$ is the model prediction for the training data and $N$ is the number of samples. To minimize the loss, the backpropagation algorithm is used with the Adam optimizer. A cyclic learning rate [16] scheduler is used with the optimizer in an attempt to escape local minima of the loss. When the algorithm is stuck in a local or narrow minimum, increasing the learning rate helps it escape that region and reach a better or wider minimum. Each data batch is used 500 times to learn and update the weight parameters of the model so as to reduce the total loss. Because the learning rate is cyclic, the loss tends to rise when the learning rate increases, so the model state with the lowest loss is saved.
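The training procedure (MSE loss, Adam, cyclic learning rate, keeping the lowest-loss state) can be sketched as follows. The stand-in model, the random data, the learning-rate bounds `base_lr=1e-4` and `max_lr=1e-2`, `step_size_up=20` and the reduced epoch count are all placeholders chosen for illustration, not values from the paper.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and random data; any of the models in this paper could be used
model = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
batches = [(torch.rand(32, 20), torch.rand(32, 1)) for _ in range(4)]

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=1e-4,          # assumed lower bound
    max_lr=1e-2,           # assumed upper bound
    step_size_up=20,
    cycle_momentum=False,  # Adam has no classical momentum term to cycle
)

best_loss, best_state = float("inf"), None
for epoch in range(50):  # the paper reports 500 passes over each batch
    total = 0.0
    for xb, yb in batches:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()  # the cyclic schedule advances once per batch
        total += loss.item()
    # The loss can rise while the learning rate is high, so keep the best state
    if total < best_loss:
        best_loss = total
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)
final_lr = optimizer.param_groups[0]["lr"]
```

Copying the state dictionary with `copy.deepcopy` is needed because `state_dict()` returns references to the live parameters, which later updates would overwrite.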
4.3 Convolutional feature extraction and NALU based model for Stock Prediction
Convolutional neural networks have been used for stock prediction in the past. This paper proposes a new model that combines the feature extraction ability of convolutional neural networks with Neural Arithmetic Logic Units. As the stock data is a 1-dimensional series, three 1-dimensional convolutional layers (nn.Conv1d in PyTorch) are stacked to extract the features of stock price movements. A kernel size of 4 is used for all convolutional layers. The channel counts are 1 for the input and 16, 32 and 64 for the outputs of the three convolutional layers. Max pooling layers are added between the convolutional layers to reduce the dimension; a pooling kernel size of 1 or 2 is used with a stride of 2, which halves the layer length. The ReLU activation function is used to make the network nonlinear.
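The convolutional feature extractor can be sketched as below, showing how the layer length changes from the 20-price input. The class name `ConvFeatures`, the padding of 2 and the pooling kernel size of 2 are assumptions: without some padding, a kernel of size 4 cannot be applied three times to a length-20 series that is halved between layers.

```python
import torch
import torch.nn as nn


class ConvFeatures(nn.Module):
    """Three Conv1d layers (1 -> 16 -> 32 -> 64 channels) with pooling between."""

    def __init__(self, kernel_size=4, padding=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size, padding=padding),   # length 20 -> 21
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2),            # 21 -> 10
            nn.Conv1d(16, 32, kernel_size, padding=padding),  # 10 -> 11
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2),            # 11 -> 5
            nn.Conv1d(32, 64, kernel_size, padding=padding),  # 5 -> 6
            nn.ReLU(),
        )

    def forward(self, x):
        # x: (n, 1, 20) -> (n, 64, 6)
        return self.features(x)


extractor = ConvFeatures()
with torch.no_grad():
    feats = extractor(torch.rand(8, 1, 20))  # placeholder batch, not real prices
flat = feats.flatten(start_dim=1)            # (8, 384): input to the regressor
```

Under these assumptions the flattened feature vector has 64 × 6 = 384 values per sample, which would be the input size of the first layer of the regressor.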
The convolutional layers are followed by 2 layers of Neural Arithmetic Logic Units and 2 fully connected layers, which act as the regressor. The ReLU activation function is used between the linear and NALU layers, with dropout to avoid overfitting. A sigmoid activation function is used in the last layer of the network to keep the prediction in the range 0 to 1. The squared L2 norm loss function was used to compute the loss after the forward pass, the Adam optimizer was used for optimization, and a cyclic learning rate scheduler was used to vary the learning rate cyclically between a lower and an upper bound.
5 Results
Different models were used in this research to find which one is best able to learn the trend of the stock price and predict the future price given the last 20 prices. In each iteration, after training the models on the training set, the testing set is used to check how well the model has learned and how well it predicts unseen data. After training, the whole series of close prices is predicted with the trained model and plotted to visualize how well the model performs on the data as a whole.
TABLE 2 gives the training loss of each model. It can be observed that the models with Neural Arithmetic Logic Units learned better than the ANN and CNN models. TABLE 3 gives the loss of the models on the testing set. The models with Neural Arithmetic Logic Units were able to predict the stock price better than the ANNs and CNNs on unseen data. After training and validation on the testing set, each model was tested on the complete data set: the previous 20 data points were given and the model predicted the close price after 5 intervals. The loss of each model on the whole data set is given in TABLE 4. The predicted values have to be rescaled back to the original interval to be compared with the actual prices.
TABLE 2: Training loss of each model

| Model | Training Loss |
|---|---|
| Artificial Neural Network (ANN) | 8.04649e-06 |
| Convolutional Neural Network (CNN) | 5.58822e-06 |
| Neural Arithmetic Logic Units Network (NALU) | 1.91356e-06 |
| NALU-CNN Network (NALU-CNN) | 5.58499e-07 |

TABLE 3: Loss of each model on the testing set

| Model | Testing Loss |
|---|---|
| Artificial Neural Network (ANN) | 1.30709e-06 |
| Convolutional Neural Network (CNN) | 5.99638e-07 |
| Neural Arithmetic Logic Units Network (NALU) | 4.31875e-07 |
| NALU-CNN Network (NALU-CNN) | 3.05196e-07 |

TABLE 4: Loss of each model on the whole data set

| Model | Total Loss |
|---|---|
| Artificial Neural Network (ANN) | 1.29998e-06 |
| Convolutional Neural Network (CNN) | 1.07971e-06 |
| Neural Arithmetic Logic Units Network (NALU) | 3.97540e-07 |
| NALU-CNN Network (NALU-CNN) | 3.30627e-07 |
6 Conclusion
In this paper we proposed to use the feature extraction property of convolutional neural networks together with the extrapolation and arithmetic ability of Neural Arithmetic Logic Units to predict the stock price 5 intervals ahead.
During the course of this experiment, it was observed that the models with Neural Arithmetic Logic Units (NALU) converged faster than the other models, not only on the task of stock prediction but also on other tasks. The NALU models were able to learn the pattern and other features of the stock values and predicted the closing price better than the ANNs and CNNs.
References
 [1] L. J. Cao and F. E. H. Tay. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks, 14(6):1506–1518, Nov 2003.
 [2] Jigar Patel, Sahil Shah, Priyank Thakkar, and K Kotecha. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42(1):259 – 268, 2015.
 [3] Prof S K Shah. Image classification based on textural features using artificial neural network (ann).
 [4] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
 [5] J. T. Connor, R. D. Martin, and L. E. Atlas. Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2):240–254, March 1994.

 [6] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 1096–1103, New York, NY, USA, 2008. ACM.
 [7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
 [8] Yakup Kara, Melek Acar Boyacioglu, and Ömer Kaan Baykan. Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the istanbul stock exchange. Expert Systems with Applications, 38(5):5311 – 5319, 2011.
 [9] K. Abhishek, A. Khairwa, T. Pratap, and S. Prakash. A stock market prediction model using artificial neural network. In 2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT’12), pages 1–5, July 2012.

 [10] Chih-Fong Tsai and Yu-Chieh Hsiao. Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decision Support Systems, 50(1):258 – 269, 2010.
 [11] J. Chen, W. Chen, C. Huang, S. Huang, and A. Chen. Financial time-series data analysis using deep convolutional neural networks. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD), pages 87–92, Nov 2016.
 [12] Sheng Chen and Hongxiang He. Stock prediction using convolutional neural network. IOP Conference Series: Materials Science and Engineering, 435(1):012026, 2018.
 [13] Kai Chen, Yi Zhou, and Fangyan Dai. A LSTM-based method for stock returns prediction: A case study of China stock market. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), BIG DATA ’15, pages 2823–2824, Washington, DC, USA, 2015. IEEE Computer Society.
 [14] Akhter Mohiuddin Rather, Arun Agarwal, and V.N. Sastry. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6):3234 – 3241, 2015.
 [15] Andrew Trask, Felix Hill, Scott E Reed, Jack Rae, Chris Dyer, and Phil Blunsom. Neural arithmetic logic units. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 8046–8055. Curran Associates, Inc., 2018.
 [16] Leslie N. Smith. No more pesky learning rate guessing games. CoRR, abs/1506.01186, 2015.