Trading volume is the total quantity of shares or contracts traded for specified securities such as stocks, bonds, options contracts, futures contracts and all types of commodities. It can be measured for any type of security over a trading day or any other specified time period. In our case, the daily volume of trade is measured on stocks. Trading volume is an essential component of trading alpha research since it tells investors about the market’s activity and liquidity. Over the past decade, along with the improved accessibility of ultra-high-frequency financial data, evolving data-driven computational technologies have attracted considerable attention in the financial industry. Meanwhile, the development of algorithmic and electronic trading has shown the great potential of trading volume, since many trading models require intraday volume forecasts as a key input. As a result, there is growing interest in developing models that precisely predict intraday trading volume.
Researchers aim to propose strategies that execute trades efficiently in electronic financial markets while minimizing transaction costs and market impact. The study of trading volume generally follows two lines of work toward these goals. One line focuses on finding the optimal trading sequence and amounts, while the other investigates the relationships between trading volume and other financial variables or market activities such as bid-ask spread, return volatility and liquidity. Thus a precise model that provides insight into trading volume can serve as a basis for both lines of work.
There are several existing methods for estimating future trading volume. As a fundamental approach, rolling means (RM) predicts the intraday volume of a time interval by averaging the volume traded within the same interval over the past days. The concept of the RM model is straightforward, but it fails to adequately capture intraday regularities. One classical, publicly available intraday volume prediction model decomposes trading volume into three components, namely a daily average component, an intraday periodic component and an intraday dynamic component, and then adopts the Component Multiplicative Error Model (CMEM) to estimate the three terms (cmem). Though this model outperforms RM, limitations such as high sensitivity to noise and to initial parameters complicate its practical implementation. kffortrading propose a new model that works with the logarithm of intraday volume, which simplifies the multiplicative model into an additive one. The model is constructed within a two-state (intraday and overday features) Kalman Filter (kforiginal) framework, and the authors adopt the expectation-maximization (EM) algorithm for parameter estimation. Though the model provides a novel view of intraday and overday factors, its flexibility is limited because it fixes the number of hidden states at two for all stocks, so there may be information loss. Moreover, our experiments show that the dominant term in the model is actually the daily seasonality and that the learning process of the parameters is not robust.
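As a point of reference, the RM baseline described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: the function name `rolling_mean_forecast`, the `(n_days, n_bins)` array layout and the `window` parameter are assumptions introduced here.

```python
import numpy as np

def rolling_mean_forecast(volume, window):
    """Rolling-means (RM) forecast of intraday volume.

    volume : array of shape (n_days, n_bins), one row per trading day.
    window : number of past days to average (hypothetical parameter).
    Returns an array of shape (n_days - window, n_bins); row k is the
    forecast for day window + k, i.e. the per-bin mean of the previous
    `window` days.
    """
    volume = np.asarray(volume, dtype=float)
    n_days, n_bins = volume.shape
    if not 1 <= window < n_days:
        raise ValueError("window must satisfy 1 <= window < n_days")
    # Cumulative sums over days let every windowed mean be read off at once.
    csum = np.vstack([np.zeros((1, n_bins)), np.cumsum(volume, axis=0)])
    return (csum[window:n_days] - csum[:n_days - window]) / window
```

Each bin is forecast independently of its neighbours, which is why RM cannot adapt to what has already happened earlier in the same day.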
As an extension of the two-state Kalman Filter, our new model offers higher prediction precision, greater stability and a simple structure. Our contributions are as follows:
Firstly, we develop a new approach that combines cubic splines with a statistical procedure to determine the best degrees of freedom (DOFs) for different stocks.
Secondly, by choosing suitable DOFs, we provide a smooth prediction of traded volume.
Finally, we demonstrate through experiments on 978 stocks that our model outperforms RM and the two-state Kalman Filter.
We denote the $i$-th observation on day $t$ as $y_{t,i}$, with local indices $t$ and $i$; we set the global index $\tau$ so that $y_\tau = y_{t,i}$.
2.1 Two-state Kalman Filter Model
Before introducing our method, we review the two-state Kalman Filter model. Within the model, the volume is defined as the number of shares traded normalized by daily outstanding shares:
$$\text{volume} = \frac{\text{shares traded}}{\text{daily outstanding shares}} \tag{1}$$
This ratio is one way of normalizing volume (kffortrading); normalization helps to correct low-frequency variation caused by changes in traded volume. Log-volume refers to the natural logarithm of traded volume. The researchers train their model on log-volume and evaluate its predictive performance on volume. The reason for using log-volume is that it converts the multiplicative terms (cmem) into additive relationships, which fits the Kalman Filter framework naturally; moreover, the logarithmic transformation helps to reduce the skewness of volume data (behaviour). kffortrading’s model is built within the Kalman Filter framework shown in Figure 1, in which the hidden state is not observable and the observation is the logarithm of the observed traded volume. The mathematical updates are:
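$$x_{\tau+1} = A_\tau x_\tau + w_\tau, \qquad y_\tau = C x_\tau + \phi_\tau + v_\tau \tag{2}$$
for $\tau = 1, 2, \dots$, where $x_\tau = [\eta_\tau, \mu_\tau]^\top$ is the hidden state vector containing two parameters, namely the daily average part and the intraday dynamic part; $A_\tau$ is the state transition matrix; the observation matrix is fixed as $C = [1 \;\; 1]$; $w_\tau \sim \mathcal{N}(0, Q_\tau)$, where $Q_\tau$ is the state noise covariance; $v_\tau \sim \mathcal{N}(0, r)$, and $\phi_\tau$ is treated as the seasonality; and the initial state is $x_1 \sim \mathcal{N}(\pi_1, \Sigma_1)$. The unknown system parameters are estimated by closed-form equations, which are derived from the expectation-maximization (EM) algorithm. For more details of the two-state model, we suggest that readers consult the original paper.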
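To make the additive structure of the two-state model concrete, the sketch below simulates log-volume as the sum of a daily part, an intraday dynamic part, a seasonal term and observation noise. It is an illustration under stated assumptions, not the original authors' code: the function name, all parameter names, and in particular the rule that the daily part only changes at the day boundary are assumptions made here.

```python
import numpy as np

def simulate_two_state(n_days, n_bins, a_eta, a_mu, sigma_eta, sigma_mu, r, phi, rng):
    """Simulate log-volume from a two-state linear Gaussian model.

    State x = [eta, mu]: eta is the daily average part, mu the intraday
    dynamic part. Observation: y = eta + mu + phi[i] + noise, with phi the
    per-bin seasonality and r the observation noise variance.
    """
    C = np.array([1.0, 1.0])          # observation matrix fixed to [1, 1]
    x = np.zeros(2)
    y = np.empty((n_days, n_bins))
    for t in range(n_days):
        for i in range(n_bins):
            y[t, i] = C @ x + phi[i] + rng.normal(0.0, np.sqrt(r))
            # Assumption: the daily part is frozen within a day and only
            # transitions (with noise) after the last bin of the day.
            last_bin = i == n_bins - 1
            A = np.diag([a_eta if last_bin else 1.0, a_mu])
            noise_sd = np.array([sigma_eta if last_bin else 0.0, sigma_mu])
            x = A @ x + rng.normal(0.0, 1.0, size=2) * noise_sd
    return y
```

Exponentiating the simulated series recovers a multiplicative volume model, which is the reason the log transform is convenient here.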
2.2 Our Model: Various-states Kalman Filter
In the two-state model described above, the DOF of the hidden state variable is two, since it has two factors, intraday and overday. However, there is no systematic way to determine the correct DOF of the hidden state variable, especially across different stocks. Our concern is how to find a better DOF for each stock and thereby predict more precisely. Our new method still falls within the Kalman Filter framework shown in Figure 1; however, we replace Equation 2 with the most common Kalman Filter update equation:
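$$x_{t+1} = A x_t + w_t, \qquad y_t = C x_t + v_t \tag{3}$$
The differences between Equation 2 and Equation 3 are as follows: $x_t$ represents the hidden state, whose dimension $d$, the DOF of the hidden state variable, will be determined in Section 2.2.1; the state transition matrix $A$ is a $d \times d$ matrix while the observation matrix $C$ is a $78 \times d$ matrix; $w_t \sim \mathcal{N}(0, Q)$, where the transition covariance matrix $Q$ is a $d \times d$ matrix; $v_t \sim \mathcal{N}(0, R)$, where the observation covariance matrix $R$ is a $78 \times 78$ matrix; the initial state is $x_1 \sim \mathcal{N}(\pi_1, \Sigma_1)$; and the observation $y_t$ is a $78 \times 1$ vector. Notice that the system matrices are uniquely determined by the training data. We use the day index $t$ as the subscript because each prediction covers one day's traded volume.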
Within the framework of our model, the data we use are historical daily trading volumes. We define the observation as the product of traded volume and the Volume Weighted Average Price (VWAP):
$$\text{volume} = \text{traded volume} \times \text{VWAP} \tag{4}$$
Furthermore we model and evaluate performance with the percentage ratio:
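A minimal sketch of both quantities follows. The product of shares and VWAP is taken from the text; the percentage ratio is assumed here to be each bin's share of that day's total volume, which is a common convention but an assumption on our part. The function names and the `(n_days, n_bins)` layout are also hypothetical.

```python
import numpy as np

def dollar_volume(shares, vwap):
    """Per-bin observation: traded shares multiplied by the bin's VWAP."""
    return np.asarray(shares, dtype=float) * np.asarray(vwap, dtype=float)

def intraday_percentage(volume):
    """Assumed percentage ratio: each bin divided by its day's total.

    volume : array of shape (n_days, n_bins). Every row of the result
    sums to 1.
    """
    volume = np.asarray(volume, dtype=float)
    daily_total = volume.sum(axis=1, keepdims=True)
    return volume / daily_total
```

Working with a within-day ratio removes the day-to-day level of activity, so that stocks and days of very different size become comparable.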
2.2.1 DOF of State Space
We assume that different stocks have different numbers of parameters in the hidden state. For a specific stock, we call the number of elements in its hidden state the degrees of freedom (DOFs), so each stock has its own DOF. The key concern is how to determine the DOF for each stock. By experiment, we find that seasonality dominates the prediction of traded volume in the two-state model. In both the two-state model and our model, each stock has 78 observations per day, so the seasonality term has 78 parameters. We seek to include seasonality in the hidden states and drop the separate seasonality term, to avoid a dominant term. To avoid overfitting and reduce computation, we use a cubic spline to fit the observations smoothly within each day. Given a series of observations $(x_1, y_1), \dots, (x_n, y_n)$, the cubic smoothing spline is the function $g$ that minimizes:
$$\sum_{i=1}^{n} \left( y_i - g(x_i) \right)^2 + \lambda \int g''(t)^2 \, dt \tag{6}$$
where, in our case, $n$ is the number of observations in a day and $\lambda$ is a nonnegative tuning parameter. The function $g$ that minimizes (6) is known as a smoothing spline. The term $\sum_{i=1}^{n} (y_i - g(x_i))^2$ encourages $g$ to fit the data well, and the term $\lambda \int g''(t)^2 \, dt$ is a penalty that controls the smoothness of the spline. Solving (6) gives:
$$\hat{g}_\lambda = S_\lambda y \tag{7}$$
where $\hat{g}_\lambda$, the solution to (6) for a particular choice of $\lambda$, is an $n$-vector containing the fitted values of the smoothing spline at the training points $x_1, \dots, x_n$. Equation 7 indicates that the vector of fitted values can be written as an $n \times n$ matrix $S_\lambda$ times the response vector $y$. The DOF is then defined as the trace of the matrix $S_\lambda$. Our first purpose is to fix a specific DOF and obtain the corresponding spline. Thanks to the work of B. D. Ripley and Martin Maechler (https://rdrr.io/r/stats/smooth.spline.html), we are able to obtain fitted splines for any reasonable DOF. After fitting, we use cross validation to find the DOF that achieves the lowest mean squared error (MSE). Algorithm 1 outlines the procedure for finding the DOF of each stock. We analyze the DOFs of 978 stocks; examples of the distribution of DOFs and the best DOFs of some stocks are shown in Figure 2.
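The link between the tuning parameter, the smoother matrix and the DOF can be illustrated with a small self-contained sketch. Note the substitution: instead of R's `smooth.spline`, the code uses a discrete second-difference penalty (a Whittaker-type smoother), which plays the role of the smoothing spline on equally spaced bins. The function names, the bisection range and the use of the leave-one-out shortcut for linear smoothers are assumptions made here, and the actual cross-validation scheme of Algorithm 1 may differ.

```python
import numpy as np

def smoother_matrix(n, lam):
    """S = (I + lam * D'D)^{-1}, with D the second-difference operator."""
    D = np.diff(np.eye(n), n=2, axis=0)            # shape (n - 2, n)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def lambda_for_dof(n, target_dof, iters=60):
    """Bisect on log10(lambda) until trace(S) equals target_dof.

    trace(S) decreases from n towards 2 as lambda grows, so the target
    must lie strictly between 2 and n.
    """
    lo, hi = -10.0, 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.trace(smoother_matrix(n, 10.0 ** mid)) > target_dof:
            lo = mid        # fit too flexible: increase the penalty
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))

def loocv_mse(y, S):
    """Leave-one-out MSE via the shortcut for linear smoothers."""
    resid = y - S @ y
    return float(np.mean((resid / (1.0 - np.diag(S))) ** 2))

def best_dof(y, candidates):
    """Return the candidate DOF with the lowest leave-one-out MSE."""
    scores = {}
    for dof in candidates:
        S = smoother_matrix(len(y), lambda_for_dof(len(y), dof))
        scores[dof] = loocv_mse(y, S)
    return min(scores, key=scores.get), scores
```

For a day of 78 bins, `best_dof(y, range(3, 40))` would scan candidate DOFs and return the one with the lowest score; the candidate range is illustrative only.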
2.2.2 Kalman Filter
Given the best DOF from our method, we then use the Kalman Filter to make predictions. The Kalman Filter is an online algorithm that precisely estimates the mean and covariance matrix of the hidden states. Supposing the parameters in Equation 3 are known, Algorithm 2 outlines the mechanism of Kalman filtering. We model the distribution of the hidden state conditional on all the percentage observations up to the current time. Since we suppose that the noise terms in Equation 3 are Gaussian, all hidden states follow a Gaussian distribution, and it is only necessary to characterize the conditional mean and the conditional covariance, as shown in line 3 and line 4. Then, given a new observation, we correct the mean and covariance in line 7 and line 8.
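The predict and correct steps described above can be sketched as follows for a generic linear Gaussian state-space model. This is a textbook Kalman filter written for illustration, not a transcription of the paper's algorithm: the names `A`, `C`, `Q`, `R`, `m0` and `P0` (transition, observation, the two noise covariances, and the prior mean and covariance of the first state) are assumptions, as is the choice to return one-step-ahead forecasts.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, m0, P0):
    """Kalman filter over observations ys of shape (T, p).

    Returns filtered means (T, d), filtered covariances (T, d, d) and the
    one-step-ahead observation forecasts (T, p) made before each ys[t]
    is seen.
    """
    T, d = len(ys), len(m0)
    means = np.empty((T, d))
    covs = np.empty((T, d, d))
    forecasts = np.empty((T, C.shape[0]))
    m_pred, P_pred = m0, P0
    for t in range(T):
        forecasts[t] = C @ m_pred                   # predicted observation
        # Correct: fold the new observation into the state estimate.
        S = C @ P_pred @ C.T + R                    # innovation covariance
        K = np.linalg.solve(S, C @ P_pred).T        # gain P_pred C' S^{-1}
        m = m_pred + K @ (ys[t] - C @ m_pred)
        P = P_pred - K @ S @ K.T
        means[t], covs[t] = m, P
        # Predict: propagate mean and covariance one step ahead.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
    return means, covs, forecasts
```

In the setting of this paper each row of `ys` would hold one day's 78 percentage values, so `forecasts[t]` is a whole-day prediction.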
Our ultimate goal is to make predictions of and by Algorithm 1 and , respectively. Thus we need to estimate the parameters precisely. The parameters are calibrated with the expectation-maximization (EM) algorithm. The smoothing process infers past states conditional on all the observations in the training set, which is a necessary step in model calibration because it provides more accurate information about the unobservable states. We outline the Kalman smoothing process in Algorithm 3.
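For illustration, the backward pass of a standard Rauch-Tung-Striebel smoother is sketched below; we assume, without confirmation from the text, that Algorithm 3 is of this type. The function name and argument layout are hypothetical, and the inputs are the filtered means and covariances from a forward Kalman filter pass.

```python
import numpy as np

def rts_smoother(filt_means, filt_covs, A, Q):
    """Backward smoothing pass for x_{t+1} = A x_t + w_t, w_t ~ N(0, Q).

    filt_means : (T, d) filtered means; filt_covs : (T, d, d) filtered
    covariances. Returns smoothed means and covariances, i.e. estimates
    conditional on all T observations.
    """
    T = len(filt_means)
    sm_means = filt_means.copy()
    sm_covs = filt_covs.copy()
    # The last smoothed estimate equals the last filtered one.
    for t in range(T - 2, -1, -1):
        m_pred = A @ filt_means[t]
        P_pred = A @ filt_covs[t] @ A.T + Q
        # Smoother gain J = P_filt A' P_pred^{-1}.
        J = np.linalg.solve(P_pred, A @ filt_covs[t]).T
        sm_means[t] = filt_means[t] + J @ (sm_means[t + 1] - m_pred)
        sm_covs[t] = filt_covs[t] + J @ (sm_covs[t + 1] - P_pred) @ J.T
    return sm_means, sm_covs
```

When the state is static (identity transition, zero state noise), every smoothed estimate collapses to the final filtered one, which is a convenient sanity check.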
After performing Algorithms 1, 2 and 3, we estimate the parameters by the EM method, as shown in Algorithm 4. The EM algorithm is a common way to estimate the parameters of a Kalman Filter problem. It extends maximum likelihood estimation to cases where hidden states are involved (emalg). The EM iteration alternates between an E-step (Expectation step), which constructs a lower bound on the log-likelihood using the current parameter estimates, and an M-step (Maximization step), which computes the parameters that maximize the lower bound found in the E-step. Two advantages of the EM algorithm here are its fast convergence and the existence of closed-form solutions. The derivations of the Kalman Filter and the EM algorithm are beyond the scope of this paper; we refer interested readers to kforiginal’s work for more details.
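To show what a closed-form M-step looks like, the sketch below gives the standard updates for a time-invariant linear Gaussian state-space model. These are the textbook formulas and are not claimed to match Algorithm 4 line by line. The function name and the input layout are assumptions; in particular `lag_covs[t]` is assumed to hold the smoothed covariance between consecutive states, which the E-step must supply.

```python
import numpy as np

def m_step(ys, sm_means, sm_covs, lag_covs):
    """Closed-form M-step for x_{t+1} = A x_t + w_t, y_t = C x_t + v_t.

    ys       : (T, p) observations
    sm_means : (T, d) smoothed state means
    sm_covs  : (T, d, d) smoothed state covariances
    lag_covs : (T - 1, d, d), lag_covs[t] = Cov(x_{t+1}, x_t | all ys)
    Returns updated A, C, Q, R.
    """
    T = len(ys)
    # Second moments E[x_t x_t'] and E[x_{t+1} x_t'] under the smoother.
    Exx = sm_covs + np.einsum("ti,tj->tij", sm_means, sm_means)
    Exx_lag = lag_covs + np.einsum("ti,tj->tij", sm_means[1:], sm_means[:-1])
    S00 = Exx[:-1].sum(axis=0)
    S11 = Exx[1:].sum(axis=0)
    S10 = Exx_lag.sum(axis=0)
    A = np.linalg.solve(S00.T, S10.T).T            # A = S10 S00^{-1}
    Q = (S11 - A @ S10.T) / (T - 1)
    Syx = np.einsum("ti,tj->ij", ys, sm_means)
    Sxx = Exx.sum(axis=0)
    C = np.linalg.solve(Sxx.T, Syx.T).T            # C = Syx Sxx^{-1}
    Syy = np.einsum("ti,tj->ij", ys, ys)
    R = (Syy - C @ Syx.T) / T
    # Symmetrize to remove round-off asymmetry in the covariances.
    return A, C, 0.5 * (Q + Q.T), 0.5 * (R + R.T)
```

A full EM loop would alternate filtering and smoothing (the E-step) with this update until the likelihood stops improving.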
3.1 Data Introduction
We empirically analyze the intraday volume of 978 stocks traded on major U.S. markets. For example, the information for the stock "AAPL" is summarized in Table 1. The data cover the period from January 3rd 2017 to May 29th 2020, excluding non-trading days. Each trading day consists of 78 5-minute bins. Volume and percentage are computed by Equations 4 and 5, respectively. All historical data used in the experiments were obtained from Invesco Inc.
3.2 Experiment Set-up
We choose the two-state model and the RM model described above as baselines. Data from January 3rd 2017 to June 30th 2017 are used as the training set, while data from July 5th 2017 to January 2nd 2018 are used as the test set. Both the training set and the test set contain 125 trading days. We initialize as identity matrices, and as smooth matrix, then perform Algorithms 1 to 4 on the training set to obtain the model parameters, and finally make predictions on the next 125 days. We evaluate the performance of the three models by the mean absolute percentage error (MAPE):
$$\text{MAPE} = \frac{1}{M} \sum_{\tau=1}^{M} \frac{\lvert y_\tau - \hat{y}_\tau \rvert}{y_\tau} \tag{8}$$
where $y_\tau$ and $\hat{y}_\tau$ are the true and predicted values and $M$ is the number of test samples.
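A minimal sketch of the metric, using the standard MAPE definition, is given below. The function name and the choice to reject zero-valued targets (for which the ratio is undefined) are assumptions made for this illustration.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error: mean(|actual - predicted| / |actual|).

    Raises ValueError if any actual value is zero, since the ratio is
    undefined there; how such bins are handled in practice is left open.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)))
```

Because every error is divided by the true value, bins with very small true percentages can dominate the average.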
In this section we compare our model with two state-of-the-art baselines and perform some analysis of our results.
3.3.1 MAPE Distribution
We obtain the MAPE distributions of the 978 stocks under the three models and show them in Figure 3. Our v-state model outperforms the baselines, giving smaller MAPEs.
3.3.2 Predictions on Specific Days
To better visualize the comparison, we pick ten stocks from the dataset and show their predictions on days 150, 175, 200 and 225. Due to space limitations, we show only one stock, "AAPL", in Figure 4 and the other nine stocks in the Appendix. We see that the two-state model almost overlaps with the RM model, while our v-state model provides a smoother prediction.
3.3.3 Analysis of v-state Model
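We investigate the relationship between the absolute error and the true percentage for all stocks to further test the precision of our model. The absolute error of stock $s$ on the $\tau$-th bin is defined as:
$$e_{s,\tau} = \lvert y_{s,\tau} - \hat{y}_{s,\tau} \rvert \tag{9}$$
where $y_{s,\tau}$ is the true percentage and $\hat{y}_{s,\tau}$ is the predicted percentage.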
As illustrated in Figure 5, we plot the absolute error against the true percentage for all $978 \times 125 \times 78$ samples. We see a nearly linear relationship between the absolute error and the true percentage; as the true percentage gets larger, the slope gets closer to 1. Moreover, we observe that for the 95% of samples that fall into the corner around the origin, we have:
And for those samples outside of the corner we plug in the linear equation with slope :
From Equation 10 and Equation 11, our model provides a lower bound as well as an upper bound on the prediction precision. Because the original data are highly noisy, fully capturing the movements of trading volume remains a non-trivial task, which could be one direction for future work.
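The kind of analysis described in this subsection can be sketched as follows: split the samples by whether the true percentage lies in the corner near the origin, bound the error inside it, and fit a line to the error outside it. The function name, the `corner` threshold and the use of an ordinary least-squares line are all assumptions for illustration; the thresholds and slope used in the paper are not reproduced here.

```python
import numpy as np

def error_profile(actual, predicted, corner):
    """Summarize absolute error against the true percentage.

    corner : hypothetical threshold; samples with actual <= corner are
    treated as lying in the corner around the origin. At least one sample
    must fall inside and at least two outside.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    abs_err = np.abs(actual - predicted)
    inside = actual <= corner
    # Least-squares line for the samples outside the corner.
    slope, intercept = np.polyfit(actual[~inside], abs_err[~inside], deg=1)
    return {
        "share_inside": float(inside.mean()),
        "max_error_inside": float(abs_err[inside].max()),
        "slope_outside": float(slope),
        "intercept_outside": float(intercept),
    }
```

A slope close to 1 outside the corner would mean the prediction misses almost all of an unusually large true percentage.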
Figure 6: Correlation matrix of hidden states and eigenvalues of the transition covariance matrix for "AAPL" and "JPM".
We also show examples of the correlation matrix of the hidden states and the eigenvalues of the transition covariance matrix in Figure 6 and the appendix. They suggest that most states are not highly correlated with each other, while a few states share linear relationships. We find that half of the eigenvalues are very close to 0. Recall that the eigenvectors of a covariance matrix give the directions along which the data spread and the eigenvalues give the variance along each direction, so the eigenvectors with the largest eigenvalues carry the most information. The correlation matrix and the eigenvalues of the covariance matrix could therefore help us further reduce the number of states, since more states in the model bring more computation. We leave this as another future direction.
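A minimal sketch of this diagnostic is shown below: convert a covariance matrix to a correlation matrix, sort its eigenvalues, and count how many directions are needed to retain a given share of the total variance. The function names and the `threshold` value are hypothetical; the paper does not state a specific cut-off.

```python
import numpy as np

def covariance_to_correlation(cov):
    """Rescale a covariance matrix (positive diagonal) to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def explained_variance(cov):
    """Eigenvalues in descending order and their cumulative variance share."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # eigvalsh returns ascending
    eigvals = np.clip(eigvals, 0.0, None)          # remove tiny negative round-off
    return eigvals, np.cumsum(eigvals) / eigvals.sum()

def n_states_needed(cov, threshold=0.99):
    """Smallest number of eigen-directions whose variance share reaches threshold."""
    _, cumulative = explained_variance(cov)
    return int(np.searchsorted(cumulative, threshold) + 1)
```

If half of the eigenvalues are near zero, `n_states_needed` returns a number well below the current state dimension, which is the sense in which the state space might be simplified.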
On the basis of the Kalman Filter, we develop a new method to determine the dimension of the hidden state for each stock. Our method provides smooth predictions of intraday traded volume and shows the potential of gradient-based methods for further analysis. Through experiments we demonstrate that the v-state model achieves better prediction precision than the two-state model and the RM model. Inspired by our model analysis, in the future we will conduct further research on reducing the number of DOFs and selecting states.