Modeling the multivariate volatility of many asset returns is crucial for asset pricing, portfolio selection, and risk management. After the seminal work of Barndorff-Nielsen and Shephard (2002, 2004) and Andersen et al. (2003), the realized covariance (RCOV) matrix, estimated from the intra-day high frequency return data, has been recognized as a better estimator than the daily squared returns for daily volatility. Consequently, increasing attention has been focused on the modeling and forecasting of these RCOVs; see, e.g., McAleer and Medeiros (2008), Hansen et al. (2012), Noureldin et al. (2012), Bollerslev et al. (2016), and many others.
Existing models for the RCOV matrices can be roughly categorized into two types: transformation-based models and likelihood-based models. Models in the first category capture the dynamics of the RCOV matrices indirectly via a transformation. Bauer and Vorkink (2011) used a factor model for the vectorization of the log transformation of the RCOV matrix; Chiriac and Voev (2011) applied a vector autoregressive fractionally integrated moving average process to model the Cholesky decomposition of the RCOV matrix; Callot et al. (2017) transformed the RCOV matrix into a large vector by the $\mathrm{vech}$ operator, and then fitted this transformed vector by a vector autoregressive model. In the first two models, the dimension of the RCOV matrix has to be moderate (e.g., less than 6) for feasible manipulation. In the third model, the dimension of the RCOV matrix is allowed to be 30 in applications with the help of the LASSO method.
Models in the second category deal with RCOV matrices directly by assuming that the innovation, which drives the RCOV time series, has a specific matrix distribution that generates random positive definite matrices automatically without imposing additional constraints. This important feature results in positive definite estimated RCOV matrices. Unlike scalar or vector distributions, so far only a few matrix distributions have been found to have explicit forms. The primary choice for the innovation distribution is Wishart, leading to the Wishart autoregressive (WAR) model in Gouriéroux et al. (2009), the conditional autoregressive Wishart (CAW) model in Golosnoy et al. (2012), and the generalized CAW model in Yu et al. (2017), to name a few. The other choice for the innovation distribution is matrix-F, which was recently adopted by Opschoor et al. (2018). Generally speaking, the matrix-F distribution is the generalization of the usual F distribution, while the Wishart distribution is the generalization of the $\chi^2$ distribution (see, e.g., Konno (1991) and Opschoor et al. (2018) for more discussions). Therefore, the matrix-F distribution could be more appropriate than the Wishart distribution in capturing the heavy-tailed innovation, which is an important stylized fact in many applications (see, e.g., Bollerslev (1987), Fan et al. (2014), Zhu and Li (2015), and Oh and Patton (2017)). These likelihood-based models have at least three advantages over the transformation-based models. First, the likelihood-based models preserve the useful and important matrix structural information, which makes them more interpretable than the transformation-based models. Second, the number of estimated parameters in the transformation-based models has order $O(n^4)$, while that in the likelihood-based models has order $O(n^2)$, where $n$ is the dimension of the RCOV matrix. When $n$ is large, the likelihood-based models are therefore more convenient and computationally less daunting. Third, the likelihood-based models make use of the likelihood function of the RCOV matrices, and hence their statistical inference methods can be provided easily.
This paper contributes to the literature in three respects. First, we propose a new Conditional BEKK matrix-F (CBF) model to study the time-varying RCOV matrices. Our CBF model has matrix-F distributed innovations with two degrees of freedom parameters $\nu_1$ and $\nu_2$. When $\nu_2 \to \infty$, our CBF model reduces to the CAW model (Golosnoy et al. (2012)), which has Wishart distributed innovations. Hence, the degrees of freedom $\nu_2$ is designed to capture the heavy-tailedness of the RCOV. Since the RCOV is also well documented to exhibit long memory, we further introduce a special CBF model which has a conditional heterogeneous autoregressive (HAR) structure similar to that in Corsi (2009). This special model is coined the CBF-HAR model. Although the CBF-HAR model is not formally a long memory model, it gives rise to persistence in the RCOV time series. Two real examples demonstrate that our CBF model (especially the CBF-HAR model) can have a significantly better forecasting performance than the corresponding CAW model, and hence a simple incorporation of $\nu_2$ to capture the heavy-tailed RCOV is necessary from a practical viewpoint.
Second, we provide a systematic statistical inference procedure for the CBF model. Specifically, we explore its stationarity conditions, establish the strong consistency and asymptotic normality of its maximum likelihood estimator (MLE), and investigate some new inner-product-based tests for model diagnostic checking. Moreover, the performance of our entire methodology is assessed by simulation studies. Compared to the existing BEKK-type multivariate time series models, our proofs of the entire inference procedure are much more involved, since the CBF model is tailored for matrix time series. In particular, our inner-product-based tests seem to be the first diagnostic checking tool for matrix time series models, and the related idea can be easily extended to other models.
Third, we construct two reduced CBF models, the variance targeted CBF (VT-CBF) model and the factor CBF (F-CBF) model, to handle moderately large and high dimensional RCOV matrices, respectively. For both reduced models, the asymptotic theory of the estimated parameters is derived. The dimension of the RCOV matrix is allowed to be a moderate but fixed number in the VT-CBF model, while it is allowed to grow with the sample size and the intra-day sample size in the F-CBF model. This makes the prediction of large dimensional RCOV matrices feasible in many cases. The importance of both reduced models is illustrated by two real applications.
The remainder of the paper is organized as follows. Section 2 introduces the CBF model and studies its probabilistic properties. Section 3 investigates the asymptotics of the MLE. Section 4 presents inner-product-based tests to check the model adequacy. Two reduced CBF models and their related asymptotic theories are provided in Section 5. Some simulation studies are carried out in Section 6. Applications are given in Section 7. Section 8 concludes this paper. Proofs of all theorems are relegated to the Appendices. The remaining proofs are provided in the supplementary material.
Some notations are used throughout the paper. is the identity matrix of order , and represents the Kronecker product. For an matrix , is its trace, is its transpose, is its determinant, is its biggest eigenvalue, is its Euclidean (or Frobenius) norm, is its spectral norm, is a vector obtained by stacking all the columns of , is a vector obtained by stacking all columns of the lower triangular part of , and .
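As a small illustration of the two stacking operators and the Kronecker product described above, the following Python sketch may help. The function names `vec` and `vech` are our own labels for the column-stacking and lower-triangular column-stacking operators; the symbols used in the paper did not survive in the text above.

```python
import numpy as np

def vec(a):
    """Stack all columns of a matrix into a single vector."""
    return np.asarray(a).reshape(-1, order="F")

def vech(a):
    """Stack the columns of the lower triangular part (diagonal included)."""
    a = np.asarray(a)
    # Row i of a.T is column i of a; keeping entries with j >= i retains
    # exactly the elements on and below the diagonal, column by column.
    return a.T[np.triu_indices(a.shape[0])]

a = np.array([[1.0, 2.0], [2.0, 3.0]])
print(vec(a))                       # all four entries, column by column
print(vech(a))                      # the three distinct entries of a symmetric matrix
print(np.kron(np.eye(2), a).shape)  # Kronecker product of two 2x2 matrices is 4x4
```

For a symmetric matrix of order n, the second operator keeps n(n+1)/2 entries, which is why the transformation-based models work with long vectors.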
2 Model and Properties
2.1 Model Specification
Let be the integrated volatility matrix of asset returns at time . After the seminal work of Barndorff-Nielsen and Shephard (2002, 2004) and Andersen et al. (2003), the positive definite realized covariance (RCOV) matrix calculated from the high-frequency return data of has been widely applied to estimate in the literature; see, e.g., Barndorff-Nielsen et al. (2011), Lunde et al. (2016), Aït-Sahalia and Xiu (2017), Kim et al. (2018) and references therein. Moreover, is often viewed as a precise estimate for the conditional variances and covariances of these low-frequency asset returns , and hence how to predict by some dynamic models is important in practice. Motivated by this, a new dynamic model for is proposed in the current paper.
Let be a filtration up to time . We assume that
where is a sequence of independent and identically distributed (i.i.d.) positive definite random innovation matrices with , each follows the matrix-F distribution , and the density of is
where with degrees of freedom and , is an positive definite matrix, and
moreover, is the square root of the positive definite matrix , which has a BEKK-type dynamic structure (see Engle and Kroner, 1995):
where , , are all real matrices, the integers are known as the orders of the model, and as well as the initial states are all positive definite. Under model (2.1),
with , that is, the conditional distribution of is matrix-F with a BEKK-type mean structure. In this sense, we call model (2.1) the Conditional BEKK matrix-F (CBF) model.
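To make the two building blocks of the CBF model concrete, the sketch below shows one step of a BEKK-type recursion and the composition of an RCOV matrix from the square root of the scale matrix and an innovation. It is only an illustration under stated assumptions: the displayed equations (2.1) and (2.3) are not reproduced in the text above, so we assume a first-order recursion with a single pair of coefficient matrices, and the names `bekk_step`, `cbf_observation`, `omega`, `a`, and `b` are hypothetical.

```python
import numpy as np

def sym_sqrt(a):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def bekk_step(omega, a, b, y_prev, sigma_prev):
    """Assumed first-order BEKK update:
    Sigma_t = Omega + A Y_{t-1} A' + B Sigma_{t-1} B'."""
    return omega + a @ y_prev @ a.T + b @ sigma_prev @ b.T

def cbf_observation(sigma_t, delta_t):
    """Assumed observation equation: Y_t = Sigma_t^{1/2} Delta_t Sigma_t^{1/2}."""
    s_half = sym_sqrt(sigma_t)
    return s_half @ delta_t @ s_half

# Illustrative values only (not taken from the paper).
omega = 0.1 * np.eye(2)
a = np.array([[0.3, 0.05], [0.0, 0.25]])
b = np.array([[0.8, 0.0], [0.0, 0.7]])
sigma = bekk_step(omega, a, b, y_prev=np.eye(2), sigma_prev=np.eye(2))
print(cbf_observation(sigma, np.eye(2)))  # equals sigma when the innovation is the identity
```

Because every term in the update is positive semi-definite and the intercept is positive definite, the recursion keeps the scale matrix positive definite without extra constraints.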
The CBF model is related to the CAW model in Golosnoy et al. (2012), in which follows the Wishart distribution. To see it clearly, we follow Konno (1991) and Leung and Lo (1996) to re-write in model (2.1) as
where and are independent. As
in probability, the identity (2.5) implies that when , , which is exactly the CAW model. Therefore, compared to the CAW model, the degrees of freedom in the CBF model accommodates the heavy-tailed RCOV (see, e.g., Opschoor et al. (2018) for more discussions and examples). Clearly, the identity (2.5) also guarantees to be symmetric and positive definite, and it can be used to generate by using Wishart random variables.
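The generation of matrix-F innovations from Wishart random variables can be sketched as follows. This is a hedged illustration, not the paper's identity (2.5) verbatim, since that display is missing above. We assume the standard construction L^{1/2} R^{-1} L^{1/2} with independent L ~ Wishart(nu1, I) and R ~ Wishart(nu2, I), rescaled by (nu2 - n - 1)/nu1 so that the innovation has mean equal to the identity. Integer degrees of freedom with nu1 >= n and nu2 > n + 1 are assumed so that the Wishart draws can be built from Gaussian matrices.

```python
import numpy as np

def sym_sqrt(a):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def sample_wishart(rng, n, df):
    """Wishart(df, I_n) draw for integer df, built as Z Z' with Gaussian Z."""
    z = rng.standard_normal((n, df))
    return z @ z.T

def sample_matrix_f(rng, n, nu1, nu2):
    """Matrix-F draw with identity mean (assumed normalization, see lead-in)."""
    l_half = sym_sqrt(sample_wishart(rng, n, nu1))
    r = sample_wishart(rng, n, nu2)
    f = l_half @ np.linalg.solve(r, l_half)   # L^{1/2} R^{-1} L^{1/2}
    f = (nu2 - n - 1) / nu1 * f               # rescale so that E(f) = I_n
    return (f + f.T) / 2                      # remove floating-point asymmetry

rng = np.random.default_rng(0)
delta = sample_matrix_f(rng, n=3, nu1=10, nu2=20)
print(np.linalg.eigvalsh(delta))  # all eigenvalues are positive
```

As nu2 grows, R/nu2 concentrates around the identity and the draw behaves like a scaled Wishart matrix, which mirrors the reduction to the CAW model described above.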
Besides the heavy-tailedness, long memory is another well documented feature of the RCOV, and it has been taken into account by many RCOV models, including the heterogeneous autoregressive (HAR) model in Corsi (2009) as a benchmark. Although the HAR model does not formally belong to the class of long memory models, it is able to reproduce the persistence of RCOV observed in many empirical data sets. Inspired by the HAR model, we consider a special CBF model, which has the following specification for :
where , , and are the daily, weekly, and monthly averages of RCOV matrices, respectively. In this case, we label model (2.1) as the CBF-HAR model, since we put “HAR dynamics” on . Clearly, the CBF-HAR model is simply a constrained CBF model with , and . Figure 1 plots the sample autocorrelation functions (ACFs) up to lag 200 of one simulated data set from the CBF-HAR model with and
From this figure, we can see that all entries of exhibit long memory, as expected.
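The daily, weekly, and monthly averages that enter the CBF-HAR specification can be computed as in the sketch below. The window lengths of 5 and 22 days follow the usual HAR convention and are an assumption here, because the defining display is missing from the text above. The function name `har_averages` is hypothetical.

```python
import numpy as np

def har_averages(y, t, week=5, month=22):
    """Averages of the RCOV matrices over the last 1, `week`, and `month` days
    before day t, where y has shape (T, n, n) and t is a zero-based index."""
    y = np.asarray(y, dtype=float)
    if t < month or t > y.shape[0]:
        raise ValueError("need at least `month` past observations before day t")
    daily = y[t - 1]
    weekly = y[t - week:t].mean(axis=0)
    monthly = y[t - month:t].mean(axis=0)
    return daily, weekly, monthly

# Toy series: day k (k = 1, ..., 30) has RCOV matrix k * I_2.
y = np.array([k * np.eye(2) for k in range(1, 31)])
d, w, m = har_averages(y, t=30)
print(d[0, 0], w[0, 0], m[0, 0])
```

Averaging whole matrices keeps each regressor symmetric and positive definite, so the HAR structure can be placed inside a BEKK-type recursion without breaking positive definiteness.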
Note that when , sufficient identifiability conditions of model (2.3) are that the main diagonal elements of and the first diagonal element of each , are positive; when , some sufficient identifiability conditions of model (2.3) can be found in Engle and Kroner (1995). For simplicity, we assume subsequently that model (2.3) is identifiable.
Of course, the BEKK specification in model (2.3) is not the only way to describe the dynamics of . The multivariate ARCH-type models such as the VEC model in Bollerslev et al. (1988), the component model in Engle and Lee (1999), the dynamic conditional correlation model in Engle (2002) and many others can also be adopted to model . Using these models together with the matrix-F distribution to fit and predict the RCOV matrices could be a promising direction for future study.
Stationarity is an important issue for most RCOV models, but so far it has been rarely studied. Denote . For , let
where for and for . A sufficient condition for the stationarity of the CBF model is given below, and it works for other general distributions of .
Theorem 2.1. Suppose that in model (2.1) is a sequence of i.i.d. positive definite random matrices with , and
(H1) the distribution of , denoted by , is absolutely continuous with respect to the Lebesgue measure;
(H2) the point is in the interior of the support of ;
Then, in model (2.1) is strictly stationary with . Moreover, is positive Harris recurrent and geometrically ergodic.
The results of Theorem 2.1 are similar to those in Boussama et al. (2011), where the stationarity of the BEKK model is studied. As in Boussama et al. (2011), the proof of Theorem 2.1 is based on the semi-polynomial Markov chains technique; however, it is much more involved due to the matrix nature of model (2.1).
3 Maximum Likelihood Estimation
Let be the unknown parameter of model (2.1) with the true value , where is the parametric space with and , , , , and . Below, we assume that and are compact and is an interior point of .
Given the observations and the initial values , the negative log-likelihood function based on (2.4) is
with and calculated recursively by
As the initial values are not observable, we shall modify as
where is defined in the same way as with being replaced by , and is calculated in the same way as based on a sequence of given constant matrices . The minimizer, , of on is called the maximum likelihood estimator (MLE) of . That is,
To study the asymptotic properties of , we need two assumptions below.
is strictly stationary and ergodic.
For , if , almost surely (a.s.) for all .
Assumption 3.1 is standard, and Assumption 3.2, which is in line with Comte and Lieberman (2003) and Hafner and Preminger (2009), is the identification condition. The following two theorems give the consistency and asymptotic normality of , respectively.
Based on the observations and a sequence of given constant matrices , we can use the analytic expression of (see Appendix D in the supplementary material) to estimate by its sample counterpart. As for the univariate ARCH-type models, the coefficients on the main diagonal of are positive to ensure the positive definiteness of . Hence, the classical $t$ or Wald test, which is constructed from the estimate of , cannot be used to detect whether their values are zero; see Li et al. (2018) for more discussions in this context.
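The negative log-likelihood minimized in this section can be sketched numerically as follows. The sketch rests on assumptions that should be checked against the paper's displays (2.2) and (2.4), which are missing above. We use the standard matrix-F density with scale matrix S, namely the ratio of multivariate gamma functions times |S|^{-nu1/2} |X|^{(nu1-n-1)/2} |I + S^{-1} X|^{-(nu1+nu2)/2}. We also take the conditional scale to be (nu2 - n - 1)/nu1 times the BEKK matrix, so that the conditional mean equals that matrix. All function names are hypothetical, and any averaging over the sample size is omitted.

```python
import math
import numpy as np

def log_multigamma(a, n):
    """Log of the n-variate gamma function at a."""
    return n * (n - 1) / 4 * math.log(math.pi) + sum(
        math.lgamma(a - j / 2) for j in range(n))

def matrix_f_logpdf(x, scale, nu1, nu2):
    """Log density of the matrix-F distribution (assumed standard form)."""
    n = x.shape[0]
    log_const = (log_multigamma((nu1 + nu2) / 2, n)
                 - log_multigamma(nu1 / 2, n) - log_multigamma(nu2 / 2, n))
    logdet_scale = np.linalg.slogdet(scale)[1]
    logdet_x = np.linalg.slogdet(x)[1]
    logdet_mix = np.linalg.slogdet(np.eye(n) + np.linalg.solve(scale, x))[1]
    return (log_const - nu1 / 2 * logdet_scale
            + (nu1 - n - 1) / 2 * logdet_x - (nu1 + nu2) / 2 * logdet_mix)

def cbf_neg_loglik(ys, sigmas, nu1, nu2):
    """Negative log-likelihood given RCOV matrices and fitted BEKK matrices."""
    n = ys[0].shape[0]
    c = (nu2 - n - 1) / nu1   # assumed rescaling so that E(Y_t | past) = Sigma_t
    return -sum(matrix_f_logpdf(y, c * s, nu1, nu2) for y, s in zip(ys, sigmas))

ys = [np.array([[1.0, 0.2], [0.2, 0.8]]), np.array([[1.5, 0.1], [0.1, 1.1]])]
sigmas = [np.eye(2), np.array([[1.2, 0.1], [0.1, 0.9]])]
print(cbf_neg_loglik(ys, sigmas, nu1=8.0, nu2=12.0))
```

In practice the BEKK matrices would be produced recursively from candidate parameter values, and the function would be passed to a numerical optimizer.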
4 Model Diagnostic Checking
Diagnostic tests are crucial for model checking in multivariate time series analysis; see, e.g., Li and McLeod (1981), Ling and Li (1997), Tse (2002) and many others. However, no such attempt has been made for stationary matrix time series. In this section, we propose some new inner-product-based tests to check the adequacy of model (2.1).
Let be the vectorized residual for a given , and be the inner product of two vectorized residuals at lag . Then, we stack up to lag to construct , where
and is a given integer. Our testing idea is motivated by the fact that if model (2.1) is adequate, is a sequence of i.i.d. random vectors with mean zero, and hence the value of is expected to be close to zero. To implement our test, we need to study the asymptotic property of in the following theorem.
Based on Theorem 4.1, we construct the inner-product-based test statistic
to check the adequacy of model (2.1), where is the sample counterpart of . If is larger than the upper-tailed critical value of , the fitted model (2.1) is not adequate at a given significance level. Otherwise, it could be deemed adequate.
Note that if we consider a test based on directly, the resulting limiting distribution is still chi-squared, but its degrees of freedom increase quickly with the dimension . To avoid this dilemma, we use the inner product of the residuals to construct our test . This new idea is different from the portmanteau test in Ling and Li (1997), in which the test statistic is constructed from the auto-correlations of the transformed scale residuals, while our test is based on the auto-covariances of the original vectorized residuals. Clearly, our idea can be easily extended to the framework in Ling and Li (1997). Meanwhile, our inner-product-based test takes the auto-covariances of all entries of into account, while the regression-based test in Tse (2002) only considers one entry of at a time. In view of this, we prefer to use the proposed inner-product idea for testing purposes.
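The inner-product idea can be illustrated with the sketch below, which collapses the lagged cross-products of the vectorized residuals into one scalar per lag. Two assumptions are made because the defining displays are missing above. The residual at time t is taken to be the column-stacked standardized RCOV matrix minus the identity, and the lag-l quantity is the sum of inner products divided by the sample size. The asymptotic covariance needed to form the test statistic itself is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def sym_inv_sqrt(a):
    """Inverse symmetric square root of a positive definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v / np.sqrt(w)) @ v.T

def vectorized_residuals(ys, sigmas):
    """Assumed residual: vec(Sigma_t^{-1/2} Y_t Sigma_t^{-1/2} - I)."""
    n = ys[0].shape[0]
    rows = []
    for y, s in zip(ys, sigmas):
        r = sym_inv_sqrt(s)
        rows.append((r @ y @ r - np.eye(n)).reshape(-1, order="F"))
    return np.array(rows)

def inner_product_autocov(u, max_lag):
    """One scalar per lag: (1/T) * sum_t <u_t, u_{t-l}> for l = 1..max_lag."""
    t = u.shape[0]
    return np.array([np.sum(u[l:] * u[:-l]) / t for l in range(1, max_lag + 1)])

u = np.array([[1.0], [2.0], [3.0]])
print(inner_product_autocov(u, max_lag=2))  # lag 1: (2*1 + 3*2)/3, lag 2: (3*1)/3
```

Whatever the matrix dimension, the stacked vector has only as many entries as the number of lags, which is the reason the degrees of freedom of the resulting test do not grow with the dimension.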
5 The Reduced CBF Models
As the number of parameters in the CBF model is $O(n^2)$, the estimation of the CBF model could be very computationally demanding when the dimension $n$ is large. This section introduces two reduced CBF models, which are feasible for fitting RCOV matrices with a large $n$.
5.1 The VT-CBF model
This subsection proposes a reduced CBF model by using the variance targeting (VT) technique in Engle and Mezrich (1996). The idea of VT is to re-parameterize the drift matrix by using the theoretical mean of , so that the estimation of is excluded from the implementation of the maximum likelihood estimation. Other related studies on VT time series models can be found in Francq et al. (2011) and Pedersen and Rahbek (2014).
To define our reduced model, we assume that is strictly stationary with a finite mean . By taking expectation on both sides of (2.3), we have
where all notations are inherited from model (2.1), except that
We call model (5.2) the VT-CBF model. Clearly, this reduced model shares the same probabilistic properties as the full CBF model. Although the VT-CBF model has the same number of parameters as the full CBF model, its two-step estimator given below is computationally easier than the MLE for the full CBF model.
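The variance targeting step can be illustrated as follows. Taking expectations in a BEKK-type recursion, with the conditional mean of the RCOV matrix equal to the scale matrix, gives a linear equation that expresses the drift matrix through the unconditional mean. The sketch assumes a BEKK recursion with one coefficient matrix per lag, since display (2.3) is missing above, and the names `sample_mean_rcov` and `vt_intercept` are hypothetical.

```python
import numpy as np

def sample_mean_rcov(ys):
    """First step: estimate the unconditional mean by the sample average."""
    return np.mean(np.asarray(ys, dtype=float), axis=0)

def vt_intercept(s, a_list, b_list):
    """Drift implied by the mean s: Omega = S - sum_i A_i S A_i' - sum_j B_j S B_j'."""
    omega = np.array(s, dtype=float)
    for a in a_list:
        omega = omega - a @ s @ a.T
    for b in b_list:
        omega = omega - b @ s @ b.T
    return omega

# Illustrative values only (not taken from the paper).
s = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([[0.3, 0.0], [0.05, 0.25]])
b = np.diag([0.8, 0.7])
omega = vt_intercept(s, [a], [b])
print(omega)
```

The implied drift is not automatically positive definite, so in a constrained optimization the candidate coefficient matrices would have to be restricted accordingly.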
To present this two-step estimator, we let be the unknown parameters of model (5.2) and its true value be , where is the parametric space with , and . Let with , and . As before, we assume that and are compact and is an interior point of .
In the first step, we estimate by , where . In the second step, we estimate the remaining parameters by the constrained MLE based on the following modified log-likelihood function:
and is calculated recursively by
based on a sequence of given constant matrices . Clearly, is analogous to in (3.3), and it is the modification of the following log-likelihood function:
where is defined in the same way as with being replaced by , and is calculated recursively by
based on the observations and the initial values . The minimizer, , of on is the constrained MLE of . That is,
Now, we call the two-step estimator of in model (5.2). Let and . The following two theorems give the consistency and asymptotic normality of , respectively.
As before, we can use the sample counterpart of the analytic expressions of and to estimate . Although the VT-CBF model can be estimated by the aforementioned two-step estimation procedure, it still has to handle a large number of estimated parameters, of order $O(n^2)$, caused by the parameter matrices and . To obtain a more parsimonious VT-CBF model, we can further impose some restrictions on and . McCurdy and Stengos (1992) and Engle and Kroner (1995) have suggested using diagonal volatility models, which not only avoid over-parameterization, but also reflect the fact that the variances and the covariances rely more on their own past than on the history of other variances or covariances. Motivated by this, we can assume that all and have a diagonal structure, leading to a diagonal VT-CBF model. Clearly, the number of estimated parameters in the diagonal VT-CBF model has order $O(n)$, which is manageable for a moderately large but fixed dimension $n$.
Next, similar to in (4.1), we can construct the inner-product-based test statistics to check the adequacy of model (2.1) based on the two-step estimator . Let , , be the residual vector for a given , be the inner product of the residuals at lag , and
The asymptotic property of is given in the following theorem.
5.2 The Factor CBF Model
In modern data analysis, the dimension could be growing with the sample size in many cases, and this makes the CBF (or VT-CBF) models computationally infeasible. Also, the dimension may be proportional to (the average intra-day sample size across all assets and all days), and then the methods to calculate used for the fixed deliver an inconsistent estimator of ; see, e.g., Wang and Zou (2010) and Tao et al. (2011) for surveys. To overcome this difficulty, we use the thresholding average realized volatility matrix estimator (TARVM) in Tao et al. (2011) to calculate , and this estimator is consistent for very large , which is allowed to grow with and . For more recent works in this direction, we refer to Aït-Sahalia and Xiu (2017), Kim et al. (2018), and the references therein.
Since the dimension of could be very large, it seems hard to study the dynamics of without imposing some specific structure. Here, we adopt the factor model proposed by Tao et al. (2011) by assuming that
where is an positive definite factor covariance matrix with being a fixed integer (much smaller than ), is an positive definite constant matrix, and is an factor loading matrix normalized by the constraint . In model (5.11), the dynamic structure of is driven by that of a lower-dimensional latent process , while represents the static part of .
Then, we estimate , and by
are the eigenvectors of corresponding to its largest eigenvalues. As suggested by Lam and Yao (2012) and Ahn and Horenstein (2013), we may select such that the largest ratios of adjacent eigenvalues are significantly larger.
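The eigenvector-based loading estimate and the eigenvalue-ratio choice of the number of factors can be sketched as follows. The symmetric matrix `m` stands in for the matrix whose eigenvectors are used, which is not recoverable from the text above. The rule implemented is the simple maximizer of adjacent eigenvalue ratios, which is one common reading of the suggestion in Lam and Yao (2012) and Ahn and Horenstein (2013) and not necessarily the exact rule used in the paper. All names are hypothetical.

```python
import numpy as np

def top_eigenvectors(m, r):
    """Eigenvalues and orthonormal eigenvectors of the r largest eigenvalues."""
    w, v = np.linalg.eigh(m)
    order = np.argsort(w)[::-1]
    return w[order[:r]], v[:, order[:r]]

def eigenvalue_ratio_rank(m, r_max):
    """Number of factors maximizing lambda_k / lambda_{k+1} over k = 1..r_max."""
    w = np.sort(np.linalg.eigvalsh(m))[::-1]
    ratios = w[:r_max] / w[1:r_max + 1]
    return int(np.argmax(ratios)) + 1

# Toy matrix with two dominant eigenvalues.
m = np.diag([10.0, 9.0, 0.1, 0.09])
r_hat = eigenvalue_ratio_rank(m, r_max=3)
values, loadings = top_eigenvectors(m, r_hat)
print(r_hat, values)
```

The columns returned by `top_eigenvectors` are orthonormal, which matches a normalization of the loading matrix by an identity constraint.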
In order to study the asymptotics of the proposed estimators, we introduce the following technical assumptions.
All row vectors of and satisfy the sparsity condition below. For an -dimensional vector , we say it is sparse if it satisfies
where , is a positive constant, and is a deterministic function of that grows slowly in with typical examples or .
The factor model (5.11) has fixed factors, and matrices and