1 Introduction
Spot covariance has important applications in studying the intraday patterns of the covariance process, in cojump tests (Bibinger and Winkelmann (2015)), and in estimating parametric multivariate stochastic volatility models (Kanaya and Kristensen (2016)). Moreover, understanding covariance dynamics is crucial for effective portfolio choice, derivative pricing, and risk management. The availability of high-frequency intraday data on asset returns has given rise to several approaches for estimating integrated (co)variances and spot variances. While the literature proposes a few measures of integrated covariance, see e.g. Barndorff-Nielsen and Shephard (2004a), Hayashi and Yoshida (2011), the literature on empirical approaches and statistical theory for estimating spot covariances with high-frequency data is sparse.
In this paper, we consider the nonparametric filtering of spot covariance with high-frequency financial data. Our study is at the intersection of two strands of literature. The first strand concerns the estimation of integrated covariance matrices over a fixed period. This topic has been studied extensively in high-frequency econometrics. For example, the celebrated paper by Barndorff-Nielsen and Shephard (2004a) makes important contributions to the use of realized covariance to estimate the integrated covariance matrix in a setup without market microstructure noise. The quasi-maximum likelihood estimator of Aït-Sahalia et al. (2010), the multivariate pre-averaging estimator of Christensen et al. (2013), and the two-scale estimator of Zhang (2011) are robust to microstructure noise. However, none of the above-mentioned realized covariance estimators accounts for jumps in the underlying price process.
The second strand focuses on spot volatility estimation, for which several approaches have been proposed. Foster et al. (1996) were the first to introduce spot volatility estimators, in the form of rolling and sampling filters. Later, kernel-type estimators were introduced in Fan and Wang (2008) and Kristensen (2010). These estimators of spot variance neglect microstructure noise and jumps. Examples of spot variance estimators that account for microstructure noise include Zu and Boswijk (2014), Bos et al. (2012), and Mykland and Zhang (2008). Yu et al. (2014) extend the kernel spot volatility estimator of Kristensen (2010) to the case where the underlying price process has jumps.
The estimation of the spot covariance matrix is, however, the least studied area. For a multidimensional continuous semimartingale log asset price process, Bibinger et al. (2017) propose an estimator for spot covariance which is constructed from a local average of blockwise parametric spectral covariance estimates. Aiming to fill this gap in the literature, we study spot covariance estimation for both continuous and discontinuous semimartingales.
Our contributions are as follows. First, for a setup without jumps, we study the asymptotic properties of the kernel covariance estimator, which was mentioned in Kristensen (2010) as an extension to the multivariate case and was left for future research. Second, we propose the threshold kernel covariance estimator for the case where the underlying price process is a discontinuous semimartingale with finite activity jumps. We derive the asymptotic distribution of this estimator for a fixed bandwidth. The estimator is an extension to the multivariate case of the threshold kernel volatility estimator proposed by Yu et al. (2014). Third, we conduct numerical studies to examine the finite sample properties of both estimators. Finally, we study an application of the kernel estimator in the context of covariance forecasting.
In a setup without jumps, the estimator is a kernel-weighted version of the standard integrated covariance estimator, which depends on a kernel function and the choice of bandwidth. It can be regarded as a kernel regression in the time domain. The bandwidth choice allows us to focus on the covariance behavior at specific points in time and to give different weights to the covariance matrix over the window used. As the bandwidth shrinks to zero, the spot covariance can be extracted. We establish asymptotic normality of the estimator for both fixed and shrinking bandwidth. The proofs are componentwise and build on the techniques of Barndorff-Nielsen and Shephard (2004a) and Kristensen (2010). We first derive the mean and covariance of the estimator. We then derive the asymptotic distribution by employing a central limit theorem for triangular arrays and the Cramér-Wold device. We also prove asymptotic normality for the threshold kernel estimator with fixed bandwidth. In the proof of this theorem we combine our results from the first theorem with techniques from Yu et al. (2014) and again employ the Cramér-Wold device. In the simulation study we examine the finite sample properties of both estimators using the integrated mean squared error and the integrated bias as performance measures.
The rate of convergence of both estimators is
. The local method of moments estimator of the spot covariances of Bibinger et al. (2017) attains a slower optimal rate of convergence (). However, this is because Bibinger et al. (2017) consider a setting with market microstructure noise, whereas we target the complementary case with jumps. The kernel and threshold kernel covariance estimators are fairly easy to implement.
In terms of applications of the kernel covariance estimator, considerable effort has been put into covariance forecasting, see e.g. Alexander (2008), Andersen et al. (2013). Multivariate GARCH models are a standard tool for modelling and forecasting covariances. However, more recent studies propose models based on high-frequency data and option-implied data. In a comprehensive empirical study by Symitsi et al. (2018), several approaches to covariance forecasting are compared based on statistical and economic criteria. The authors conclude that models based on high-frequency data offer a clear advantage in terms of statistical accuracy. In particular, a Vector Heterogeneous Autoregressive (VHAR) model achieves the best performance amongst the competing models. The VHAR model is a linear combination of past daily, weekly and monthly realized covariance estimators of Barndorff-Nielsen and Shephard (2004a).
Motivated by this, we use the VHAR model to forecast covariance, but replace the realized covariance estimator with the newly proposed kernel covariance estimator. We further show that, within the VHAR model, the kernel covariance estimator outperforms the benchmark realized covariance estimator under all three measures of accuracy: the Euclidean loss function, the Frobenius distance and the multivariate quasi-likelihood loss function.
The paper is structured as follows. In Section 2.1 we review the theoretical setup of the problem and the kernel covariance estimator, which was proposed in Kristensen (2010) and left for future research. In Section 2.2 we study the asymptotic properties of the estimator for a fixed bandwidth and for a bandwidth tending to zero. In Section 3 we introduce the setup with jumps, propose the estimator for the jump case and derive its asymptotic distribution. In Section 4 we conduct Monte Carlo simulations and investigate the finite sample properties of both estimators. In Section 5 we present an application of the estimator in the context of covariance forecasting. Finally, in Section 6 we summarise our findings.
2 Kernel Covariance Estimation
2.1 Theoretical Setup and the Kernel Covariance Estimator
In this section we start by considering a multidimensional continuous semimartingale, describe the theoretical setup and review the kernel covariance estimator in Kristensen (2010). Our aim is to accurately estimate the spot covariance matrix of a fixed dimensional log-price process . We assume that follows a continuous semimartingale
(1) 
defined on a filtered probability space , with an initial condition , the drift vector , the dimensional standard Brownian motion and the instantaneous volatility matrix which has elements that are all càdlàg. The latter yields the dimensional spot covariance matrix , which is our object of interest. We also denote the integrated covariance matrix by . We consider the finite and fixed time horizon with high-frequency discrete observations of the realization of the th asset, with . For an arbitrary partition of the interval we require that approaches zero under the asymptotic limit. For simplicity, we consider the case of equally spaced and synchronous observation times. We denote , so that for .
A kernel is a nonnegative integrable function satisfying the following condition: . The kernel weighted measure of the integrated covariance, which is an extension of the measure of the integrated variance introduced in Kristensen (2010), is of the following form
(2) 
where the function is given by , satisfies , and is the fixed bandwidth. delivers a kernel weighted average of the quadratic covariation.
An estimator of the integrated covariance in equation (2) is the kernel smoothed sample average of the increments, which was mentioned in Kristensen (2010) as an extension of the univariate case and was left for future research:
(3) 
where is the dimensional vector ( is fixed) of the increments of the process over time interval . As demonstrated above, for a fixed , gives a weighted measure of the integrated covariance. However, as , the instantaneous covariance can be recovered at any point of continuity of :
(4) 
To emphasize that we are working with an estimator of the instantaneous covariance at time , we shall denote:
(5) 
Note that can be regarded as a Nadaraya-Watson estimator. An overview of this type of kernel estimator can be found in Silverman (1986). In the univariate case, i.e. when , we recover the spot variance estimator from Kristensen (2010).
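The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimator is taken to be the kernel-weighted sum of outer products of the increments, with weights evaluated at the left endpoint of each sampling interval, and a Gaussian kernel is used. The names `gaussian_kernel` and `kernel_spot_cov` are hypothetical and do not come from the paper.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian density, a nonnegative kernel that integrates to one."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_spot_cov(log_prices, times, tau, h, kernel=gaussian_kernel):
    """Kernel-weighted sum of outer products of increments, centred at time tau.

    log_prices : (n + 1, d) array of synchronous log-price observations
    times      : (n + 1,) array of observation times
    tau        : time at which the spot covariance is evaluated
    h          : bandwidth
    """
    increments = np.diff(log_prices, axis=0)        # (n, d) increments
    # Assumed weighting: K_h(t_{i-1} - tau) with K_h(x) = K(x / h) / h.
    weights = kernel((times[:-1] - tau) / h) / h
    # Sum over i of weight_i * dY_i dY_i^T, a symmetric (d, d) matrix.
    return np.einsum("i,ij,ik->jk", weights, increments, increments)
```

With d = 1 the function returns a 1 x 1 matrix containing a kernel-weighted sum of squared increments, which mirrors the reduction to the univariate spot variance case noted above.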
2.2 Asymptotic Properties of the Kernel Covariance Estimator
In this section we state the necessary assumptions and present two of the three main results of the paper. Theorem 1 derives the asymptotic distribution of the kernel covariance estimator for a fixed bandwidth. Theorem 2 proves asymptotic normality of the kernel covariance estimator for a bandwidth tending to zero. Throughout our work we shall consider the following set of assumptions:
Assumption 1.
The processes and are jointly independent of .
This assumption holds for widely used stochastic volatility models, such as those of Heston (1993) and Hull and White (1987). Assumption 1 greatly facilitates the proof by allowing us to make all arguments conditional on and . Under Assumption 1, since the volatility process is independent of , the model falls into the case without leverage effects. However, this assumption does not appear to be strictly necessary, as demonstrated in Kanaya and Kristensen (2016).
Assumption 2.
For any sequences , with and every , as
(6) 
where
Assumption 2 imposes a restriction on the local behavior of the mean and covariance processes. It allows for deterministic patterns, jumps, and nonstationarity, and is automatically satisfied when the mean and volatility processes have continuous trajectories. In particular, standard diffusion models such as Heston (1993) and Hull and White (1987) satisfy this assumption.
Assumption 3.
For every and the quantities
(7) 
are bounded away from 0 and infinity uniformly in .
Equation (7) in Assumption 3 essentially means that, on any bounded interval, itself is bounded away from infinity. This is the case, for example, for the Cox-Ingersoll-Ross (CIR) and Ornstein-Uhlenbeck (OU) processes of Cox et al. (1985) and Uhlenbeck and Ornstein (1930), respectively. The above-mentioned assumptions are sufficient to derive the asymptotic distribution of . However, in order to obtain the asymptotics of when , a general smoothness condition needs to be imposed on the covariance process.
Assumption 4.
The space for some and consists of functions that are times differentiable with the th derivative , satisfying
(8) 
where is Lipschitz coefficient, a slowly varying function at zero and is continuous. The mapping for lies in for some and .
As stated in Yu et al. (2014) this condition is satisfied by commonly used diffusion processes. When Assumption 5 holds with and the model is driven by a Brownian motion (see e.g. Revuz and Yor (1998, ch.5)).
We also impose requirements on the kernel function:
Assumption 5.
The kernel


satisfies and continuously differentiable, i.e. , such that

satisfies the condition that there exist some constants and such that , and for some , for , .

satisfies , and , for some .
The assumptions above are satisfied by most standard kernels for . When , is called a higher-order kernel. If as well, higher-order kernels can be used to reduce the bias in the estimation of functions that are more than twice differentiable. As mentioned in Kristensen (2010), although is the usual case, Cline and Hart (1991) demonstrated that higher-order kernels can potentially reduce bias even when the object of interest is nonsmooth and has jumps.
Now we are ready to derive the asymptotics of the kernel covariance estimator for a fixed bandwidth.
Theorem 1.
Proof.
We give the proof in several steps. First we derive the means, variances and covariances of the variates
with . Second, Theorem 1 is proved for the case where the mean processes are identically , by employing the Cramér-Wold device. Finally, the latter restriction is lifted, and the negligibility of the nonzero drift term is shown using Lemma 5 in Appendix D. The proof is componentwise and based on the results and techniques employed by Barndorff-Nielsen and Shephard (2004a) and Kristensen (2010). See Appendix A for the details of the proof. ∎
This theorem is an intermediate step in the derivation of the asymptotic distribution of the estimator for a shrinking bandwidth. Theorem 1 is needed for the proof of the asymptotic normality of the spot kernel covariance estimator in (5).
Theorem 2.
Proof.
See Appendix B. ∎
Bibinger et al. (2017) propose a spot covariance estimator which is constructed from local averages of blockwise parametric spectral covariance estimates. This is an extension of the local method of moments (LMM) in Bibinger and Reiss (2014). Since Bibinger et al. (2017) consider a setting with market microstructure noise, their estimator attains the optimal rate of convergence (), which is slower than the convergence rate of the kernel covariance estimator (). The kernel estimator in equation (5) is fairly easy to implement.

It is helpful to focus on the bivariate case in order to gain further understanding. We will look at the results for the assets and , whose log-prices will be written as and respectively. Then the high-frequency returns at time are
In order to avoid the symmetric replication in the covariation matrix we employ a half-vectorization, or alternatively, a vech transformation. The half-vectorization of a symmetric matrix is obtained by vectorizing only the lower triangular part of the matrix (see Kollo and Rosen (2005), Lütkepohl (1996)). In this case Theorem 1 tells us that the joint asymptotic distribution for the identifying elements of the realized covariation of the two assets and becomes
3 Jump Case: Threshold Kernel Covariance Estimation
In this section we assume that the price process is governed by a discontinuous semimartingale with finite activity jumps. We propose a threshold kernel spot covariance estimator, which is an extension of the threshold kernel spot volatility estimator in Yu et al. (2014) to the multivariate case. Theorem 3 derives the asymptotic distribution of the threshold kernel covariance estimator for a fixed bandwidth.
Consider a filtered probability space . Let the d-dimensional (with fixed ) log-price be defined on this space and satisfy the following stochastic differential equation:
(13) 
where is the drift vector, is the instantaneous volatility matrix, is the dimensional standard Brownian motion and is a compound Poisson process with finite activity of jumps, which can be written as . Here is a homogeneous Poisson process with constant intensity and is a sequence of i.i.d. random variables with values in , which denotes the jump size at the jump location . We assume for are i.i.d. and independent of . Denote the dimensional spot covariance matrix by .
Suppose that on a finite and fixed time horizon , we have high-frequency discrete observations of the realization of the th asset, with . Here, is an arbitrary partition of the interval . Although the observations are not necessarily equidistant, we require that approaches zero under the asymptotic limit. We consider the case of equally spaced and synchronous observation times, though this assumption can easily be lifted. Denote , so that for .
The quantity of interest is the spot covariance matrix . The threshold kernel covariance estimator, denoted by , is defined as
(14) 
where is the indicator function and is the dimensional vector of increments of the process over time interval . The function is given by , where is the bandwidth and the kernel function satisfies . The threshold function is a deterministic function of the step length . As the bandwidth we recover the spot covariance. The threshold function has to vanish more slowly than the modulus of continuity of Brownian motion in order to obtain convergence in probability. Thus we have the following additional assumption.
Assumption 6.
is a deterministic function of the step length such that and .
We can now derive the asymptotics of the threshold kernel covariance estimator.
Theorem 3.
Proof.
See Appendix C. ∎
The threshold kernel covariance estimator in equation (14) is an extension of the threshold kernel estimator of the time-dependent spot volatility in Yu et al. (2014) to the multivariate case. In Theorem 3 we derive the asymptotic distribution of the estimator for a fixed bandwidth of the kernel. A similar result was obtained for the univariate case in Yu et al. (2014).
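The role of the indicator function and the threshold can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the paper: an increment is discarded when its squared Euclidean norm exceeds the threshold (a componentwise rule would be an equally plausible reading), the kernel is Gaussian, and the threshold is passed in as a number rather than derived from the step length. The function name is hypothetical.

```python
import numpy as np


def threshold_kernel_spot_cov(log_prices, times, tau, h, threshold):
    """Kernel-weighted covariance of increments that pass a jump threshold.

    log_prices : (n + 1, d) array of synchronous log-price observations
    times      : (n + 1,) array of observation times
    tau        : time at which the spot covariance is evaluated
    h          : bandwidth
    threshold  : value of the threshold function at the current step length
    """
    increments = np.diff(log_prices, axis=0)
    # Assumed truncation rule: keep an increment only if its squared
    # Euclidean norm does not exceed the threshold.
    keep = np.sum(increments**2, axis=1) <= threshold
    u = (times[:-1] - tau) / h
    weights = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))  # Gaussian K_h
    weights = weights * keep                                    # zero weight on suspected jumps
    return np.einsum("i,ij,ik->jk", weights, increments, increments)
```

Setting `threshold=np.inf` switches the truncation off, so the same function can be used to see how much a single large increment near `tau` distorts the untruncated estimate.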
4 Simulation Study
In this section we examine the performance of the kernel and threshold kernel covariance estimators. In particular, we investigate the finite-sample performance of the estimators relative to the time distance between observations. Throughout we work with a bivariate stochastic volatility model. First, we examine the kernel covariance estimator in a setup without jumps and assume that the asset prices, , follow the Heston model:
(17) 
where
(18) 
with the covariance , the drift vector and a standard two-dimensional Brownian motion such that . The variance processes, for , follow the CIR model of Cox et al. (1985):
(19) 
We set the correlation between each asset and its volatility process to zero in order for Assumption 1 to hold. The remaining data generating parameters are chosen to match the estimated parameter values in Barndorff-Nielsen and Shephard (2002). In our simulation we set (48 hours). We consider frequencies corresponding to sampling every 5 seconds, 20 seconds, 1 minute, 5 minutes and 10 minutes. To simulate the data using model (4) we employ the Euler discretization scheme from Kloeden and Platen (1999, ch.14). We simulate one trajectory of each for and keep them fixed. Then we run 500 Monte Carlo repetitions for the prices of the two assets . In each repetition we compute for based on sampling frequencies.
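A minimal Python sketch of an Euler scheme for a bivariate model of this type is given below. The model form is an assumption: each variance follows a CIR equation driven by its own Brownian motion, independent of the price shocks (so there is no leverage), and the two price shocks have constant correlation `rho`. Negative variances produced by the discretization are handled by full truncation, which is a swapped-in device and not something the text specifies. All parameter names are hypothetical, and unlike the design described above the sketch draws prices and variances jointly instead of fixing one variance path across repetitions.

```python
import numpy as np


def simulate_bivariate_heston(n, T, mu, kappa, theta, xi, rho, v0, seed=0):
    """Euler discretization of two log-prices with CIR variances (no leverage).

    mu, kappa, theta, xi, v0 : length-2 arrays (drift, mean reversion speed,
                               long-run variance, vol of variance, initial variance)
    rho                      : correlation between the two price shocks
    Returns (log_prices, variances), both of shape (n + 1, 2).
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    chol_corr = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    v = np.empty((n + 1, 2))
    y = np.zeros((n + 1, 2))
    v[0] = v0
    for i in range(n):
        v_pos = np.maximum(v[i], 0.0)                    # full truncation
        dB = rng.normal(scale=np.sqrt(dt), size=2)       # variance shocks
        dW = chol_corr @ rng.normal(scale=np.sqrt(dt), size=2)  # correlated price shocks
        v[i + 1] = v[i] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos) * dB
        y[i + 1] = y[i] + mu * dt + np.sqrt(v_pos) * dW
    return y, v
```

Under these assumptions the true spot covariance at step `i` is `rho * sqrt(v[i, 0] * v[i, 1])`, which gives the benchmark path against which an estimate can be compared.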
Three different estimators of the instantaneous covariance are implemented: the Gaussian kernel estimator, the one-sided kernel estimator and the beta kernel estimator. For all three estimators, cross-validation was used to select the bandwidth (see Kristensen (2010)). We used the following integrated squared error (ISE) as the goodness-of-fit criterion:
(20) 
where and for are the true and the estimated spot covariances. Two performance measures are used to evaluate the finite-sample properties of the estimators: the integrated mean squared error (IMSE) and the integrated squared bias (ISB)
(21) 
where . The results for the performance of the estimator of the covariance, , are reported in Table 1. Figure 1 displays the Q-Q plot of the observed standardized error terms of the kernel covariance estimator using minute-by-minute data.

Table 1
                  Gaussian kernel    Beta kernel
Data Frequency    IMSE     ISB       IMSE     ISB       IMSE     ISB
5 seconds         0.14     0.37      0.11     0.21      0.13     0.28
20 seconds        0.73     0.63      0.43     0.49      0.66     0.46
1 minute          0.80     0.74      0.59     0.71      0.76     0.69
5 minutes         1.85     1.97      1.17     1.24      2.03     1.43
10 minutes        3.88     4.21      2.16     2.14      2.85     3.16
Note: Integrated mean squared error and integrated squared bias .
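The two performance measures can be computed from Monte Carlo output as in the following Python sketch. It assumes the usual definitions, namely that the IMSE integrates the Monte Carlo mean of the squared error over time and the ISB integrates the square of the Monte Carlo mean error. The function name and the trapezoidal integration rule are choices made for this illustration.

```python
import numpy as np


def imse_isb(estimates, truth, grid):
    """Integrated mean squared error and integrated squared bias.

    estimates : (reps, m) array, one estimated path per Monte Carlo repetition
    truth     : (m,) array, the true spot covariance on the grid
    grid      : (m,) array of increasing time points
    """
    errors = estimates - truth
    mse_path = np.mean(errors**2, axis=0)         # pointwise mean squared error
    bias_sq_path = np.mean(errors, axis=0) ** 2   # pointwise squared bias
    widths = np.diff(grid)

    def trapezoid(values):
        # Trapezoidal rule written out explicitly.
        return np.sum(0.5 * (values[1:] + values[:-1]) * widths)

    return trapezoid(mse_path), trapezoid(bias_sq_path)
```

Because the pointwise mean squared error equals variance plus squared bias, the first returned value is never smaller than the second under these definitions.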
Next, we examine the finite sample performance of the threshold covariance estimator. Though several models combining jumps and stochastic volatility have appeared in the literature, we use the model of Bates (1996), one of the most popular examples of the class, in which an independent jump component is added to the Heston stochastic volatility model:
(22) 
with
(23) 
where is the log of asset prices, , is the drift vector, is a two-dimensional compound Poisson jump process and is a standard two-dimensional Brownian motion such that . The variance processes, for , follow the CIR model:
(24) 

Table 2
                  Gaussian kernel    Beta kernel
Data Frequency    IMSE     ISB       IMSE     ISB       IMSE     ISB
5 seconds         1.76     1.38      1.25     1.22      2.34     1.75
20 seconds        2.24     1.13      1.87     1.34      2.13     2.03
1 minute          3.76     1.45      2.31     1.67      3.54     2.43
5 minutes         9.35     1.67      7.31     1.35      3.52     6.67
10 minutes        5.53     1.25      3.65     7.38      1.83     4.39
Note: Integrated mean squared error and integrated squared bias .
As in the simulations for the Heston model without jumps, we set (48 hours) and consider sampling frequencies 5 seconds, 30 seconds, 1 minute. We employ the Euler discretization scheme from Kloeden and Platen (1999, ch.14) for the simulation. We simulate one trajectory of each and for and keep them fixed. Then we run 500 repetitions of . For each simulated path of the bivariate log asset price we compute based on sampling frequencies.
We use the two performance measures in equation (21), IMSE and ISB, for three different estimators: the Gaussian, beta and one-sided kernel estimators. The results for the performance of the estimator are reported in Table 2. Figure 2 displays the Q-Q plot of the observed standardized error terms of the threshold kernel covariance estimator using minute-by-minute data.
5 Applications: Covariance Forecasting
Forecasting covariance has important economic value in the context of asset pricing and portfolio allocation. The multivariate GARCH model is a standard tool for modelling and forecasting covariances. However, more recent approaches advocate the use of high-frequency data.
Symitsi et al. (2018) undertake a comprehensive empirical comparison of two generic families of covariance forecasting models: multivariate GARCH models that employ daily data, and models that use high-frequency and options data. The authors conclude that models based on high-frequency data offer a clear advantage in terms of statistical accuracy and yield more theoretically consistent predictions, leading to superior out-of-sample portfolio performance. In particular, a Vector Heterogeneous Autoregressive (VHAR) model achieves the best performance among the models under consideration. Motivated by this, we use the VHAR model to forecast the integrated covariance; however, when implementing it for a finite sample, we use the kernel covariance estimator (3) in Section 2 instead of the realized covariance estimator of Barndorff-Nielsen and Shephard (2004a).
The Heterogeneous Autoregressive (HAR) model, see Corsi (2009), was proposed as a simple way to approximate the long-memory behaviour of volatility. The Vector HAR, implemented in Chiriac and Voev (2011), is a multivariate extension of HAR. In the VHAR the realized covariance is expressed as a linear combination of past daily, weekly and monthly realized covariances:
(25) 
where is obtained from the Cholesky decomposition of the realized covariance matrix. If is a matrix of realized covariances, its Cholesky decomposition gives and then . To allow direct comparison among quantities defined over various time horizons, the multiperiod factors are normalized sums of the daily realized factors, i.e.
(26) 
is the past day values of , is a constant term and are, respectively, the parameters of daily, weekly and monthly components of the model. The covariance forecasts, , are obtained by the reverse transformations of the ’s. Modelling the Cholesky factors rather than covariances directly is done in order to avoid unnecessary restrictions that ensure positive definiteness.
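The pipeline described in this section (Cholesky factors, HAR-type regression, reverse transformation) can be sketched in Python as follows. Several details are assumptions made for the illustration and are not taken from the paper: the lower-triangular Cholesky factor returned by NumPy is used, weekly and monthly components are averages over 5 and 22 days, and each element of the stacked factor gets its own ordinary least squares coefficients. All function names are hypothetical.

```python
import numpy as np


def chol_vech(cov):
    """Stack the lower-triangular Cholesky factor of cov into a vector."""
    factor = np.linalg.cholesky(cov)
    return factor[np.tril_indices(cov.shape[0])]


def cov_from_chol_vech(x, d):
    """Reverse transformation: rebuild the factor and return factor @ factor.T."""
    factor = np.zeros((d, d))
    factor[np.tril_indices(d)] = x
    return factor @ factor.T


def fit_vhar(x, weekly=5, monthly=22):
    """Element-wise OLS of x[t + 1] on a constant and daily, weekly, monthly terms.

    x : (T, m) array of daily stacked Cholesky factors. Returns an (m, 4) array.
    """
    t_idx = np.arange(monthly - 1, x.shape[0] - 1)
    week = np.array([x[t - weekly + 1:t + 1].mean(axis=0) for t in t_idx])
    month = np.array([x[t - monthly + 1:t + 1].mean(axis=0) for t in t_idx])
    coefs = np.empty((x.shape[1], 4))
    for j in range(x.shape[1]):
        design = np.column_stack(
            [np.ones(len(t_idx)), x[t_idx, j], week[:, j], month[:, j]]
        )
        coefs[j] = np.linalg.lstsq(design, x[t_idx + 1, j], rcond=None)[0]
    return coefs


def forecast_cov(x, coefs, d, weekly=5, monthly=22):
    """One-step-ahead covariance forecast from the fitted coefficients."""
    regressors = np.column_stack(
        [np.ones(x.shape[1]), x[-1], x[-weekly:].mean(axis=0), x[-monthly:].mean(axis=0)]
    )
    return cov_from_chol_vech(np.sum(coefs * regressors, axis=1), d)
```

Because the forecast is rebuilt as a factor times its transpose, it is symmetric and positive semidefinite by construction, which is the reason given above for modelling the Cholesky factors instead of the covariances themselves.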
We simulate the log-prices of two assets and their volatilities using model (4) in Section 4. Since we use simulated data, we have the true integrated covariance matrix, and we propose to forecast it using two measures of integrated covariance: the realized covariance estimator of Barndorff-Nielsen and Shephard (2002), which is standard in the literature, and the newly proposed kernel filtering of the covariance in equation (3). Thus we have two models for forecasting integrated covariance. The first model is the VHAR model in which the realized covariance is used as a measure of integrated covariance:
(27) 
where is the half-vectorized Cholesky decomposition of the integrated covariance matrix.
    1-day horizon          1-week horizon         2-week horizon
    0.3243     0.3213      0.4987     0.4896      0.4124     0.4126
    0.6904     0.6064      0.2443     0.2032      0.2295     0.2175
    0.6909     0.6028      0.1765     0.1483      0.2257     0.1591
    0.8922     0.8374      0.9007     0.7289      0.5219     0.4328
    0.1267     0.0529      0.1831     0.0772      0.2412     0.1841
    0.1387     0.0546      0.1796     0.0797      0.2981     0.1902
    10.143     14.0537     9.893      13.2624     7.8503     11.5561
In light of this it is natural to define the VHAR-KCV model, in which we borrow the VHAR model above to predict the integrated covariance matrix but use the kernel covariance estimator:
(28) 
where is the half-vectorized Cholesky decomposition of the kernel covariance estimator in (3). We benchmark the VHAR-KCV against the VHAR.
In line with Symitsi et al. (2018) we evaluate the forecasting ability of the VHAR-KCV model (28) based on three multivariate loss functions and compare its performance to that of the benchmark VHAR model (27). We use the Euclidean loss function, , which weights the elements of the forecast error matrix equally; the Frobenius distance, , which is the extension of the mean squared error to the multivariate space; and the multivariate quasi-likelihood loss function, , which is scale invariant:
(29)  
(30)  
(31) 
Here denotes the trace of a square matrix, denotes the integrated covariance matrix at time and is the time matrix of conditional covariance forecasts.
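For a single forecast date, the three criteria can be sketched in Python as below. The formulas are the forms commonly used in the covariance forecasting literature and are assumed here, not copied from equations (29) to (31): the Euclidean loss is the squared norm of the half-vectorized forecast error, the Frobenius distance is the trace of the squared error matrix, and the quasi-likelihood loss is the log-determinant of the forecast plus the trace of its inverse times the target. The function names are hypothetical.

```python
import numpy as np


def vech(matrix):
    """Half-vectorization: lower-triangular part stacked column by column."""
    d = matrix.shape[0]
    return matrix.T[np.triu_indices(d)]


def euclidean_loss(sigma, forecast):
    """Squared norm of the half-vectorized forecast error (assumed form)."""
    error = vech(sigma - forecast)
    return float(error @ error)


def frobenius_loss(sigma, forecast):
    """Trace of error' error, i.e. the squared Frobenius norm (assumed form)."""
    error = sigma - forecast
    return float(np.trace(error.T @ error))


def qlike_loss(sigma, forecast):
    """log det(forecast) + trace(forecast^{-1} sigma) (assumed form)."""
    _, logdet = np.linalg.slogdet(forecast)
    return float(logdet + np.trace(np.linalg.solve(forecast, sigma)))
```

In practice each loss would be averaged over the out-of-sample dates before two forecasting models are compared.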
6 Concluding Remarks
Inspired by the kernel filtering of spot volatility, in this paper we develop estimators of spot covariances for two types of underlying price process: continuous and discontinuous semimartingales. We show the asymptotic normality of the estimators. An important result is that we are able to attain the convergence rate for both estimators, which is . The convergence rate of the spot covariance matrix estimator for continuous martingales in a setup with microstructure noise proposed by Bibinger et al. (2017) is, in turn, . We conduct Monte Carlo experiments in financially realistic scenarios to study the finite sample properties of our estimators. In addition, we investigate one possible application of the estimator, the forecasting of the covariance matrix. We conclude that our estimator performs better in the context of forecasting than the benchmark realized covariance estimator of Barndorff-Nielsen and Shephard (2004a). One possible extension of the estimators is to account for market microstructure noise.
References
Aït-Sahalia Y, Fan J, Xiu D. 2010. High-frequency estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 105: 1504–1516.
Alexander C. 2008. Market risk analysis (vol. 2): practical financial econometrics. Chichester: John Wiley & Sons.
Andersen TG, Bollerslev T, Christoffersen PF, Diebold FX. 2013. Financial risk measurement for financial risk management. Handbook of the Economics of Finance 53: 1127–1220.
Barndorff-Nielsen OE, Shephard N. 2004a. Econometric analysis of realised covariation: High frequency based covariance, regression and correlation in financial economics. Econometrica 72: 885–925.
Barndorff-Nielsen OE, Shephard N. 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society 64: 253–280.
Bates D. 1996. Jumps and stochastic volatility: the exchange rate processes implicit in Deutschemark options. The Review of Financial Studies 9: 69–107.
Bibinger M, Hautsch N, Malec P, Reiss M. 2017. Estimating the spot covariation of asset prices: statistical theory and empirical evidence. Journal of Business and Economic Statistics. 1504–1516.
Bibinger M, Reiss M. 2014. Spectral estimation of covolatility from noisy observations using local weights. Scandinavian Journal of Statistics 6: 23–50.
Bibinger M, Winkelmann L. 2015. Econometrics of cojumps in high frequency data with noise. Journal of Econometrics 184: 361–378.
Bos CS, Janus P, Koopman SJ. 2012. Spot variance path estimation and its application to high-frequency jump testing. Journal of Financial Econometrics 10: 354–389.
Chiriac R, Voev V. 2011. Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics 26: 922–947.
Christensen K, Podolskij M, Vetter M. 2013. On covariation estimation for multivariate continuous Itô semimartingales with noise in non-synchronous observation schemes. Journal of Multivariate Analysis 120: 59–84.
Cline DBH, Hart JD. 1991. Kernel estimation of densities with discontinuities or discontinuous derivatives. Statistics 22: 69–84.
Corsi F. 2009. A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics 7: 174–196.
Cox J, Ingersoll J, Ross S. 1985. A theory of the term structure of interest rates. Econometrica 53: 385–407.
Fan J, Wang Y. 2008. Spot volatility estimation for high-frequency data. Statistics and Its Interface 1: 279–288.
Foster DP, Nelson DB. 1996. Continuous record asymptotics for rolling sample variance estimators. Econometrica 64: 139–174.
Hayashi T, Yoshida N. 2011. Nonsynchronous covariation process and limit theorems. Stochastic Processes and their Applications 121: 2416–2454.
Heston SL. 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6: 327–343.
Hull J, White A. 1987. The pricing of options on assets with stochastic volatility. Journal of Finance 42: 281–300.
Kanaya S, Kristensen D. 2016. Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32: 861–916.
Karatzas I, Shreve SE. 1999. Brownian motion and stochastic calculus. New York: Springer.
Kloeden P, Platen E. 1999. Numerical solutions of stochastic differential equations. Berlin: Springer-Verlag.
Kollo T, Rosen D. 2005. Advanced multivariate statistics with matrices. Dordrecht: Springer.
Kristensen D. 2010. Nonparametric filtering of the realized spot volatility: A kernel-based approach. Econometric Theory 26: 60–93.
Lütkepohl H. 1996. Handbook of matrices. Chichester: John Wiley & Sons Ltd.
Mykland PA, Zhang L. 2008. Inference for volatility-type objects and implications for hedging. Statistics and Its Interface 1: 255–278.
Revuz D, Yor M. 1998. Continuous martingales and Brownian motion. Berlin: Springer-Verlag.
Silverman BW. 1986. Density estimation for statistics and data analysis. New York: Chapman and Hall.
Symitsi E, Symeonidis L, Kourtis A, Markellos R. 2018. Covariance forecasting in equity markets. Journal of Banking and Finance 96: 153–168.
Uhlenbeck GE, Ornstein LS. 1930. On the theory of Brownian motion. Physical Review 36: 823–41.
Yu C, Fang Y, Li Z, Zhao X. 2014. Nonparametric estimation of high-frequency spot volatility for Brownian semimartingale with jumps. Journal of Time Series Analysis 35: 572–591.
Zhang L. 2011. Estimating covariation: Epps effect and microstructure noise. Journal of Econometrics 160: 33–47.
Zu Y, Boswijk HP. 2014. Estimating spot volatility with high-frequency financial data. Journal of Econometrics 181: 117–135.
Appendix A Proof of Theorem 1
A.1 Notation
In a similar way to Barndorff-Nielsen and Shephard (2004a), to simplify the proof we use index (or, equivalently, tensor) notation instead of vector or matrix notation. We rewrite the stochastic processes in equation (1) in index notation as
(32) 
with initial condition . Here
In the index notation the Einstein summation convention is used, which means if an index variable appears twice in a single expression then it implies summation over that index. Thus (32) is understood to mean
(33) 
We apply summation convention to indices , but not to indices , unless otherwise specified. Furthermore, we write
(34) 
with similar notation for other index combinations. In (34) no superscripts or subscripts are repeated and so no summation operator is generated. Combining the Einstein summation convention and the notational rule for , the th element of the spot covolatility matrix of model (1) is
(35) 
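The summation convention has a direct computational counterpart in `numpy.einsum`, where a repeated index letter is summed over. The short Python sketch below assumes that the spot covariance is the volatility matrix times its transpose, so that its (k, l) element is the sum over a of the products of the (k, a) and (l, a) volatility entries. The numerical values are hypothetical.

```python
import numpy as np

# Hypothetical 2 x 2 instantaneous volatility matrix.
sigma = np.array([[0.2, 0.0],
                  [0.1, 0.3]])

# The repeated index "a" is summed over, while "k" and "l" are free indices:
# spot_cov[k, l] = sum over a of sigma[k, a] * sigma[l, a].
spot_cov = np.einsum("ka,la->kl", sigma, sigma)
```

The result coincides with the matrix product `sigma @ sigma.T`, which is the matrix form of the same expression.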
A.2 Mean and variances
The proof of Theorem 1 consists of several steps. The first step is to derive the means and covariances of the variates
(36)  
(37) 
with .
Next, Theorem 1 is proved for the case where the mean processes are identically . Finally, the latter restriction is lifted. The proof is componentwise and based on the results and techniques employed by Barndorff-Nielsen and Shephard (2004a) and Kristensen (2010).
We start by computing the expectation of in equation (37).
(38)  
where the final equality is due to the results of Barndorff-Nielsen and Shephard (2004a):
(39) 
Next, we apply Lemma 5 and have