# Mixed-normal limit theorems for multiple Skorohod integrals in high-dimensions, with application to realized covariance

This paper develops mixed-normal approximations for probabilities that vectors of multiple Skorohod integrals belong to random convex polytopes when the dimensions of the vectors possibly diverge to infinity. We apply the developed theory to establish the asymptotic mixed normality of the realized covariance matrix of a high-dimensional continuous semimartingale observed at a high-frequency, where the dimension can be much larger than the sample size. We also present an application of this result to testing the residual sparsity of a high-dimensional continuous-time factor model.

## Authors

• 9 publications
• ### Covariance matrix testing in high dimension using random projections

Estimation and hypothesis tests for the covariance matrix in high dimens...
11/16/2020 ∙ by Deepak Nag Ayyala, et al. ∙ 0

• ### De-biased graphical Lasso for high-frequency data

This paper develops a new statistical inference theory for the precision...
05/04/2019 ∙ by Yuta Koike, et al. ∙ 0

• ### Spectral statistics of high dimensional sample covariance matrix with unbounded population spectral norm

In this paper, we establish some new central limit theorems for certain ...
04/07/2021 ∙ by Yanqing Yin, et al. ∙ 0

• ### Large-scale inference of correlation among mixed-type biological traits with Phylogenetic multivariate probit models

Inferring concerted changes among biological traits along an evolutionar...
12/19/2019 ∙ by Zhenyu Zhang, et al. ∙ 0

• ### Hierarchical Regularizers for Mixed-Frequency Vector Autoregressions

Mixed-frequency Vector AutoRegressions (MF-VAR) model the dynamics betwe...
02/23/2021 ∙ by Alain Hecq, et al. ∙ 0

• ### High-dimensional estimation of quadratic variation based on penalized realized variance

In this paper, we develop a penalized realized variance (PRV) estimator ...
03/04/2021 ∙ by Kim Christensen, et al. ∙ 0

• ### High-dimensional mixed-frequency IV regression

This paper introduces a high-dimensional linear IV regression for the da...
03/30/2020 ∙ by Andrii Babii, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Covariance matrix estimation of multiple assets is one of the most active research areas in high-frequency financial econometrics. Recently, many authors have been attacking the high-dimensionality in covariance matrix estimation from high-frequency data. A pioneering work on this topic is the paper by

Wang & Zou [68], where the regularization methods (banding and thresholding) proposed in Bickel & Levina [7, 6] have been applied to estimating high-dimensional quadratic covariation matrices from noisy and non-synchronous high-frequency data. Subsequently, their approach has been enhanced by several papers such as [65, 44, 42]. Meanwhile, such methods require a kind of sparsity of the target quadratic covariation matrix itself, which seems unrealistic in financial data in view of the celebrated factor structure such as the Fama-French three-factor model of [27]. To overcome this issue, Fan et al. [28] have proposed a covariance estimation method based on a continuous-time (approximate) factor model with observable factors, which can be seen as a continuous-time counterpart of the method introduced in Fan et al. [31]. The method has been further extended in various directions such as situations with unobservable factor, noisy and non-synchronous observations, heavy-tail errors and so on; see [1, 23, 43, 29, 59] for details. As an alternative approach to avoid assuming the sparsity of the target matrix itself, Brownlees et al. [9] have proposed applying the graphical Lasso, which imposes the sparsity on the inverse of the target matrix rather than the target matrix itself. On the empirical side, high-dimensional covariance matrix estimation from high-frequency financial data is particularly interesting in portfolio allocation. We refer to [67, 30, 49] for illustrations of relevant empirical work on this topic, in addition to the empirical results reported in the papers cited above.

To the best of the author’s knowledge, however, there is no work to establish a statistical inference theory validating simultaneous hypothesis testing and construction of uniformly valid confidence regions for high-dimensional quadratic covariation estimation from high-frequency data. Such a theory is important in statistical applications as illustrated by the following example: Let be a -dimensional continuous semimartingale. We denote by the -th component of for every . If one attempts to apply a regularization procedure to estimating the quadratic covariation matrix of , it is important to understand whether the target matrix is really sparse or not, and if so, how sparse it is. This amounts to evaluating the following series of the statistical hypotheses simultaneously:

 H(i,j):[Yi,Yj]1≡0,i,j=1,…,d such that i

A natural way to construct test statistics for this problem is to estimate

and test whether each of the entries is significantly away from 0 or not. Now suppose that is observed at the equidistant times , . Then the most canonical estimator for would be the so-called realized covariance matrix:

 ˆ[Y,Y]n1:=n∑h=1(Yth−Yth−1)(Yth−Yth−1)⊤. (1.2)

If one wants to test the null hypothesis such that all the hypotheses in (

1.1) is true, it is natural to consider the maximum type statistic

 max(i,j)∈Λ∣∣ˆ[Y,Y]n1∣∣,

where . More generally, if one wants to control the family-wise error rate in multiple testing for the hypotheses (1.1), it is enough to approximate the distribution of for any , with the help of the stepdown procedure illustrated in Romano & Wolf [61]. Hence the problem amounts to approximating the distributions of such maximum type statistics in an appropriate sense. Using the test statistics considered in Bibinger & Mykland [5], this type of testing problem can be extended to the sparsity test for the residual processes of a continuous-time factor model with an observable factor and

thus promising in applications to high-frequency financial data. In addition, such a problem will also be useful for covariance matrix modeling in a low-frequency setting because it often suffers from the curse of dimensionality due to the increase of the number of unknown parameters to be estimated, and thus it is a common practice to impose a certain structure on covariance matrices for reducing the number of unknown parameters in models. For example,

Tao et al. [64] have proposed fitting a matrix factor model to daily covariance matrices which are estimated from high-frequency data using the methodology of [68], while Kurose & Omori [46, 47] have introduced a dynamic (multiple-block) equicorrelation structure to multivariate stochastic volatility models. The afore-mentioned testing will be useful for examining the validity of such specification. If the dimension is fixed, the desired approximation can be obtained as a simple consequence of a multivariate mixed-normal limit theorem for , which is well-studied in the literature and holds true under quite mild assumptions; see e.g. Theorem 5.4.2 of [35]. The problem here is how to establish an analogous result when the dimension possibly diverges as tends to infinity.

Indeed, even for the sum of independent random vectors, it is far from trivial to establish such a result in a situation where the dimension is possibly (much) larger than the sample size. This is not surprising because objective random vectors are typically not tight in the usual sense in such a high-dimensional setting, so any standard method to establish central limit theorems no longer works. A significant breakthrough in this subject was achieved by the seminal work of

Chernozhukov, Chetverikov & Kato [14], where a Gaussian approximation of the maxima of the sum of independent random vectors in terms of the Kolmogorov distance has been established under quite mild assumptions which allow the dimension is (possibly exponentially) larger than the sample size. With the help of the Gaussian comparison theorem by Chernozhukov et al. [17], it enables us to construct feasible statistical inference procedures based on the maximum type statistics. Their theory, which we call the Chernozhukov-Chetverikov-Kato theory, or the CCK theory for short, has been developed in the subsequent work by Chernozhukov et al. [15, 18] and Chernozhukov et al. [19]: the first two papers have developed Gaussian approximation of the suprema of empirical processes, while the latter has extended the results of [14] to a central limit theorem for hyperrectangles, or sparsely convex sets in more general. Extension of the CCK theory to statistics other than the sum of independent random vectors has also been studied in many articles: Weakening the independence assumption has been studied in e.g. [70, 71, 16, 10]; Chen [11] and Chen & Kato [12, 13] have developed theories for -statistics. Moreover, some authors have applied the CCK theory to statistical problems regarding high-frequency data; see Kato & Kurisu [40] and Koike [45]. Nevertheless, none of the above studies is applicable to our problem due to its non-ergodic nature. That is, the asymptotic covariance matrix is random and depends on the -filed of the original probability space, so the asymptotic distribution is essentially non-Gaussian.

Meanwhile, inspection of the proofs of the CCK theory reveals that most the parts do not rely on any structure of the underlying statistics. To be precise, let be the random vector corresponding to the objective statistic and suppose that we aim at approximating the distribution of by its Gaussian analog which has the same mean and covariance matrix as those of . In the proofs of the CCK theory, the fact that is the sum of independent random vectors is crucial only to obtain a good quantitative estimate for the quantities for sufficiently smooth functions . In the original CCK theory [14, 19], such an estimate has been established by the so-called Stein’s method, especially Slepian’s interpolation (also known as the smart path method) and Stein’s leave-one-out method. Although their approach is not directly applicable to our problem, it suggests that we might alternatively use Malliavin’s integration by parts formula because it can be viewed as an infinite-dimensional version of Stein’s identity (cf. Sakamoto & Yoshida [63]). In fact, the recent active research in probabilistic literature shows a beautiful harmony between Malliavin calculus and Stein’s method, which is nowadays called the Malliavin-Stein method; we refer to the monograph [54] for an introduction of this subject. Indeed, this idea has already been applied in [45] to a situation where is a vector of smooth Wiener functionals (especially multiple Wiener-Itô integrals) and is Gaussian, which has produced several impressive results. Our plan here is to apply this idea to a situation where is a vector of multiple Skorohod integrals and is conditionally Gaussian. In this regard, a relevant result has been given in Theorem 5.1 of Nourdin et al. [53]. However, this result is not directly applicable to the current situation because it assumes that the components of are conditionally independent, which is less interesting to statistical applications (and especially not the case in the problem illustrated above). To remove such a restriction from the result of [53]

, we employ the novel interpolation method introduced in

Nualart & Yoshida [57], instead of Slepian’s interpolation used in [53] and the original CCK theory.

Another problem in the present context is validation of standardizing statistics by random variables. In a low-dimensional setting, this is typically achieved by proving the so-called

stable convergence in law (see e.g. [60] for details). However, in a high-dimensional setting, the meaning of stable convergence is unclear and its naïve extension is not useful because of the lack of the continuous mapping theorem and the delta method (see Section 3 for a relevant discussion). So we also aim at developing a formulation appropriate to validating such an operation.

The remainder of the paper is organized as follows. Section 2 is devoted to some preliminaries on notation and concepts used in the paper. Section 3 presents the main results obtained in this paper. In Section 4 we apply the developed theory to establish the asymptotic mixed normality of realized covariance matrices in a high-dimensional setting and illustrate its application to testing the residual sparsity of a continuous-time factor model. Section 5 provides a small simulation study as well as an empirical illustration using real data. All the proofs are collected in the Appendix.

## 2 Preliminaries

In this section we present some notation and concepts used throughout the paper.

### 2.1 Basic notation

We begin by introducing some basic notation which is more or less common in the literature. For a vector , we write the -th component of as for . Also, we set . For two vectors , the statement means for all . For a vector and a scalar , we set

 x±a:=(x1±a,…,xd±a)⊤.

Here, stands for the transpose of a matrix.

For a matrix , we write its -th entry as . Also, and denote the -th row vector and the -th column vector, respectively. Here, we regard both the vectors and as column vectors. If is an matrix, we denote by the -operator norm of :

If is another matrix, we denote by the Frobenius inner product of and . That is,

 A⋅B:=m∑i=1d∑j=1AijBij.

For a matrix , we denote by the -dimensional vector consisting of the diagonal entries of , i.e. .

For a random variable and a number , we write . We also use the notation to denote the essential supremum of . We will denote by the space of all random variables such that for every . The notation stands for convergence in probability.

If is a real Hilbert space, we denote by and the inner product and norm of , respectively. Also, we denote by the set of all -valued random variables such that .

Given real Hilbert spaces

, we write their Hilbert space tensor product as

. For a real Hilbert space , we write the th tensor power of as , i.e.

 V⊗k:=V⊗⋯⊗Vk.

Note that the Hilbert space tensor product is uniquely determined up to isomorphism, and we often select a convenient realization case by case. For example, we identify the tensor product with the Hilbert space equipped with the inner product for . This is possible because the latter is the Hilbert space tensor product of and in the sense of Definition E.8 in [37]. Namely, there is a bilinear map such that the range of is total in and

 ⟨T(f,a),T(g,b)⟩Vd=⟨f,g⟩V⟨a,b⟩Rd

for all and . In fact, we may define by

Evidently, is bilinear and its range is total in . Moreover, for any and , ,

 ⟨T(f,a),T(g,b)⟩Vd=d∑i=1⟨aif,big⟩V=d∑i=1aibi⟨f,g⟩V=⟨f,g⟩V⟨a,b⟩Rd.

For an element , we write the (canonical) symmetrization of as . Namely, the map is characterized as the unique continuous linear operator on such that

 Sym(f1⊗⋯⊗fk)=1k!∑τ∈Skfτ(1)⊗⋯⊗fτ(k)

for all , where denotes the set of all permutations of , i.e. the symmetric group of degree . An element is said to be symmetric if . We refer to Appendix E of [37] for details on Hilbert space tensor products.

### 2.2 Multi-way arrays

In this subsection we introduce some notation related to multi-way arrays (or tensors) which are necessary to state our main results.

Given a positive integer , we set for short. We denote by the real field or the complex field and consider a vector space over . Given positive integers , we denote by the set of all -valued arrays, i.e. -valued functions on . Note that corresponds to the set of all -valued matrices. When , we call an element of a -valued -dimensional -way array. For an array and indices (), we write as and itself as . When , is naturally identified with the Hilbert space tensor product by the unique linear isomorphism such that for , (cf. Example E.10 of [37]).

For two -valued arrays , we define their Hadamard-type product (i.e. entry-wise product) by

 S∘T:=(Si1,…,iqTi1,…,iq)(i1,…,iq)∈∏qk=1[Nk]∈KN1×⋯×Nq.

Also, we set

 ∥T∥ℓp:=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩⎧⎪⎨⎪⎩∑(i1,…,iq)∈∏qk=1[Nk]|Ti1,…,iq|p⎫⎪⎬⎪⎭1/pif p∈(0,∞),max(i1,…,iq)∈∏qk=1[Nk]|Ti1,…,iq|if p=∞.

Now suppose that is a real Hilbert space. For and , we define

 ⟨T,x⟩V:=(⟨Ti1,…,iq,x⟩V)(i1,…,iq)∈∏qk=1[Nk]∈RN1×⋯×Nq. (2.1)

Let be a positive integer. For each , let be a real Hilbert space, , and . Then we define

 T1⊗⋯⊗Tm :=(Ti1,…,ip11⊗⋯⊗Tip1+⋯+pm−1+1,…,ip1+⋯+pm)(i1,…,ip1+⋯+pm)∈∏p1+⋯+pmk=1[Nk] ∈(V1⊗⋯⊗Vm)N(1)1×⋯×N(1)p1×⋯×N(m)1×⋯×N(m)pm. (2.2)

In particular, we write

 T⊗m:=T⊗⋯⊗Tm.

### 2.3 Malliavin calculus

This subsection introduces some notation and concepts from Malliavin calculus used throughout the paper. We refer to Nualart [55], Chapter 2 of Nourdin & Peccati [54] and Chapter 15 of Janson [37] for further details on this subject.

Given a probability space , let be an isonormal Gaussian process over a real separable Hilbert space .

Let be another real separable Hilbert space. For any real number and any integer , denotes the stochastic Sobolev space of -valued random variables which are times differentiable in the Malliavin sense and the derivatives up to order

have finite moments of order

. If , we denote by the th Malliavin derivative of , which is a random variable taking its values in the space . We write instead of for short. We set . If , we simply write as .

For a -dimensional random vector , we identify the th Malliavin derivative of as the -valued random variable by identifying with as in Section 2.1. Similarly, for a matrix valued random variable , we identify as the -valued random variable .

For a positive integer , we denote by the -th multiple Skorohod integral, which is the adjoint operator of the densely defined operator . That is, the domain of is defined as the set of all -valued random variables such that there is a constant satisfying for all , and the following duality formula holds for any and :

 E[Fδq(u)]=E[⟨u,DqF⟩H⊗q].

### 2.4 Multi-indices

This subsection collects some notation related to multi-indices.

Let be a positive integer. We denote by the set of all non-negative integers. We define

 A(q):={α∈Zq+:α1+2α2+⋯+qαq=q}.

For a multi-index , we set as usual. Given another positive integer , we define

 Nr(α):={ν=(νij)(i,j)∈[q]×[r]:νij∈Z,r∑j=1νij=αi}

and

 N∗r(α)={ν=(νij)∈Nr(α):νq1=0}.

Moreover, we define

 ¯¯¯¯¯A(q):=q⋃p=1A(p)and¯¯¯¯¯N∗r(q):=⋃α∈¯¯¯¯A(q)N∗r(α).

Finally, for an element , we set and .

## 3 Main results

Throughout the paper, we consider an asymptotic theory such that the parameter tends to infinity. For each , we consider a probability space , and we suppose that all the random variables at stage are defined on . We also suppose that an isonormal Gaussian process over a real separable Hilbert space is defined on . To keep the notation simple, we subtract the indices from , and , respectively. So we will write them simply as , and , respectively. In particular, note that the spaces and the operators associated with (which are introduced in Section 2.3) implicitly depend on , although we do not attach the index to them.

For each , let be a -dimensional random vector consisting of multiple Skorohod integrals:

 Mjn=δqj(ujn),j=1,…,d,

where is a positive integer and for every . Here, we assume that the dimension possibly depends on as , while ’s do not depend on . We also assume for every and . Our aim is to study mixed-normal limit theorems for the following functionals:

 Zn=Mn+Wn,n=1,2,…,

where ’s are -dimensional random vectors which represent the uncentered part of the functionals.

Let us introduce mixed-normal random vectors approximating the functionals in law as follows:

 Zn=C1/2nζn+Wn,n=1,2,….

Here, is a

symmetric positive semidefinite random matrix and

is a -dimensional standard Gaussian vector independent of , which is defined on an extension of the probability space if necessary.

The main aim of this paper is to investigate reasonable regularity conditions under which the distribution of is well-approximated by that of . To be precise, we are interested in the following type of result:

 supz∈Rd|P(Zn≤z)−P(Zn≤z)|→0as n→∞.

It is well-recognized in statistic literature, however, that this type of result is usually insufficient for statistical applications because it does not ensure standardization by a random vector which is still random in the limit; such an operation is crucial for Studentization in the present context. In a low-dimensional setting, this issue is usually resolved by proving the stability of the convergence so that

 (Zn,X)→L(Zn,X)as n→∞

for any -dimensional (-measurable) random variable , where denotes the convergence in law. This statement is no longer meaningful in a high-dimensional setting such that as , so we need to reformulate it appropriately. A naïve idea is to consider the following statement:

 supz∈Rd,x∈Rm|P(Zn≤z,X≤x)−P(Zn≤z,X≤x)|→0as n→∞. (3.1)

However, if depends also on , this type of statement is not attractive neither theoretical nor practical points of view due to the following reasons: From a theoretical point of view, we need to assume a so-called anti-concentration inequality for to prove this type of result by the CCK approach, but it is usually hard to check such an inequality for general random variables, especially when as . Besides, from a practical point of view, it is still unclear whether the convergence (3.1) ensures the validity of standardization of because no analog of the continuous mapping theorem has been established yet for high-dimensional central limit theorems of the form (3.1). For these reasons we choose the way to directly prove convergence results for normalized statistics of . More formally, let be an random matrix, where possibly depends on . Our aim is to establish

 supy∈Rm|P(ΞnZn≤y)−P(ΞnZn≤y)|→0 (3.2)

as under reasonable regularity conditions on and . Mathematically speaking, given a vector , the set

is a finite intersection of hyperplanes in

, i.e. convex polytopes in , so the convergence (3.2) can be considered as a high-dimensional central limit theorem for random convex polytopes. If we take as the

diagonal matrix whose diagonals are the inverses of the “standard errors” of

, the convergence (3.2) does ensures the validity of (marginal) standardization of .

Now, our main theorem is stated as follows:

###### Theorem 3.1.

Suppose that and and that is symmetric for all and . Suppose also that can be written as with being an (deterministic) matrix such that and . Assume that the following convergences hold true:

 |||Υn|||2∞E[∥Xn∥2ℓ∞∥Δn∥ℓ∞](logm)2→0 (3.3)

and

 |||Υn||||ν|∗∗+1∞E[(1+∥Xn∥|ν|∗+1ℓ∞)(1+∥Zn∥|ν⋅4|ℓ∞+∥Zn∥|ν⋅4|ℓ∞)max1≤j≤d∥Δn,j(ν)∥ℓ∞](logm)32|ν|∗∗+12→0 (3.4)

as for every , where

 Δn=(⟨DqjMin,ujn⟩H⊗qj−Cijn)1≤i,j≤d (3.5)

and

 Δn,j(ν):=⟨qj⨂k=1(DkMn)⊗νk1⊗(DkCn)⊗νk2⊗(DkWn)⊗νk3⊗(DkXn)⊗νk4,ujn⟩H⊗qj (3.6)

if and otherwise. Assume also that the following condition is satisfied:

 limb↓0limsupn→∞P(mindiag(ΞnCnΞ⊤n)

Then we have (3.2) as .

###### Remark 3.1.

The variable defined in (3.5) is called the quasi-tangent in [57].

###### Remark 3.2.

The variable defined in (3.6) takes values in

 Rd×⋯×d|ν|∗×m×⋯×m|ν⋅4|×d×⋯×d|ν⋅4|

when . To see this, let us recall that , , and take values in , , and , respectively (cf. Section 2.3). Therefore, according to the notation defined by (2.2), the variable

 qj⨂k=1(DkMn)⊗νk1⊗(DkCn)⊗νk2⊗(DkWn)⊗νk3⊗(DkXn)⊗νk4

takes values in

 (H⊗∑qjk=1k(νk1+νk2+νk3+νk4))d×⋯×d∑qjk=1(νk1+2νk2+νk3)×m×⋯×m∑qjk=1νk4×d×⋯×d∑qjk=1νk4=(H⊗qj)d×⋯×d|ν|∗×m×⋯×m|ν⋅4|×d×⋯×d|ν⋅4|,

where the last identity follows from the relation . Hence, according to the notation defined by (2.1), we obtain

 Δn,j(ν) ∈Rd×⋯×d|ν|∗×m×⋯×m|ν⋅4|×d×⋯×d|ν⋅4|.
###### Remark 3.3.

In Theorem 3.1, we require all the variables appearing there to have finite moments of all orders just for simplicity. It would be enough for them to have finite moments up to order only, where would be a function of .

###### Remark 3.4 (Quantitative bound).

As in the original CCK theory, it is possible to give a quantitative version of the convergence (3.2), but we do not implement it here to make the statement of the theorem simpler.

Let us write down conditions (3.3)–(3.4) in the special case that for all . In this case, setting for , we can rewrite these conditions as follows:

 |||Υn|||3∞maxq=1,2E[(1+∥Xn∥3ℓ∞)max1≤i,j≤dmaxk∈Jq∣∣⟨DqCijn,ukn⟩H⊗q∣∣](logm)72→0, |||Υn|||2∞maxq=1,2E[(1+∥Xn∥2ℓ∞)max1≤i≤dmaxj∈Jq∣∣⟨DqWin,ujn⟩H⊗q∣∣](logm)2→0, |||Υn|||2∞maxq=1,2E[(1+∥Zn∥ℓ∞+∥Zn∥ℓ∞)max1≤i≤mmax1≤j≤dmaxk∈Jq∣∣⟨DqXijn,ukn⟩H⊗q∣∣](logm)2→0, |||Υn|||3∞E[(1+∥Xn∥3ℓ∞)max1≤i,j≤dmaxk∈J2∣∣⟨DFin⊗DGjn,ukn⟩H⊗2∣∣](logm)72→0, |||Υn|||5∞E[(1+∥Xn∥5ℓ∞)max1≤i,j,k,l≤dmaxh∈J2∣∣⟨DCijn⊗DCkln,uhn⟩H⊗2∣∣](logm)132→0, |||Υn|||3∞E[(1+∥Zn∥2ℓ∞+∥Zn∥2ℓ∞)max1≤i,k≤mmax1≤j,l≤dmaxh∈J2∣∣⟨DXijn⊗DXkln,uhn⟩H⊗2∣∣](logm)72→0, |||Υn|||4∞E[(1+∥Xn∥4ℓ∞)max1≤i,j,k≤dmaxl∈J2∣∣⟨DCijn⊗DFkn,uln⟩H⊗2∣∣](logm)5→0, |||Υn|||3∞E[(1+∥Xn∥2ℓ∞)(1+∥Z