# Testing and Support Recovery of Correlation Structures for Matrix-Valued Observations with an Application to Stock Market Data

Estimation of the covariance matrix of asset returns is crucial to portfolio construction. As suggested by economic theories, the correlation structure among assets differs between emerging markets and developed countries. It is therefore imperative to make rigorous statistical inference on correlation matrix equality between the two groups of countries. However, if the traditional vector-valued approach is undertaken, such inference is either infeasible due to the limited number of countries compared to the relatively abundant assets, or invalid due to the violation of the temporal independence assumption. This highlights the necessity of treating the observations as matrix-valued rather than vector-valued. With matrix-valued observations, our problem of interest can be formulated as statistical inference on covariance structures under matrix normal distributions, i.e., testing independence and correlation equality, as well as the corresponding support estimation. We develop procedures that are asymptotically optimal under some regularity conditions. Simulation results demonstrate the computational and statistical advantages of our procedures over certain existing state-of-the-art methods. Application of our procedures to stock market data validates several economic propositions.


## 1 Introduction

Understanding the covariance matrix of asset returns is of paramount importance, as asset pricing theories dictate that the distribution of returns is related to the business cycle and consumption states, which affect the demand for holding financial assets and generate time-varying risk premia (Moskowitz, 2003). However, characterizing the covariance structure of returns can be challenging, particularly when the number of assets is large. This also creates an acute problem for investors trying to minimize their portfolio risk when the sample covariance matrix cannot be inverted.

We now take the perspective of a global investor and consider an even more difficult challenge, in which the stock returns come from various industries in multiple countries across the world. To enable estimation, we employ a country-industry Kronecker structure model, given the short sample period relative to the tremendous cross section of asset returns. However, caution is needed: these countries can be categorized into emerging markets and developed countries, and assuming the same covariance matrix of industry returns for these two groups could lead to undesirable consequences in making optimal investment decisions. Indeed, economic and finance theories suggest that industry returns may co-vary differently across the two groups of countries. As pointed out by Bekaert and Harvey (1995), developed markets are more financially integrated while emerging markets are more financially segmented; thus industries may carry different amounts of systematic risk depending on the level of segmentation. In other words, industries comove with the market, an aggregation of all industries, to different degrees in emerging markets and developed countries.

Since the market return is an aggregation of all industry returns, an industry's systematic risk is simply the value-weighted average of its covariances with other industries, over its own variance. We therefore focus on the correlation matrices and hypothesize that they differ significantly between the two groups of countries. Specifically, emerging markets are characterized by frequent regime switches and sudden changes of fiscal, monetary and trade policies (Aguiar and Gopinath, 2007). When these economic policies change frequently and in an unanticipated way, they tend to make cyclical sectors in emerging markets more pro-cyclical than those in developed markets. Furthermore, Kohn et al. (2018) show that emerging economies produce more commodities than they consume while developed markets do not. Therefore, we also expect to detect larger comovements of commodity industries' returns with other industries in emerging countries. Finally, due to different demographic patterns (DellaVigna and Pollet, 2007), we also expect recreational industries in emerging countries to co-vary more with the market returns.

Combined, these propositions all highlight the necessity of rigorous statistical testing of the equality of the correlation matrices of the two groups of countries. In the literature, researchers often treat the returns of multiple industries as vector-valued observations over time in each country (Fama and French, 1997; Hong et al., 2007). With the vector-valued approach, since the number of assets typically far exceeds the length of the time series and the number of countries, estimating the correlation matrix is so challenging that a potentially problematic assumption of temporal independence must be made. In this case, vector-based two-sample tests, such as those of Li and Chen (2012); Cai et al. (2013a); Cai and Zhang (2016); Chang et al. (2017); Zheng et al. (2019), are either infeasible due to the small number of observations, or invalid due to the violation of temporal independence.

The goal of this article is to test the equality of the two correlation matrices by treating the observations as matrix-valued. Compared with conventional vector-valued observations, matrix-valued observations add one more dimension, corresponding to the time domain. The temporal dimension is allowed to have a wide range of dependence, which is more flexible than the vector-based approach. As we show later, the addition of the temporal dimension also alleviates the problem of small sample size and insufficient length of the time series.

To be specific, let $g = 1, 2$ correspond to the two groups of countries, emerging and developed, respectively. There are $n_1$ (resp. $n_2$) countries in the emerging (resp. developed) group. Denote by $X_k^{(g)}$, for $k = 1, \ldots, n_g$, the matrix of returns for country $k$ in group $g$. Each matrix is of size $p \times q$ when there are $p$ industries and $q$ time points (in the application we consider later, there are $p = 30$ industries and the $q$ time points are months). We are interested in inference on the correlation matrix $\mathrm{Corr}\{\mathrm{vec}(X_k^{(g)})\}$, where $\mathrm{vec}(\cdot)$ is the vectorization operation that stacks the columns of a matrix into a long vector. As this correlation matrix is of enormous size $pq \times pq$ while the sample size is only $n_g$, it is difficult to estimate it or make inference on it unless further assumptions are imposed.

A common and intuitive assumption for matrix-valued observations is the matrix normal distribution, where the covariance matrix has the Kronecker product structure, that is, $\mathrm{Cov}\{\mathrm{vec}(X_k^{(g)})\} = B^{(g)} \otimes A^{(g)}$; see for example Leng and Tang (2012); Yin and Li (2012); Zhou et al. (2014); Qiu et al. (2016); Han et al. (2016); Zhu and Li (2018). Here, $A^{(g)}$ is the $p \times p$ covariance matrix of the industries in group $g$ and $B^{(g)}$ is the $q \times q$ covariance matrix along the temporal dimension in group $g$. This Kronecker product reduces the number of unknown parameters from $pq(pq+1)/2$ to $p(p+1)/2 + q(q+1)/2$. Furthermore, a much smaller sample size is needed: without the Kronecker product structure, a sample size of $n_g \ge pq$ is necessary to make the sample covariance matrix full rank, while with the structure, the sample size $n_g$ is sufficient as long as $n_g q \ge p$ and $n_g p \ge q$. Moreover, the Kronecker structure has been widely adopted in the asset pricing literature, such as the conditional factor models of Brandt and Santa-Clara (2006); Brandt et al. (2009). We further verify the Kronecker product assumption in the stock market application via the hypothesis testing method of Aston et al. (2017). See Section 6.2 for the details.
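A quick arithmetic sketch of these savings follows. It is illustrative only: $p = 30$ matches the number of industries in the application, $q = 12$ is a hypothetical choice, and the helper names are ours. The sample-size rule encodes the rank conditions $n_g q \ge p$ and $n_g p \ge q$: the row covariance estimate is a sum of $n_g q$ rank-one $p \times p$ terms, and the column covariance estimate is a sum of $n_g p$ rank-one $q \times q$ terms.

```python
def n_params_unstructured(p: int, q: int) -> int:
    """Free parameters of an unrestricted pq x pq covariance matrix of vec(X)."""
    d = p * q
    return d * (d + 1) // 2


def n_params_kronecker(p: int, q: int) -> int:
    """Free parameters of A (p x p) plus B (q x q); one of them is redundant,
    because (A, B) and (cA, B/c) give the same matrix normal distribution."""
    return p * (p + 1) // 2 + q * (q + 1) // 2


def min_n_unstructured(p: int, q: int) -> int:
    """Smallest n for which the pq x pq sample covariance can be full rank."""
    return p * q


def min_n_kronecker(p: int, q: int) -> int:
    """Smallest n with n*q >= p and n*p >= q (ceiling division via negation)."""
    return max(-(-p // q), -(-q // p))


# p = 30 industries as in the application; q = 12 is a hypothetical choice.
p, q = 30, 12
print(n_params_unstructured(p, q), n_params_kronecker(p, q))  # 64980 543
print(min_n_unstructured(p, q), min_n_kronecker(p, q))        # 360 3
```

Under these illustrative sizes, three matrix-valued observations already make both Kronecker factors estimable, whereas the unstructured approach would need 360.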

Under such a matrix normal assumption, our goal is to test the equality of the correlation matrices of the industries, treating $B^{(1)}$ and $B^{(2)}$ as nuisance parameters. Consider the correlation matrices $R_A^{(g)} = (D_A^{(g)})^{-1/2} A^{(g)} (D_A^{(g)})^{-1/2}$, for $g = 1, 2$, where $D_A^{(g)}$ is the diagonal matrix consisting of the diagonal entries of $A^{(g)}$. As such, we test

$$H_0: R_A^{(1)} = R_A^{(2)} \quad \text{versus} \quad H_1: R_A^{(1)} \neq R_A^{(2)}. \tag{1}$$

This is referred to as the two-sample hypothesis test of the equality of the correlation matrices of the two groups with matrix-valued observations.
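Focusing on $R_A^{(g)}$ is natural because, under the Kronecker structure, the full $pq \times pq$ correlation matrix of $\mathrm{vec}(X)$ factors as $R_B \otimes R_A$, where $R_B$ (our notation here) is built from $B$ exactly as $R_A$ is built from $A$. The NumPy sketch below checks this identity numerically for randomly generated positive-definite $A$ and $B$ (illustrative values, not from the paper), and also checks that $R_A$ is unchanged when $A$ is rescaled, which is why the scale ambiguity between $A$ and $B$ does not affect (1).

```python
import numpy as np


def cov_to_corr(S):
    """Correlation matrix D^{-1/2} S D^{-1/2}, where D holds the diagonal of S."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


def random_spd(dim, rng):
    """A random symmetric positive-definite matrix (illustrative only)."""
    G = rng.standard_normal((dim, dim))
    return G @ G.T + dim * np.eye(dim)


rng = np.random.default_rng(0)
p, q = 4, 3
A = random_spd(p, rng)   # row (industry) covariance
B = random_spd(q, rng)   # column (temporal) covariance

corr_vec = cov_to_corr(np.kron(B, A))                    # pq x pq correlation of vec(X)
corr_factored = np.kron(cov_to_corr(B), cov_to_corr(A))  # R_B kron R_A

print(np.allclose(corr_vec, corr_factored))               # True
print(np.allclose(cov_to_corr(2.5 * A), cov_to_corr(A)))  # True: R_A is scale free
```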

Furthermore, it is also of interest to test whether the columns of the matrix-valued observations are independent. This is important because, if indeed there is no temporal correlation, then the vector-based approach can be implemented. This goal can be achieved by testing

$$H_{0,B,g}: B^{(g)} \text{ is diagonal} \quad \text{versus} \quad H_{1,B,g}: B^{(g)} \text{ is not diagonal}, \tag{2}$$

within group $g$. Similarly, independence of the industries within either group can be tested via $H_{0,A,g}: A^{(g)}$ is diagonal. These are referred to as one-sample hypothesis tests of the independence of the columns, or rows, respectively, of the matrix-valued observations.

Moreover, when the null hypothesis of the one-sample test is rejected, it is of further interest to identify which months or which industries have nonzero correlations; similarly, when the null hypothesis of the two-sample test is rejected, it is important to further identify which industries have significantly different correlations among emerging countries versus developed countries. These are referred to as support recovery problems.

In our real data application, we employ a comprehensive sample of 30 industry sector returns from 43 countries around the world from 2001:07 to 2017:12. As a prelude, the one-sample null hypothesis $H_{0,B,g}: B^{(g)}$ is diagonal is rejected by our method introduced in Section 2, suggesting the existence of significant temporal correlation. This implies that we cannot use the aforementioned vector-based two-sample tests, and demonstrates the need for a method that performs the two-sample hypothesis test directly on matrix-valued observations (Section 3). According to our analysis (Section 6), the two-sample null hypothesis $H_0: R_A^{(1)} = R_A^{(2)}$ is also rejected, so we indeed identify significant differences in correlations across the two groups of countries. Furthermore, our support recovery analysis finds evidence consistent with existing economic propositions.

For vector-valued observations, there have been numerous efforts in the estimation of and inference on covariance, correlation and precision matrices. On the estimation side, many methods have been proposed to estimate the covariance or correlation matrix in the vector case (e.g., Bickel and Levina, 2008; Rothman et al., 2009; Cai and Liu, 2011; Cai et al., 2012; Han and Liu, 2013; Cai and Zhang, 2016). Meanwhile, various methods for estimating the precision matrix have also been proposed (e.g., Meinshausen et al., 2006; Yuan and Lin, 2007; Friedman et al., 2008; Ravikumar et al., 2011; Cai et al., 2011), and some other works extend the single precision matrix to multiple precision matrices (e.g., Danaher et al., 2014; Zhu et al., 2014; Cai et al., 2016). On the inference side, hypothesis testing procedures for vector data have been developed recently. In particular, Cai and Jiang (2011); Li and Chen (2012); Cai et al. (2013a, b); Cai and Zhang (2016); Chang et al. (2017); Zheng et al. (2019), for example, considered one-sample or two-sample covariance or correlation matrix testing problems in high dimensions. To investigate graphical models, Liu et al. (2013) and Xia et al. (2015), for example, proposed procedures to test properties of the precision matrix under one-sample or two-sample settings.

For matrix-valued observations, matrix normal distribution, where the covariance matrix has the Kronecker product structure, has been frequently assumed. Under the matrix normal distribution, most of the existing works focus on the precision matrix, from either estimation or testing perspective. For instance, to inspect the graph structure, Leng and Tang (2012); Yin and Li (2012); Zhou et al. (2014) proposed methods to estimate the precision matrix; Qiu et al. (2016); Han et al. (2016); Zhu and Li (2018) extended further to the joint estimation of multiple precision matrices; and Xia and Li (2017, 2018) studied the one-sample and two-sample hypothesis testing of the structure of the precision matrices.

However, to the best of our knowledge, the literature on either estimation of or inference on the covariance or correlation matrix for matrix-valued observations is rather scarce. Table 1 summarizes the status of the literature on hypothesis testing for both vector-valued and matrix-valued data under one-sample and two-sample regimes. This article fills the gap in hypothesis testing of correlation structures under the matrix normal assumption in both the one-sample and two-sample cases. Covariance matrix estimation for matrix-valued data is a promising future direction.

Matrix-valued and tensor-valued data are ubiquitous nowadays. When dealing with such data, and sometimes even with vector-valued data, the Kronecker product structure has been a powerful tool because of its ability to approximate an arbitrary matrix (Cai et al., 2019) and to reduce dimensionality. Hafner et al. (2019) used a Kronecker product to approximate the covariance matrix of vector-valued data and aimed to estimate the approximated covariance matrix. Chen et al. (2018) investigated matrix autoregressive models where the coefficient matrix has a Kronecker product structure. For tensor-valued time series, Wang et al. (2019); Chen et al. (2019a); Chen and Chen (2019); Chen et al. (2019b) assumed that the tensor factor model has a signal that exhibits Kronecker structure. Aston et al. (2017); Constantinou et al. (2017) tested the separability of terms in the Kronecker product. Molstad and Rothman (2019) proposed an algorithm to fit the linear discriminant analysis model with a Kronecker product. These articles demonstrate a wide range of applications in finance, economics, engineering, neuroimaging, geophysics, and many other fields.

The rest of the article is organized as follows. Section 2 is devoted to the one-sample global hypothesis testing of the independence of the columns or the rows of matrix-valued observations, and to the recovery of the dependent entries when the global null hypothesis is rejected. Section 3 is dedicated to the two-sample global hypothesis testing of the equality of two correlation matrices along one dimension of the matrix-valued observations (in two groups), and to the support recovery of the difference of the two correlation matrices. Section 4 establishes the theoretical properties of these procedures for both the one-sample and two-sample settings. The numerical comparison of our procedures with existing ones via simulation is provided in Section 5, and the real data analysis of the aforementioned stock returns data is given in Section 6. The proofs are deferred to the Appendix.

## 2 One-Sample Testing of Independence

To formulate the stock return example as the matrix-valued two-sample hypothesis testing problem introduced in (1), we first check whether the independence assumption holds along the temporal dimension. Hence, we start with the one-sample testing of (2), which is easier to comprehend due to its simpler structure and notation. In the following subsections, we present the one-sample testing of independence and defer the discussion of the two-sample correlation matrix equality testing to Section 3. We omit the superscript $(g)$ that denotes group membership.

Suppose there are $n$ independent and identically distributed (i.i.d.) centered matrix-valued observations $X_1, \ldots, X_n$, each of dimension $p \times q$, from a matrix normal distribution $\mathcal{MN}_{p \times q}(0_{p \times q}, A, B)$, where $0_{p \times q}$ is the $p \times q$ matrix of zeros, and the $p \times p$ matrix $A$ and the $q \times q$ matrix $B$ are the covariance matrices associated with the rows and columns, respectively. The vectorization $\mathrm{vec}(X_k)$ is a vector of length $pq$ following a multivariate normal distribution with mean zero and covariance matrix of the form $B \otimes A$. Denote $A = (a_{i,j})$ and $B = (b_{i,j})$. Without loss of generality (WLOG), we derive the testing procedure below for testing independence with respect to the matrix $A$. Note that we can simply transpose the observations so that the roles of $A$ and $B$ are switched, and the procedure for testing $A$ can then be used to test $B$.
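For simulations or for checking the estimators below, one convenient way to draw from $\mathcal{MN}_{p\times q}(0_{p\times q}, A, B)$ is to set $X_k = L_A Z_k L_B^\top$, where $A = L_A L_A^\top$ and $B = L_B L_B^\top$ are Cholesky factorizations and $Z_k$ has i.i.d. standard normal entries; then $\mathrm{vec}(X_k) = (L_B \otimes L_A)\,\mathrm{vec}(Z_k)$ has covariance $B \otimes A$. The NumPy sketch below (function names and test values are illustrative assumptions) implements this and compares the empirical covariance of the column-stacked samples with $B \otimes A$.

```python
import numpy as np


def sample_matrix_normal(n, A, B, rng):
    """Draw n i.i.d. p x q matrices from MN(0, A, B) via X_k = L_A Z_k L_B'.

    A (p x p) is the row covariance and B (q x q) the column covariance.
    Returns an array of shape (n, p, q).
    """
    L_A = np.linalg.cholesky(A)
    L_B = np.linalg.cholesky(B)
    Z = rng.standard_normal((n, A.shape[0], B.shape[0]))
    return L_A @ Z @ L_B.T   # matmul broadcasts over the leading sample axis


def vec(X):
    """Column-stacking vectorization applied to each matrix in a stack (n, p, q)."""
    n, p, q = X.shape
    return np.transpose(X, (0, 2, 1)).reshape(n, p * q)


rng = np.random.default_rng(1)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
B = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
X = sample_matrix_normal(200_000, A, B, rng)
V = vec(X)
emp_cov = V.T @ V / V.shape[0]                 # the mean is known to be zero
print(np.abs(emp_cov - np.kron(B, A)).max())   # small Monte Carlo error
```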

Our goals are to test the null hypothesis globally

$$H_0: A \text{ is diagonal} \quad \text{versus} \quad H_1: A \text{ is not diagonal}, \tag{3}$$

and to identify the nonzero off-diagonal entries $a_{i,j}$; both goals are invariant to rescaling $A$ by a positive constant. As such, even though $A$ and $B$ are not identifiable, since $(cA, B/c)$ and $(A, B)$ lead to the same matrix normal distribution for any positive scalar $c$, this has no effect on the global hypothesis testing procedure of Section 2.1 or the support recovery approach of Section 2.2. Throughout the paper, we use $C, C_1, C_2, \ldots$ to denote constants whose values may change from line to line.

### 2.1 Global Testing Procedure

To test the property of $A$, it is natural to construct the test statistic based on an estimate of $A$. A naive estimate of $A$ is $\tilde A = \frac{1}{nq}\sum_{k=1}^n X_k X_k^\top$, which can also be rewritten as $\tilde A = \frac{1}{nq}\sum_{k=1}^n \sum_{l=1}^q (X_k)_{\cdot l} (X_k)_{\cdot l}^\top$, where $M_{\cdot l}$ denotes the $l$-th column of a matrix $M$. In the stock return example, $(X_k)_{\cdot l}$ is a length-$p$ vector representing the returns of the $p$ industries of country $k$ during month $l$. This naive estimate is the same as the sample covariance matrix for vector-valued observations if we treat $(X_k)_{\cdot l}$, for $k = 1, \ldots, n$ and $l = 1, \ldots, q$, as i.i.d. observations. Note that these observations are i.i.d. only when $B$ is a multiple of an identity matrix, which implies no temporal correlation and is typically unrealistic. According to the definition of the matrix normal distribution, the covariance matrix of any column is proportional to the matrix $A$, i.e., $\mathrm{Cov}\{(X_k)_{\cdot l}\} = b_{l,l} A$ for all $l$. It then follows that, for the naive estimate $\tilde A$, $E(\tilde A) = cA$ with $c = \mathrm{tr}(B)/q$, and hence $\tilde A / c$ is an unbiased estimate of $A$. Similarly, $E(\tilde B) = c' B$ with $c' = \mathrm{tr}(A)/p$, so that $\tilde B / c'$ is an unbiased estimate of $B$, where

$$\tilde B = \frac{1}{np}\sum_{k=1}^n X_k^\top X_k. \tag{4}$$
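The constants $c = \mathrm{tr}(B)/q$ and $c' = \mathrm{tr}(A)/p$ come from averaging: each column of $X_k$ has covariance $b_{l,l}A$ and each row has covariance $a_{i,i}B$, so averaging over the $q$ columns gives $E(\tilde A) = \{\mathrm{tr}(B)/q\}A$, and averaging over the $p$ rows gives $E(\tilde B) = \{\mathrm{tr}(A)/p\}B$. The NumPy sketch below is a Monte Carlo check of both identities; the matrices, seed, and tolerances are illustrative assumptions.

```python
import numpy as np


def naive_estimates(X):
    """Naive estimates from a stack X of shape (n, p, q).

    A_tilde = (1/nq) sum_k X_k X_k'   (row covariance, up to the constant c)
    B_tilde = (1/np) sum_k X_k' X_k   (column covariance, up to c'), as in (4)
    """
    n, p, q = X.shape
    A_tilde = np.einsum('kil,kjl->ij', X, X) / (n * q)
    B_tilde = np.einsum('kil,kim->lm', X, X) / (n * p)
    return A_tilde, B_tilde


rng = np.random.default_rng(2)
A = np.array([[2.0, 0.6],
              [0.6, 1.0]])
B = np.array([[1.0, 0.5],
              [0.5, 3.0]])
n = 100_000
# Matrix normal draws via Cholesky factors: X_k = L_A Z_k L_B'.
L_A, L_B = np.linalg.cholesky(A), np.linalg.cholesky(B)
X = L_A @ rng.standard_normal((n, 2, 2)) @ L_B.T

A_tilde, B_tilde = naive_estimates(X)
c = np.trace(B) / B.shape[0]        # tr(B)/q = 2.0
c_prime = np.trace(A) / A.shape[0]  # tr(A)/p = 1.5
print(np.round(A_tilde / c, 2))        # close to A
print(np.round(B_tilde / c_prime, 2))  # close to B
```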

However, the above naive estimation is not efficient and can be improved further as follows.

Consider $Z_k = X_k B^{-1/2}$, for $k = 1, \ldots, n$. By the properties of the matrix normal distribution, $Z_k \sim \mathcal{MN}_{p \times q}(0_{p \times q}, A, I_q)$, which implies that the $q$ columns of $Z_k$ are i.i.d. multivariate normal with covariance $A$. This is equivalent to observing $nq$ i.i.d. random vectors with covariance $A$. Right-multiplying by the matrix $B^{-1/2}$ can be roughly thought of as pre-whitening the matrix normal distribution, since the column covariance becomes the identity after this linear transformation. Therefore, when $B$ is known, $\frac{1}{nq}\sum_{k=1}^n Z_k Z_k^\top = \frac{1}{nq}\sum_{k=1}^n X_k B^{-1} X_k^\top$ is the most efficient (oracle) estimate of $A$. Of course, $B$ is often unknown in practice, in which case plugging in a legitimate estimate of $B$ is a natural approach, and we choose $\tilde B / c'$ as a candidate. This idea leads to the following estimate of $A$,

$$\hat A = (\hat a_{i,j}) := \frac{1}{nq}\sum_{k=1}^n X_k (\tilde B / c')^{-1} X_k^\top, \tag{5}$$

where $\tilde B$ is defined in (4). Note that when $np \ge q$, $\tilde B$ is invertible with probability one. We further comment that there are many appropriate choices for the estimation of $B$ besides the simple sample estimator, as long as the estimator satisfies equation (39) in our proof. For example, if we have prior information on the structure of $B$, we may use the banded estimator of Rothman et al. (2010) or the adaptive thresholding estimator of Cai and Liu (2011).
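A direct transcription of (5) is short. In the NumPy sketch below (the function name is hypothetical), $\tilde B/c'$ is inverted explicitly for readability, although solving a linear system would be numerically preferable; inversion requires $np \ge q$. The usage lines check two algebraic facts that follow from (4) and (5): $\hat A$ scales linearly in $c'$, and $\hat A$ is unchanged if every $X_k$ is right-multiplied by the same invertible $q \times q$ matrix $G$, because $\tilde B$ then becomes $G^\top \tilde B G$. The second property reflects how the plug-in pre-whitening removes the column covariance.

```python
import numpy as np


def prewhitened_A(X, c_prime=1.0):
    """Estimate A as in (5): (1/nq) sum_k X_k (B_tilde / c')^{-1} X_k'.

    X has shape (n, p, q); B_tilde is the naive column estimate (4).
    B_tilde is invertible with probability one when n * p >= q.
    """
    n, p, q = X.shape
    B_tilde = np.einsum('kil,kim->lm', X, X) / (n * p)
    B_inv = np.linalg.inv(B_tilde / c_prime)
    return np.einsum('kil,lm,kjm->ij', X, B_inv, X) / (n * q)


rng = np.random.default_rng(3)
X = rng.standard_normal((20, 5, 8))               # n = 20, p = 5, q = 8, so np >= q
G = rng.standard_normal((8, 8)) + 4 * np.eye(8)   # a well-conditioned invertible q x q matrix

A_hat = prewhitened_A(X)
print(np.allclose(A_hat, A_hat.T))                               # symmetric
print(np.allclose(prewhitened_A(X, c_prime=2.0), 2.0 * A_hat))   # linear in c'
print(np.allclose(prewhitened_A(X @ G), A_hat))                  # invariant to X_k -> X_k G
```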

To test whether $A$ is diagonal in (3), it is tempting to consider the magnitudes of all the off-diagonal entries of $\hat A$ in (5). However, the estimate in (5) cannot be used directly yet, because the entries $\hat a_{i,j}$ can have different levels of variability. Recall that in the simple one-sample $t$-test for the mean of i.i.d. random variables, the test statistic is the ratio of the estimated mean to its standard error. To treat all the off-diagonal entries $\hat a_{i,j}$, $1 \le i < j \le p$, in a fair manner, it is necessary to standardize them first.

In order to standardize, we re-examine the construction of $\hat A$. Since (5) can be re-expressed as

$$\hat A = \frac{1}{nq}\sum_{k=1}^n \sum_{l=1}^q \bigl(X_k (\tilde B/c')^{-1/2}\bigr)_{\cdot l} \bigl(X_k (\tilde B/c')^{-1/2}\bigr)_{\cdot l}^\top,$$

it has the oracle counterpart when $B$ is known:

$$\hat A^{\mathrm{oracle}} = \frac{1}{nq}\sum_{k=1}^n \sum_{l=1}^q (X_k B^{-1/2})_{\cdot l} (X_k B^{-1/2})_{\cdot l}^\top,$$

whose entries are

$$\hat a^{o}_{i,j} = \frac{1}{nq}\sum_{k=1}^n \sum_{l=1}^q (X_k B^{-1/2})_{i,l} (X_k B^{-1/2})_{j,l}.$$

Then, it is natural to define the relevant population variances as

$$\theta_{i,j} = \mathrm{Var}\bigl((X_k B^{-1/2})_{i,l} (X_k B^{-1/2})_{j,l}\bigr) = \mathrm{Var}\bigl((Z_k)_{i,l} (Z_k)_{j,l}\bigr), \tag{6}$$

for all $1 \le i < j \le p$. Note that the definition of $\theta_{i,j}$ above depends on neither $k$ nor $l$. Given the observations $X_1, \ldots, X_n$, these variances can be estimated by

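$$\hat\theta_{i,j} = \frac{1}{nq}\sum_{k=1}^n \sum_{l=1}^q \Bigl[\bigl(X_k (\tilde B/c')^{-1/2}\bigr)_{i,l} \bigl(X_k (\tilde B/c')^{-1/2}\bigr)_{j,l} - \hat a_{i,j}\Bigr]^2. \tag{7}$$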

The variance of $\hat a_{i,j}$ can then be estimated by $\hat\theta_{i,j}/(nq)$. A similar approach to variance estimation was used in Cai and Liu (2011) and Cai et al. (2013a), where the observations are vector-valued and need neither pre-whitening nor the plug-in estimate $\tilde B$; ours are matrix-valued, and the estimation is more involved.

We now can define the standardized statistics

$$M_{i,j} = \frac{\hat a_{i,j}^2}{\hat\theta_{i,j}/(nq)}, \quad 1 \le i < j \le p, \tag{8}$$

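where $\hat a_{i,j}$ and $\hat\theta_{i,j}$ are defined in (5) and (7), respectively. The $M_{i,j}$'s are on the same scale and can be compared with one another. Moreover, $M_{i,j}$ does not depend on $c'$: both $\hat a_{i,j}^2$ and $\hat\theta_{i,j}$ scale with $c'^2$, so the constant cancels between the numerator and the denominator of (8). WLOG, we set $c' = 1$ for the rest of the article.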
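Putting (5), (7) and (8) together, the standardized statistics can be computed with a few array operations. The NumPy sketch below is one possible implementation under our own design choices: a symmetric inverse square root of $\tilde B/c'$ via an eigendecomposition, and a dense $n \times p \times p \times q$ array of products, which is fine for moderate sizes but memory-hungry for large $p$. Only the entries with $i < j$ are used. The usage lines confirm numerically that $M_{i,j}$ does not depend on $c'$.

```python
import numpy as np


def standardized_stats(X, c_prime=1.0):
    """Return (A_hat, theta_hat, M) following (5), (7) and (8).

    X has shape (n, p, q). Only the entries with i < j of M are used in the tests.
    """
    n, p, q = X.shape
    B_tilde = np.einsum('kil,kim->lm', X, X) / (n * p) / c_prime
    w, V = np.linalg.eigh(B_tilde)
    B_inv_half = (V / np.sqrt(w)) @ V.T                          # symmetric (B_tilde/c')^{-1/2}
    Y = X @ B_inv_half                                           # pre-whitened observations
    A_hat = np.einsum('kil,kjl->ij', Y, Y) / (n * q)             # (5)
    prods = np.einsum('kil,kjl->kijl', Y, Y)                     # all products Y_il * Y_jl
    theta_hat = ((prods - A_hat[None, :, :, None]) ** 2).mean(axis=(0, 3))  # (7)
    M = A_hat ** 2 / (theta_hat / (n * q))                       # (8)
    return A_hat, theta_hat, M


rng = np.random.default_rng(4)
X = rng.standard_normal((30, 6, 10))
A_hat, theta_hat, M = standardized_stats(X)
_, _, M_scaled = standardized_stats(X, c_prime=3.0)
print(np.allclose(M, M_scaled))   # True: c' cancels in (8)

iu = np.triu_indices(6, k=1)
print(M[iu].max())                # the maximum statistic M_n in (9)
```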

Note that the null hypothesis $H_0: A$ is diagonal is equivalent to all of the off-diagonal entries of $A$ being zero, and hence further equivalent to the maximum of the absolute values of the off-diagonal entries being zero, i.e., $\max_{1 \le i < j \le p} |a_{i,j}| = 0$. Therefore, it is natural to construct the following test statistic,

$$M_n = \max_{1 \le i < j \le p} M_{i,j}, \tag{9}$$

where $M_{i,j}$ is the standardized statistic for the $(i,j)$-th entry in (8). Under the alternative hypothesis, there exists at least one nonzero off-diagonal entry $a_{i,j}$ whose associated statistic $M_{i,j}$ is large, and then the maximum test statistic $M_n$ will be large. Therefore, the null hypothesis should be rejected for large values of $M_n$.

To perform a hypothesis test based on $M_n$, we further need to establish its null distribution. Its limiting behavior will be discussed in detail in Section 4. For now, we can still obtain some intuition about the critical value. Roughly speaking, under the null hypothesis, each $M_{i,j}$ is approximately the square of a standard normal random variable due to standardization, and under certain conditions the $M_{i,j}$'s are only weakly correlated with each other. So, loosely speaking, $M_n$ is the maximum of $p(p-1)/2$ weakly dependent squared normals. Since the maximum of $m$ i.i.d. squared standard normal random variables is close to $2\log m$, $M_n$ is close to $2\log\{p(p-1)/2\} \approx 4\log p$ under $H_0$. To be precise, the theorems in Section 4 rigorously show that, under the null hypothesis and certain regularity assumptions, $M_n - 4\log p + \log\log p$ converges to a Gumbel distribution. Based on this limiting distribution, for any significance level $0 < \alpha < 1$, we define the global test by

$$\Phi_\alpha = I\bigl(M_n \ge q_\alpha + 4\log p - \log\log p\bigr), \tag{10}$$

where $I(\cdot)$ is the indicator function. Here, the quantity

$$q_\alpha = -\log(8\pi) - 2\log\log(1-\alpha)^{-1} \tag{11}$$

is the $1-\alpha$ quantile of the Gumbel distribution with cumulative distribution function (cdf) $\exp\{-(8\pi)^{-1/2} \exp(-t/2)\}$. The null hypothesis $H_0: A$ is diagonal is rejected whenever $\Phi_\alpha = 1$.
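The critical value in (10) and the quantile (11) are easy to compute. The pure-Python sketch below (function names are illustrative) evaluates $q_\alpha$, checks it against the Gumbel cdf, and applies $\Phi_\alpha$ to a $p \times p$ matrix of standardized statistics stored as nested lists, taking the maximum over $i < j$ as in (9).

```python
import math


def gumbel_cdf(t):
    """Limiting null cdf of M_n - 4 log p + log log p: exp(-exp(-t/2) / sqrt(8 pi))."""
    return math.exp(-math.exp(-t / 2) / math.sqrt(8 * math.pi))


def gumbel_quantile(alpha):
    """q_alpha in (11): the 1 - alpha quantile of the cdf above."""
    return -math.log(8 * math.pi) - 2 * math.log(math.log(1 / (1 - alpha)))


def max_statistic(M):
    """M_n in (9): the maximum of M[i][j] over off-diagonal pairs i < j."""
    p = len(M)
    return max(M[i][j] for i in range(p) for j in range(i + 1, p))


def global_test(M, alpha=0.05):
    """Phi_alpha in (10): returns 1 (reject H0: A is diagonal) or 0."""
    p = len(M)
    critical = gumbel_quantile(alpha) + 4 * math.log(p) - math.log(math.log(p))
    return int(max_statistic(M) >= critical)


# With p = 30 and alpha = 0.05 the critical value is about 15.1.
p = 30
critical = gumbel_quantile(0.05) + 4 * math.log(p) - math.log(math.log(p))
print(round(critical, 2))   # 15.1
```

Because the centering $4\log p - \log\log p$ grows with $p$, the same value of $M_n$ can be significant for a small $p$ and not for a large one.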

We comment that, since $M_n$ is the maximum of the $M_{i,j}$'s, the test is best suited to sparse alternatives, that is, when only a small number of the off-diagonal entries of the covariance matrix are large. As long as one of the off-diagonal entries is large enough, the test will reject the null hypothesis. The test does not assume any other structure of the alternative hypothesis. In Section 4, we show that this test is optimal against sparse alternatives. Note that, when the alternative is dense with many small off-diagonal entries, the proposed test is less capable of rejecting the null. Nevertheless, the large body of literature on portfolio construction typically assumes i.i.d. excess returns with all serial correlations equal to zero (for a survey, see Brandt (2009)). In practice, temporal correlations are more apparent in daily or even weekly returns due to non-synchronous trading or the bid-ask bounce effect, but much less so at the monthly frequency, so most of them may not differ from zero (Campbell et al., 1997).

The standardized statistics $M_{i,j}$ are also useful by themselves for recovering the support of $A$. In other words, we can identify the locations of the nonzero entries of $A$ by examining the values of the $M_{i,j}$'s, as we discuss in the next section.

### 2.2 Support Recovery Procedure

In Section 2.1, we tested the independence of the rows of $X_k$ by testing globally whether all of the off-diagonal entries of the row covariance matrix $A$ are zero. If the null hypothesis is rejected, it is of great value to locate the entries where the covariances are nonzero. Taking the stock return data as an example, if the independence of the months is rejected (the matrix-valued observations need to be transposed before being fed into the testing procedure), one may want to identify which months are highly correlated; and if the independence of industries is rejected, it might be interesting to know which industries are correlated. Another example is brain imaging analysis, where the matrix-valued observations for patients are spatial-temporal data (e.g., Xia and Li, 2017, 2018), and it is worthwhile to investigate further how voxels of the brain are correlated after the independence of voxels is rejected. This is called the problem of support recovery.

This problem can be thought of as simultaneously testing whether each off-diagonal entry of the covariance matrix is zero. Let the support of $A$, neglecting the diagonal entries, be

$$\Psi = \Psi(A) = \{(i,j): a_{i,j} \neq 0, \ 1 \le i < j \le p\}. \tag{12}$$

Since there are $p(p-1)/2$ off-diagonal covariances to examine, extreme value theory suggests thresholding the standardized statistics at the following level to obtain an estimate of the support,

$$\hat\Psi(\tau) = \{(i,j): M_{i,j} \ge \tau\log p, \ 1 \le i < j \le p\}, \tag{13}$$

where the $M_{i,j}$'s are defined in (8), and $\tau$ is a threshold constant. Section 4 will show that when $\tau = 4$, the probability of exact recovery goes to 1 asymptotically if the nonzero entries are large enough. This is intuitive, as $M_n$ is close to $4\log p$ under the null. Section 4 will further demonstrate that a smaller choice $\tau < 4$ fails to recover the support under certain conditions; therefore $\tau = 4$ is optimal. We remark that we aim for asymptotically exact recovery of the support in this and the following sections; for other purposes, one may refer to alternative multiple testing approaches with family-wise error rate or false discovery rate control.
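The thresholding rule (13) translates directly into code. In the pure-Python sketch below (function name illustrative), indices are 0-based and `tau` defaults to 4, the choice described above as optimal; the toy matrix of statistics is made up for illustration.

```python
import math


def estimate_support(M, tau=4.0):
    """Psi_hat(tau) in (13): pairs (i, j), i < j, with M[i][j] >= tau * log(p).

    M is a p x p nested list of standardized statistics; indices are 0-based.
    """
    p = len(M)
    cutoff = tau * math.log(p)
    return {(i, j) for i in range(p) for j in range(i + 1, p) if M[i][j] >= cutoff}


# Toy 4 x 4 statistics: with p = 4 and tau = 4 the cutoff is 4 log 4, about 5.55.
M = [[0.0, 1.2, 9.0, 0.3],
     [1.2, 0.0, 0.8, 6.0],
     [9.0, 0.8, 0.0, 2.1],
     [0.3, 6.0, 2.1, 0.0]]
print(sorted(estimate_support(M)))           # [(0, 2), (1, 3)]
print(sorted(estimate_support(M, tau=5.0)))  # [(0, 2)]  (cutoff about 6.93)
```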

## 3 Two-Sample Testing of Correlation Matrix Equality

Having derived the procedure for the one-sample testing of independence, we now extend the approach to the two-sample problem of testing the equality of two correlation matrices. Following the notation in the introduction, we have i.i.d. matrix-valued observations $X_1^{(g)}, \ldots, X_{n_g}^{(g)}$ from the matrix normal distribution $\mathcal{MN}_{p \times q}(0_{p \times q}, A^{(g)}, B^{(g)})$ for the two groups $g = 1, 2$. With the correlation matrices $R_A^{(g)}$ defined in the introduction, we wish to test

$$H_0^*: R_A^{(1)} = R_A^{(2)} \quad \text{versus} \quad H_1^*: R_A^{(1)} \neq R_A^{(2)}. \tag{14}$$

Hereafter, we use the superscript $*$ to distinguish quantities relevant to the two-sample case from those in the one-sample case. To make inference about the correlation matrices, estimates of these correlation matrices need to be constructed.

Given the observations $\{X_k^{(1)}\}_{k=1}^{n_1}$ and $\{X_k^{(2)}\}_{k=1}^{n_2}$, as discussed for the one-sample case in Section 2, we first construct estimates of the covariance matrices of the two groups, and then obtain the estimate of each group's correlation matrix by dividing the entries of the covariance matrix by the corresponding standard deviations, as follows:

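$$\hat A^{(g)} = (\hat a^{(g)}_{i,j}) := \frac{1}{n_g q}\sum_{k=1}^{n_g} X_k^{(g)} (\tilde B^{(g)})^{-1} (X_k^{(g)})^\top, \tag{15}$$

$$\hat R_A^{(g)} = (\hat r^{(g)}_{i,j}) := \left(\frac{\hat a^{(g)}_{i,j}}{(\hat a^{(g)}_{i,i}\, \hat a^{(g)}_{j,j})^{1/2}}\right), \tag{16}$$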

where $\tilde B^{(g)} = \frac{1}{n_g p}\sum_{k=1}^{n_g} (X_k^{(g)})^\top X_k^{(g)}$ is the naive estimate of $B^{(g)}$, as in (4). Again, we cannot directly make inference based on the $\hat r^{(g)}_{i,j}$'s, because they are heteroscedastic. To make them homoscedastic, define the entry-wise population variance and its sample counterpart similarly to (6) and (7):

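$$\theta^{(g)}_{i,j} = \mathrm{Var}\bigl((X_k^{(g)} (B^{(g)})^{-1/2})_{i,l} (X_k^{(g)} (B^{(g)})^{-1/2})_{j,l}\bigr),$$

$$\hat\theta^{(g)}_{i,j} = \frac{1}{n_g q}\sum_{k=1}^{n_g} \sum_{l=1}^q \Bigl[\bigl(X_k^{(g)} (\tilde B^{(g)})^{-1/2}\bigr)_{i,l} \bigl(X_k^{(g)} (\tilde B^{(g)})^{-1/2}\bigr)_{j,l} - \hat a^{(g)}_{i,j}\Bigr]^2.$$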

As such, the variance of $\hat r^{(g)}_{i,j}$ can be estimated by $\hat\vartheta^{(g)}_{i,j}/(n_g q)$, where

$$\hat\vartheta^{(g)}_{i,j} = \frac{\hat\theta^{(g)}_{i,j}}{\hat a^{(g)}_{i,i}\, \hat a^{(g)}_{j,j}}.$$

Consequently, the variance of $\hat r^{(1)}_{i,j} - \hat r^{(2)}_{i,j}$ can be estimated by $\hat\vartheta^{(1)}_{i,j}/(n_1 q) + \hat\vartheta^{(2)}_{i,j}/(n_2 q)$. Note that, for vector-valued observations, to test the equality of the correlations of two populations, Cai and Zhang (2016) estimated the variance through a careful investigation of the Taylor expansion in computing the correlation from the covariance, and Cai and Liu (2016) introduced a variance stabilization method based on Fisher's $z$-transformation. Our approach differs from both methods.

When we focus on a single entry of the hypothesis in (14), such as $r^{(1)}_{i,j} = r^{(2)}_{i,j}$, in accordance with the two-sample $t$-test with unequal variances for i.i.d. random variables, it is natural to define the standardized statistic as

$$M^*_{i,j} = \frac{(\hat r^{(1)}_{i,j} - \hat r^{(2)}_{i,j})^2}{\hat\vartheta^{(1)}_{i,j}/(n_1 q) + \hat\vartheta^{(2)}_{i,j}/(n_2 q)}, \tag{17}$$

and the maximum test statistic as

$$M^*_n = \max_{1 \le i < j \le p} M^*_{i,j}. \tag{18}$$

Because the diagonal entries of a correlation matrix are all 1, the maximum is taken only over off-diagonal entries. The statistic $M^*_n$ in the two-sample scenario has properties similar to those of $M_n$ in (9) in the one-sample scenario; see the comment in the paragraph after (11). It will be proven in Section 4 that $M^*_n - 4\log p + \log\log p$ also converges to a Gumbel distribution under $H^*_0$ and certain regularity assumptions. Therefore, for a given significance level $0 < \alpha < 1$, the test can be defined in parallel with (10) as

$$\Phi^*_\alpha = I\bigl(M^*_n \ge q_\alpha + 4\log p - \log\log p\bigr), \tag{19}$$

where $q_\alpha$ is still the $1-\alpha$ quantile of the Gumbel distribution given in (11). The hypothesis $H^*_0$ is rejected whenever $\Phi^*_\alpha = 1$.
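The two-sample statistics in (15) to (18) reuse the one-sample pre-whitening within each group. The NumPy sketch below is one possible implementation under our own choices (a symmetric inverse square root via `numpy.linalg.eigh`, a dense array of products, hypothetical function names); both groups must share the same $p \times q$ shape, and each needs $n_g p \ge q$. As sanity checks, comparing a sample with itself gives $M^*_{i,j} = 0$ for every pair, and the statistic is symmetric in the two groups. The resulting maximum would then be compared with the critical value $q_\alpha + 4\log p - \log\log p$ in (19).

```python
import numpy as np


def group_corr_and_var(X):
    """For one group X of shape (n_g, p, q), return R_hat (16), vartheta_hat, and n_g * q."""
    n, p, q = X.shape
    B_tilde = np.einsum('kil,kim->lm', X, X) / (n * p)   # naive estimate, as in (4)
    w, V = np.linalg.eigh(B_tilde)
    Y = X @ ((V / np.sqrt(w)) @ V.T)                     # X_k B_tilde^{-1/2}
    A_hat = np.einsum('kil,kjl->ij', Y, Y) / (n * q)     # (15)
    prods = np.einsum('kil,kjl->kijl', Y, Y)
    theta_hat = ((prods - A_hat[None, :, :, None]) ** 2).mean(axis=(0, 3))
    scale = np.outer(np.diag(A_hat), np.diag(A_hat))     # a_ii * a_jj
    R_hat = A_hat / np.sqrt(scale)                       # (16)
    vartheta_hat = theta_hat / scale
    return R_hat, vartheta_hat, n * q


def two_sample_stats(X1, X2):
    """Return the matrix of M*_{i,j} in (17) and the maximum M*_n in (18)."""
    R1, v1, m1 = group_corr_and_var(X1)
    R2, v2, m2 = group_corr_and_var(X2)
    M_star = (R1 - R2) ** 2 / (v1 / m1 + v2 / m2)
    iu = np.triu_indices(R1.shape[0], k=1)
    return M_star, M_star[iu].max()


rng = np.random.default_rng(5)
X1 = rng.standard_normal((12, 5, 20))   # n_1 = 12 countries, p = 5, q = 20
X2 = rng.standard_normal((8, 5, 20))    # n_2 = 8 countries
M_star, M_star_n = two_sample_stats(X1, X2)
print(M_star_n)
```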

To find which industries have correlations that differ significantly between emerging countries and developed countries, we need to recover the support of the difference of the two groups' correlation matrices. Denote the support of $R_A^{(1)} - R_A^{(2)}$ by

$$\Psi^* = \Psi^*(R_A^{(1)}, R_A^{(2)}) = \{(i,j): r^{(1)}_{i,j} \neq r^{(2)}_{i,j}, \ 1 \le i < j \le p\}.$$

We threshold the entry-wise statistic $M^*_{i,j}$ in (17) at an appropriate level to obtain the estimated support as

$$\hat\Psi^*(\tau) = \{(i,j): M^*_{i,j} \ge \tau\log p, \ 1 \le i < j \le p\},$$

where $\tau$ is again the threshold constant, and the choice $\tau = 4$ is optimal, as shown in Section 4.

## 4 Theoretical Properties

We present the theoretical properties of the procedures for the one-sample case in Section 4.1 and the two-sample case in Section 4.2.

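The following notational conventions are adopted. Throughout the article, for a length-$p$ vector $v = (v_1, \ldots, v_p)^\top$, denote its Euclidean norm by $|v|_2 = (\sum_{i=1}^p v_i^2)^{1/2}$. For a $p \times q$ matrix $W$, denote its Frobenius norm by $\|W\|_F$ and its spectral norm by $\|W\|_2$. For a square matrix $W = (w_{i,j})$, let $\lambda_{\max}(W)$ and $\lambda_{\min}(W)$ be its largest and smallest eigenvalues, respectively, and denote its matrix 1-norm by $\|W\|_{L_1} = \max_j \sum_i |w_{i,j}|$. For two sequences of real numbers $\{a_n\}$ and $\{b_n\}$, write $a_n \lesssim b_n$ (respectively $a_n \gtrsim b_n$) if there exists a constant $C > 0$ such that $|a_n| \le C|b_n|$ (respectively $|a_n| \ge C|b_n|$) holds for all sufficiently large $n$, and write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$.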

### 4.1 Theoretical Properties for Testing of Independence

We now provide theoretical justifications for the global testing procedure (10) and the support recovery procedure (13). For the global test, we establish two properties: size and power. Specifically, to study the asymptotic size of the test, we derive the asymptotic distribution of the test statistic under the null hypothesis; to analyze the power, we consider sparse alternatives under which only a small subset of the off-diagonal entries of $A$ are nonzero. For the support recovery procedure, we show that $\hat\Psi(4)$ recovers the support with probability tending to one under certain conditions.

Theorem 1 shows that, under Conditions (C1) and (C2) below (mentioned in Section 2), $M_n - 4\log p + \log\log p$ indeed converges weakly to a Gumbel distribution under the null hypothesis.

• (C1) Assume that , , and there exist some constants such that , , and .

• (C2) Assume that = .

Condition (C1) on the eigenvalues of the covariance matrices is commonly assumed in the high-dimensional setting. It implies that the majority of the variables are not highly correlated with the others in either the row direction or the column direction. Condition (C2) is mild and is assumed to ensure that defined in (4), as the estimation of the inverse of the nuisance covariance , is reasonably accurate. As such, the oracle estimate will be close to the estimate in (5) as will be shown in the proofs. In the special case when is bounded, (C2) essentially implies that the nuisance dimension can be of a polynomial order of .

###### Theorem 1.

Suppose that the regularity conditions (C1) and (C2) hold. Then under $H_0$, for any $t \in \mathbb{R}$,

$$P\big(M_n - 4\log p + \log\log p \le t\big) \to \exp\Big(-\frac{1}{\sqrt{8\pi}}\exp\Big(-\frac{t}{2}\Big)\Big), \tag{21}$$

as . Furthermore, under , the convergence in (21) is uniform for all satisfying (C1)-(C2).
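To see why the centering $4\log p - \log\log p$ and the constant $1/\sqrt{8\pi}$ appear in (21), the sketch below uses a deliberate simplification that Theorem 1 does not assume: it treats the $m = p(p-1)/2$ off-diagonal statistics as independent $\chi^2_1$ variables, for which the distribution of the maximum is available in closed form as $P(\chi^2_1 \le x)^m$. It then compares this finite-$p$ value with the limit in (21). The function names are hypothetical.

```python
import math

def max_chi2_cdf(p: int, t: float) -> float:
    """P(max of m iid chi^2_1 <= 4 log p - log log p + t), with m = p(p-1)/2.

    Idealization: independence of the m statistics is assumed here for illustration only.
    """
    m = p * (p - 1) // 2
    x = 4 * math.log(p) - math.log(math.log(p)) + t
    tail = math.erfc(math.sqrt(x / 2))  # P(chi^2_1 > x) = P(|Z| > sqrt(x))
    return math.exp(m * math.log1p(-tail))  # (1 - tail)^m, computed stably

def gumbel_limit(t: float) -> float:
    """Right-hand side of (21)."""
    return math.exp(-math.exp(-t / 2) / math.sqrt(8 * math.pi))

for p in (50, 200, 1000):
    gaps = [round(max_chi2_cdf(p, t) - gumbel_limit(t), 4) for t in (0.0, 2.0, 4.0)]
    print(p, gaps)
```

The design choice of `log1p` avoids the loss of precision from raising a number extremely close to 1 to a large power.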

We next turn to the power analysis of the test $\Phi_\alpha$. To this end, we focus on sparse alternatives, as explained in Section 2.1, and define the following class of covariance matrices associated with the row direction of the matrix-valued observations:

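$$\mathcal{U}(c) = \Big\{A = (a_{i,j})_{p\times p}: \max_{1 \le i < j \le p} \frac{|a_{i,j}|}{\sqrt{\theta_{i,j}/(nq)}} \ge c\sqrt{\log p}\Big\}, \tag{22}$$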

where $\theta_{i,j}$ was defined previously in (6). Note that this class of covariance matrices only requires one element to be large enough, i.e., $|a_{i,j}|/\sqrt{\theta_{i,j}/(nq)} \ge c\sqrt{\log p}$ for at least one pair $i < j$. Up to the scaling by $\theta_{i,j}$, this essentially requires only one off-diagonal entry of $A$ to exceed the order $\sqrt{\log p/(nq)}$. For alternatives with $A \in \mathcal{U}(4)$, Theorem 2 shows that $\Phi_\alpha$ asymptotically distinguishes the alternative hypothesis from the null hypothesis, under which the off-diagonal entries of $A$ are all zero. In other words, $H_0$ is rejected by $\Phi_\alpha$ with probability tending to 1 if $A \in \mathcal{U}(4)$.
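The membership condition for $\mathcal{U}(c)$ and the resulting per-entry signal size can be sketched as follows. Since the formula for $\theta_{i,j}$ in (6) is not reproduced in this section, `theta` is simply taken as a given $p \times p$ array; the function names are hypothetical.

```python
import math
import numpy as np

def min_detectable_entry(theta: float, p: int, n: int, q: int, c: float = 4.0) -> float:
    """Smallest |a_ij| satisfying |a_ij| / sqrt(theta/(nq)) >= c sqrt(log p)."""
    return c * math.sqrt(theta * math.log(p) / (n * q))

def in_U(A, theta, n, q, c=4.0) -> bool:
    """Check A in U(c): the largest standardized off-diagonal entry reaches c sqrt(log p)."""
    A = np.asarray(A, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p = A.shape[0]
    rows, cols = np.triu_indices(p, k=1)  # pairs i < j
    ratio = np.abs(A[rows, cols]) / np.sqrt(theta[rows, cols] / (n * q))
    return bool(ratio.max() >= c * math.sqrt(math.log(p)))
```

For fixed $p$ and $\theta_{i,j}$, `min_detectable_entry` scales like $1/\sqrt{nq}$, so quadrupling $nq$ halves the required entry size.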

###### Theorem 2.

Suppose that Conditions (C1) and (C2) hold. As , we have

$$\inf_{A \in \mathcal{U}(4)} P(\Phi_\alpha = 1) \to 1.$$

Theorem 3 further demonstrates that the lower bound $c\sqrt{\log p}$ in the definition of the class $\mathcal{U}(c)$ is rate optimal. Let $\mathcal{T}_\alpha$ be the set of level-$\alpha$ tests, i.e., $P(T_\alpha = 1) \le \alpha$ under the null hypothesis for any test $T_\alpha \in \mathcal{T}_\alpha$.

###### Theorem 3.

Suppose that . Let and . There exists some constant such that for all sufficiently large and ,

$$\inf_{A \in \mathcal{U}(c_0)} \sup_{T_\alpha \in \mathcal{T}_\alpha} P(T_\alpha = 1) \le 1 - \beta.$$

The above theorem implies that, when $c_0$ is small enough, no level-$\alpha$ test can reject the null hypothesis with probability tending to one uniformly over $A \in \mathcal{U}(c_0)$. As a consequence, the rate $\sqrt{\log p/(nq)}$ in the lower bound on the largest standardized off-diagonal entry of $A$ cannot be improved.

To sum up, Theorems 1-3 show that the test $\Phi_\alpha$ defined in Section 2 has asymptotic level $\alpha$, has asymptotic power one under certain sparse alternatives, and imposes the weakest possible rate requirement on those alternatives.

To study the theoretical properties of the support recovery procedure in (13), recall the definition of the support $\Psi$ of $A$ in (12) and define the following class of covariance matrices in parallel with (22):

$$\mathcal{W}(c) = \Big\{A = (a_{i,j})_{p\times p}: \min_{(i,j)\in\Psi} \frac{|a_{i,j}|}{\sqrt{\theta_{i,j}/(nq)}} \ge c\sqrt{\log p}\Big\}.$$

Note that $\mathcal{U}(c)$ requires the maximum of $|a_{i,j}|/\sqrt{\theta_{i,j}/(nq)}$ to be bounded below by $c\sqrt{\log p}$, while $\mathcal{W}(c)$ requires the minimum of the same ratio over the support $\Psi$ to be bounded below by the same quantity. This requirement essentially means that all entries on the support are large enough to be distinguished from the noise. Theorem 4 below shows that the estimator $\hat\Psi(\tau)$ with threshold constant $\tau = 4$ recovers the support exactly with probability tending to one when the magnitudes of all nonzero off-diagonal entries exceed the thresholds in $\mathcal{W}(4)$.
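In contrast with the maximum in $\mathcal{U}(c)$, membership in $\mathcal{W}(c)$ is a minimum over the support, as in the sketch below. As before, `theta` stands in for $\theta_{i,j}$ from (6) and is treated as a given array; the name `in_W` is hypothetical, and the empty-support case is treated as vacuously satisfied (an assumption of this sketch).

```python
import math
import numpy as np

def in_W(A, theta, n, q, c=4.0) -> bool:
    """Check A in W(c): every nonzero off-diagonal entry of A (the support Psi)
    satisfies |a_ij| / sqrt(theta_ij/(nq)) >= c sqrt(log p)."""
    A = np.asarray(A, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p = A.shape[0]
    rows, cols = np.triu_indices(p, k=1)
    on_support = A[rows, cols] != 0
    if not on_support.any():
        return True  # empty support: the minimum condition holds vacuously
    a = np.abs(A[rows, cols][on_support])
    scale = np.sqrt(theta[rows, cols][on_support] / (n * q))
    return bool((a / scale).min() >= c * math.sqrt(math.log(p)))
```

A single weak nonzero entry is enough to leave $\mathcal{W}(c)$, even when a strong entry keeps the matrix inside $\mathcal{U}(c)$.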

###### Theorem 4.

Suppose that Conditions (C1) and (C2) hold. As , we have

$$\inf_{A \in \mathcal{W}(4)} P\big(\hat\Psi(4) = \Psi\big) \to 1.$$
###### Remark 1.

With the same reasoning as in Cai et al. (2013a), it can be verified that the choice $\tau = 4$ of the threshold constant is optimal. In fact, for any $0 < \tau < 4$, the probability of exact support recovery tends to zero. Exact recovery fails because the smaller threshold $\tau\log p$ declares some zero entries to be nonzero, i.e., the estimated support is larger than the true support. In addition, the rate $\sqrt{\log p/(nq)}$ required of the nonzero entries of $A$ cannot be relaxed.
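A heuristic check of why thresholds with $\tau < 4$ over-select: under the simplifying assumption (not made in the paper) that the $p(p-1)/2$ null statistics behave like independent $\chi^2_1$ draws, the sketch below counts how many of them exceed $\tau\log p$ on average. The function name `false_positives` and the simulation settings are illustrative choices.

```python
import math
import numpy as np

def false_positives(p: int, tau: float, n_rep: int = 200, seed: int = 0) -> float:
    """Average number of null pairs with statistic >= tau * log p, when the
    p(p-1)/2 null statistics are idealized as iid chi^2_1 (squared N(0,1)) draws."""
    rng = np.random.default_rng(seed)
    m = p * (p - 1) // 2
    cutoff = tau * math.log(p)
    counts = [(rng.standard_normal(m) ** 2 >= cutoff).sum() for _ in range(n_rep)]
    return float(np.mean(counts))

for tau in (2.0, 3.0, 4.0):
    print(tau, false_positives(200, tau))
```

In this idealized setting with $p = 200$, the average count at $\tau = 2$ is on the order of tens, while at $\tau = 4$ it is well below one, in line with the maximum of the null statistics growing like $4\log p - \log\log p$.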

### 4.2 Theoretical Properties for Testing of Correlation Matrix Equality

For the two-sample testing of correlations, we assume that the sample sizes of the two groups are comparable, i.e., $n_1$ and $n_2$ are of the same order, and write $n$ for this common order in this section.

For the two-sample case, Conditions (C1)-(C2) of the one-sample case are replaced by the following conditions.

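• (C1) Assume that , , and there exist some constants such that , , and , for $g = 1, 2$.

• (C2) Assume that = , for $g = 1, 2$.

• (C3) There exists some such that for any sufficiently small constant , where the set $\mathcal{A}_\gamma$ is defined as

 $$\mathcal{A}_\gamma = \{(i,j): |r^{(g)}_{i,j}| \ge (\log p)^{-1-\gamma},\ 1 \le i < j \le p\}.$$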

Conditions (C1) and (C2) above are the two-sample analogs of the one-sample Conditions (C1) and (C2) in Section 4.1, imposed on each group $g = 1, 2$. Condition (C3) ensures that most of the variables are not highly correlated with each other.

Under appropriate regularity conditions, Theorems 5-8 are the two-sample counterparts of the one-sample Theorems 1-4. In particular, Theorem 5 derives the limiting null distribution of $M^*_n$ in (18) and shows that the test $\Phi^*_\alpha$ in (19) has asymptotic level $\alpha$; Theorem 6 provides the power analysis of $\Phi^*_\alpha$; Theorem 7 establishes the rate optimality of the test; and Theorem 8 states the exact support recovery property of $\hat\Psi^*(4)$.

###### Theorem 5.

Suppose that Conditions (C1)-(C3) hold. Then under (14), for any $t \in \mathbb{R}$,

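$$P\big(M^*_n - 4\log p + \log\log p \le t\big) \to \exp\Big(-\frac{1}{\sqrt{8\pi}}\exp\Big(-\frac{t}{2}\Big)\Big), \tag{23}$$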

as . Furthermore, under , the convergence in (23) is uniform for all and satisfying (C1)-(C3).

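To analyze the power of $\Phi^*_\alpha$, in parallel with (22), define the following class of matrix pairs:

$$\mathcal{U}^*(c) = \Big\{(R^{(1)}_A, R^{(2)}_A): \max_{1 \le i < j \le p} \frac{|r^{(1)}_{i,j} - r^{(2)}_{i,j}|}{\sqrt{\vartheta^{(1)}_{i,j}/(n_1 q) + \vartheta^{(2)}_{i,j}/(n_2 q)}} \ge c\sqrt{\log p}\Big\},$$

where $\vartheta^{(g)}_{i,j}$, $g = 1, 2$, are the population counterparts of the variance estimates $\hat\vartheta^{(g)}_{i,j}$ in (17). We have the following result.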

###### Theorem 6.

Suppose that Conditions (C1)-(C3) hold. As , we have

$$\inf_{(R^{(1)}_A, R^{(2)}_A) \in \mathcal{U}^*(4)} P(\Phi^*_\alpha = 1) \to 1.$$

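Note that $\Phi^*_\alpha$ is able to distinguish the alternative from the null as long as a single pair $(i,j)$ satisfies $|r^{(1)}_{i,j} - r^{(2)}_{i,j}| \big/ \sqrt{\vartheta^{(1)}_{i,j}/(n_1 q) + \vartheta^{(2)}_{i,j}/(n_2 q)} \ge 4\sqrt{\log p}$.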

The next theorem shows that this rate is optimal. Let $\mathcal{T}^*_\alpha$ be the set of all level-$\alpha$ tests, i.e., $P(T_\alpha = 1) \le \alpha$ under (14) for any $T_\alpha \in \mathcal{T}^*_\alpha$.

###### Theorem 7.

Suppose that . Let and . There exists some constant such that for all large and ,

$$\inf_{(R^{(1)}_A, R^{(2)}_A) \in \mathcal{U}^*(c_0)} \sup_{T_\alpha \in \mathcal{T}^*_\alpha} P(T_\alpha = 1) \le 1 - \beta.$$

Next, define the class of correlation matrix pairs whose nonzero differences all attain the rate above, namely,

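$$\mathcal{W}^*(c) = \Big\{(R^{(1)}_A, R^{(2)}_A): \min_{(i,j)\in\Psi^*} \frac{|r^{(1)}_{i,j} - r^{(2)}_{i,j}|}{\sqrt{\vartheta^{(1)}_{i,j}/(n_1 q) + \vartheta^{(2)}_{i,j}/(n_2 q)}} \ge c\sqrt{\log p}\Big\}.$$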
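The two-sample class $\mathcal{W}^*(c)$ can be checked in the same way as its one-sample analog, with the support $\Psi^*$ taken as the pairs where the two correlation matrices differ. In the sketch below, the population quantities $\vartheta^{(g)}_{i,j}$ are passed in as given arrays, the name `in_W_star` is hypothetical, and an empty $\Psi^*$ is treated as vacuously satisfying the condition (an assumption of this sketch).

```python
import math
import numpy as np

def in_W_star(R1, R2, v1, v2, n1, n2, q, c=4.0) -> bool:
    """Check (R1, R2) in W*(c): every pair in Psi* = {i < j: r1_ij != r2_ij} has
    |r1_ij - r2_ij| / sqrt(v1_ij/(n1 q) + v2_ij/(n2 q)) >= c sqrt(log p)."""
    R1, R2 = np.asarray(R1, dtype=float), np.asarray(R2, dtype=float)
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    p = R1.shape[0]
    rows, cols = np.triu_indices(p, k=1)
    diff = np.abs(R1[rows, cols] - R2[rows, cols])
    on_support = diff != 0  # Psi*: pairs where the correlations differ
    if not on_support.any():
        return True  # empty Psi*: the minimum condition holds vacuously
    scale = np.sqrt(v1[rows, cols] / (n1 * q) + v2[rows, cols] / (n2 * q))
    ratio = diff[on_support] / scale[on_support]
    return bool(ratio.min() >= c * math.sqrt(math.log(p)))
```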

Theorem 8 claims that