Asymptotic Theory of Eigenvectors for Large Random Matrices

02/19/2019 ∙ by Jianqing Fan, et al. ∙ University of Southern California, Princeton University, Nanyang Technological University

Characterizing the exact asymptotic distributions of high-dimensional eigenvectors for large structured random matrices poses important challenges yet can provide useful insights into a range of applications. To this end, in this paper we introduce a general framework of asymptotic theory of eigenvectors (ATE) for large structured symmetric random matrices with heterogeneous variances, and establish the asymptotic properties of the spiked eigenvectors and eigenvalues for the scenario of the generalized Wigner matrix noise, where the mean matrix is assumed to have the low-rank structure. Under some mild regularity conditions, we provide the asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we establish novel asymptotic expansions for the general linear combination and further show that it is asymptotically normal after some normalization, where the weight vector can be arbitrary. We also provide a more general asymptotic theory for the spiked eigenvectors using the bilinear form. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis, and our general theory can be exploited for statistical inference in these large-scale applications.


1 Introduction

The big data era has brought us a tremendous amount of both structured and unstructured data including networks and texts in many modern applications. For network and text data, we are often interested in learning the cluster and other structural information for the underlying network communities and text topics. In these large-scale applications, we are given a network data matrix or can create such a matrix by calculating some similarity measure between text documents, where each entry of the data matrix is binary indicating the absence or presence of a link, or continuous indicating the strength of similarity between each pair of nodes or documents. Such applications naturally give rise to random matrices that can be used to reveal interesting latent structures of networks and texts for effective predictions and recommendations.

Random matrices have been widely exploited to model the interactions among the nodes of a network for applications ranging from physics and social sciences to genomics and neuroscience. Random matrix theory (RMT) has a long history and was originated by Wigner (1955) for modeling the nucleon-nucleus interactions to understand the behavior of atomic nuclei and link the spacings of the levels of atomic nuclei to those of the eigenvalues of a random matrix. In particular, Wigner (1955) and Wigner (1958) identified the limiting spectral distribution, known as Wigner’s semicircle law, for the eigenvalues of the high-dimensional Wigner matrix as the dimensionality diverges. The Wigner matrix generally refers to a symmetric random matrix whose upper diagonal entries are independent with zero mean and identical variance, and whose diagonal entries are independent and identically distributed (i.i.d.) with zero mean. See, for example, Bai (1999) for a review of some classical technical tools such as the moment method and Stieltjes transform as well as some more recent developments on the RMT, and Mehta (2004); Tao (2004); Bai and Silverstein (2006) for detailed book-length accounts of the topic of random matrices.

Most existing work on the RMT has focused mainly on the spectral theory for the limiting distributions of the eigenvalues of random matrices. For instance, the limiting spectral distribution of the Wigner matrix was generalized by Arnold (1967) and Arnold (1971). Marchenko and Pastur (1967) established the well-known Marchenko–Pastur law for the limiting spectral distribution of the sample covariance matrix including the Wishart matrix, which plays an important role in statistical applications. The spectral distribution refers to the empirical distribution of all the eigenvalues of a random matrix. In contrast, the asymptotic distribution of the largest nonspiked eigenvalue of the Wigner matrix with Gaussian ensemble was revealed to be the Tracy–Widom law in Tracy and Widom (1994) and Tracy and Widom (1996). More recent developments on the asymptotic distribution of the largest nonspiked eigenvalue include Johnstone (2001), El Karoui (2007), Johnstone (2008), Erdös et al. (2011), and Knowles and Yin (2017). See also Füredi and Komlós (1981), Baik et al. (2005), Bai and Yao (2008), Knowles and Yin (2013), Pizzo et al. (2013), Renfrew and Soshnikov (2013), Knowles and Yin (2014), and Wang and Fan (2017) for the asymptotic distributions of the spiked eigenvalues of various random matrices and sample covariance matrices. To ensure consistency, Johnstone and Lu (2009) proposed the sparse principal component analysis to reduce the noise accumulation in high-dimensional random matrices. There is also a growing literature on the specific scenario and applications of large network matrices. See, for example, McSherry (2001), Spielman and Teng (2007), Bickel and Chen (2009), Decelle et al. (2011), Rohe et al. (2011), Lei (2016), Abbe (2017), Jin et al. (2017), Chen and Lei (2018), and Vu (2018).

Matrix perturbation theory has been commonly used to characterize the deviations of empirical eigenvectors from the population ones, often under the average errors (Horn and Johnson, 2012). In contrast, recently Abbe et al. (2017) investigated random matrices with low expected rank and provided a tight bound for the difference between the empirical eigenvector and some linear transformation of the population eigenvector through a delicate entrywise eigenvector analysis for the first-order approximation under the maximum norm. See also Paul (2007), Koltchinskii and Lounici (2016), Koltchinskii and Xia (2016), and Wang and Fan (2017) for the asymptotics of empirical eigenstructure for large random matrices. Yet despite these endeavors, the precise asymptotic distributions of the eigenvectors for high-dimensional random matrices remain largely unknown even for the case of Wigner matrix noise. Indeed, characterizing the exact asymptotic distributions of high-dimensional eigenvectors for large structured random matrices can provide useful insights into a range of applications that involve the eigenspaces. To this end, in this paper we attempt to provide some general theoretical underpinning on such a perspective for large random matrices.

The major contribution of this paper is the introduction of a general framework of asymptotic theory of eigenvectors (ATE) for large structured random matrices with the mean matrix of the low-rank structure and the noise matrix being the generalized Wigner matrix. The generalized Wigner matrix refers to a symmetric random matrix whose diagonal and upper diagonal entries are independent with zero mean, allowing for heterogeneous variances. Our family of models includes a variety of popularly used ones such as the stochastic block models with or without overlapping communities for network analysis and the topic models for text analysis. Our technical tool is general and distinct from existing techniques in the RMT literature; see Section 3.5 for detailed discussions. Specifically, under some mild regularity conditions we establish the asymptotic expansions for the spiked eigenvalues and prove that they are asymptotically normal after some normalization. For the spiked eigenvectors, we provide novel asymptotic expansions for the general linear combination and further establish that it is asymptotically normal after some normalization for an arbitrary weight vector. We also present a more general asymptotic theory for the spiked eigenvectors based on the bilinear form. To the best of our knowledge, these theoretical results are new to the literature. Our general theory can be exploited for statistical inference in a range of large-scale applications including network analysis and text analysis. For detailed comparisons with the literature, see Section 3.6.

The rest of the paper is organized as follows. Section 2 presents the model setting and theoretical setup for ATE. We establish the asymptotic expansions and asymptotic distributions for the spiked eigenvectors as well as the asymptotic distributions for the spiked eigenvalues in Section 3. Section 4 presents some numerical examples to demonstrate our theoretical results. We further provide a more general asymptotic theory extending the results from Section 3 using the bilinear form in Section 5. Section 6 discusses some implications and extensions of our work. The proofs of main results are relegated to the Appendix. Additional technical details are provided in the Supplementary Material.

2 Model setting and theoretical setup

2.1 Model setting

As mentioned in the introduction, we focus on the class of large structured symmetric random matrices with low-rank mean matrices and generalized Wigner matrices of noises. It is worth mentioning that our definition of the generalized Wigner matrix specified in Section 1 is broader than the conventional one in the classical RMT literature; see, for example, Yau (2012) for the formal mathematical definition with additional assumptions. To simplify the technical presentation, consider an $n \times n$ symmetric random matrix X with the following structure

$$X = H + W, \qquad (1)$$

where is a deterministic latent mean matrix of low rank structure, is an orthonormal matrix of population eigenvectors ’s with and , is a diagonal matrix of population eigenvalues ’s with , and is a symmetric random matrix of independent noises on and above the diagonal with zero mean , variances , and . The rank of the mean part is assumed typically to be a smaller order of the random matrix size , which is referred to as matrix dimensionality hereafter for convenience. The bounded assumption on the entries of the generalized Wigner matrix in the noise part is made frequently for technical simplification and satisfied in many real applications such as network analysis and text analysis. See, for instance, the stochastic block models with or without overlapping communities and the topic models that are popularly used in those applications.
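A minimal simulation sketch of model (1) may help fix ideas. It assumes the structure the text describes, namely X = H + W with a low-rank mean H = V D V^T and a symmetric noise matrix W whose entries on and above the diagonal are independent, bounded, and zero mean with heterogeneous variances. The function name, the sizes, the eigenvalues in `d`, and the uniform noise law are all illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_low_rank_plus_noise(n=300, d=(60.0, -40.0), seed=0):
    """Draw X = V D V^T + W with bounded, heteroscedastic symmetric noise W."""
    rng = np.random.default_rng(seed)
    K = len(d)
    # Orthonormal population eigenvectors from a QR factorization (illustrative).
    V, _ = np.linalg.qr(rng.standard_normal((n, K)))
    D = np.diag(d)
    H = V @ D @ V.T
    # Entry-specific half-widths give heterogeneous variances; |w_ij| <= 1.
    half_width = rng.uniform(0.2, 1.0, size=(n, n))
    half_width = np.triu(half_width) + np.triu(half_width, 1).T
    upper = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)) * half_width)
    W = upper + np.triu(upper, 1).T  # independent on and above the diagonal
    return H + W, H, W, V, D

X, H, W, V, D = simulate_low_rank_plus_noise()
```

The negative entry in `d` is deliberate: nothing in the sketch requires the spiked eigenvalues to be positive.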

In practice, it is either matrix X or matrix $X - \mathrm{diag}(X)$ that is readily available to us, where $\mathrm{diag}(\cdot)$ denotes the diagonal part of a matrix. In the context of graphs, random matrix X characterizes the connectivity structure of a graph with self loops, while random matrix $X - \mathrm{diag}(X)$ corresponds to a graph without self loops. In the latter case, the observed data matrix can be decomposed as

$$X - \mathrm{diag}(X) = H + [W - \mathrm{diag}(X)]. \qquad (2)$$

Observe that $W - \mathrm{diag}(X)$ has a structure similar to that of W in the sense of being symmetric and having bounded independent entries on and above the diagonal, by assuming that $\mathrm{diag}(X)$ has bounded entries for such a case. Thus models (1) and (2) share the same decomposition of a deterministic low rank matrix plus some symmetric noise matrix of bounded entries, which is roughly all we need for the theoretical framework and technical analysis. For these reasons, to simplify the technical presentation we slightly abuse the notation by using X and W to represent the observed data matrix and the latent noise matrix, respectively, in either model (1) or model (2). Therefore, throughout the paper the data matrix X may have diagonal entries all equal to zero and correspondingly the noise matrix W may have a nonzero diagonal mean matrix, and our theory covers both cases.
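The passage from the graph with self loops to the graph without them can be checked on a toy example. The sketch below assumes that the decomposition in (2) reads X − diag(X) = H + (W − diag(X)); the edge-probability matrix `H` and the Bernoulli sampling are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Hypothetical edge-probability matrix H (symmetric, rank one here).
p = rng.uniform(0.2, 0.8, size=n)
H = np.outer(p, p)
# Adjacency matrix with self loops: independent Bernoulli(H_ij) on and above the diagonal.
upper = np.triu(rng.random((n, n)) < H).astype(float)
X = upper + np.triu(upper, 1).T
W = X - H                      # zero-mean noise, as in model (1)

diag_X = np.diag(np.diag(X))   # diag(X): keep only the diagonal entries
X_noloop = X - diag_X          # observed matrix of the graph without self loops
W_noloop = W - diag_X          # noise matrix under the assumed form of model (2)

# Same low-rank mean H; the new noise has diagonal -H_ii, hence a nonzero diagonal mean.
assert np.allclose(X_noloop, H + W_noloop)
assert np.allclose(np.diag(W_noloop), -np.diag(H))
```

The second assertion shows concretely why the noise matrix may carry a nonzero diagonal mean once the self loops are removed.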

In either of the two scenarios discussed above, we are interested in inferring the structural information in models (1) and (2), which often boils down to the latent eigenstructure . Since both the eigenvector matrix V and eigenvalue matrix D are unavailable to us, we resort to the observable random data matrix X for extracting the structural information. To this end, we conduct a spectral decomposition of X, and denote by its eigenvalues and the corresponding eigenvectors. Without loss of generality, assume that and denote by an matrix of spiked eigenvectors. As mentioned before, we aim at investigating the precise asymptotic behavior of the spiked empirical eigenvalues and spiked empirical eigenvectors of data matrix X. It is worth mentioning that our definition of spikedness differs from the conventional one in that the underlying rank order depends on the magnitude of eigenvalues instead of the nonnegative eigenvalues that are usually assumed.
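Because spikedness is ordered by the magnitude of the eigenvalues rather than by their signed values, extracting the spiked part of X needs one extra sorting step after a standard symmetric eigendecomposition. The helper below is a small sketch of that step; the name `spiked_eigenpairs` and the toy matrix are hypothetical.

```python
import numpy as np

def spiked_eigenpairs(X, K):
    """Return the K eigenvalues of symmetric X largest in magnitude, with eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(X)        # eigh returns ascending algebraic order
    order = np.argsort(-np.abs(eigvals))[:K]    # reorder by |eigenvalue|, descending
    return eigvals[order], eigvecs[:, order]

# Toy check: eigenvalues 5, -7, 1, so the magnitude ordering picks -7 and then 5.
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 3)))
X = Q @ np.diag([5.0, -7.0, 1.0]) @ Q.T
lam, V_hat = spiked_eigenpairs(X, K=2)
```

Sorting by signed value instead would have returned 5 and 1 and missed the strongest spike.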

One concrete example is the stochastic block model (SBM), where the latent mean matrix H takes the form with a matrix of community membership vectors and a nonsingular matrix with for . Here, for each , with , , a unit vector with the th component being one and all other components being zero. It indicates which membership the th subject belongs to. It is well known that the community information of the SBM is encoded completely in the eigenstructure of the mean matrix H, which serves as one of our motivations for investigating the precise asymptotic distributions of the empirical eigenvectors and eigenvalues.
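The claim that the community information is encoded in the eigenstructure of H can be illustrated numerically. The sketch assumes the usual SBM form in which H is a membership matrix times a connectivity matrix times the transposed membership matrix; the names `Pi` and `P`, the labels, and the numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 12, 3
labels = rng.integers(0, K, size=n)
labels[:K] = np.arange(K)            # make sure every community is nonempty
Pi = np.eye(K)[labels]               # row i is the unit vector e_{labels[i]}
P = np.array([[0.8, 0.2, 0.1],
              [0.2, 0.7, 0.3],
              [0.1, 0.3, 0.9]])      # hypothetical nonsingular connectivity matrix
H = Pi @ P @ Pi.T                    # H_ij = P[labels[i], labels[j]]

eigvals, eigvecs = np.linalg.eigh(H)
top = np.argsort(-np.abs(eigvals))[:K]
V = eigvecs[:, top]                  # population eigenvectors of the nonzero eigenvalues

# Nodes in the same community share the same row of V.
same_row = all(np.allclose(V[i], V[j])
               for i in range(n) for j in range(n) if labels[i] == labels[j])
```

The rows agree within a community because the eigenvectors of the nonzero eigenvalues lie in the column space of `Pi`.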

2.2 Theoretical setup

We first introduce some notation that will be used throughout the paper. We use to represent as matrix size increases. We say that an event

holds with significant probability if

for some positive constant and sufficiently large . For a matrix A, we use to denote the th largest eigenvalue in magnitude, and , , and to denote the Frobenius norm, the spectral norm, and the matrix entrywise maximum norm, respectively. Denote by the submatrix of A formed by removing the th column. For any -dimensional unit vector , let represent the maximum norm of the vector.
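For readers who prefer to see the matrix notation operationally, the quantities just listed map onto standard NumPy calls as sketched below. The matrix `A` and the vector `x` are arbitrary illustrative inputs.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, -3.0, 0.5],
              [0.0, 0.5, 1.0]])

eigvals = np.linalg.eigvalsh(A)
by_magnitude = eigvals[np.argsort(-np.abs(eigvals))]  # k-th entry: k-th largest in magnitude

frobenius = np.linalg.norm(A, "fro")     # square root of the sum of squared entries
spectral = np.linalg.norm(A, 2)          # largest singular value
entrywise_max = np.abs(A).max()          # largest entry in absolute value

A_minus_col2 = np.delete(A, 1, axis=1)   # submatrix with the 2nd column removed

x = np.array([3.0, -4.0, 0.0]) / 5.0     # a unit vector
x_max_norm = np.abs(x).max()             # maximum norm of the vector
```

For a symmetric matrix the spectral norm coincides with the largest eigenvalue in magnitude, which is why the magnitude ordering is the natural one here.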

We next introduce a definition that plays a key role in proving all asymptotic normality results in this paper.

Definition 1.

A pair of unit vectors of appropriate dimensions is said to satisfy the -CLT condition for some positive integer if

is asymptotically standard normal after some normalization, where CLT refers to the central limit theorem.

Lemmas 1 and 2 below provide some sufficient conditions under which can satisfy the -CLT condition defined in Definition 1 for and , which is all we need for our technical analysis of asymptotic distributions.

Lemma 1.

Assume that -dimensional unit vectors x and y satisfy

(3)

Then satisfies the Lyapunov condition for CLT and we have as , which entails that satisfies the -CLT condition with .
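A quick Monte Carlo check illustrates the kind of statement made in Lemma 1 for the simplest bilinear form x^T W y. The sketch assumes that the normalization is the standard deviation of x^T W y, which follows from the independence of the entries on and above the diagonal; this reading of the elided notation, the helper name `bilinear_std`, the uniform noise law, and the chosen vectors are assumptions made for illustration only.

```python
import numpy as np

def bilinear_std(x, y, var):
    """Standard deviation of x^T W y when W is symmetric with independent
    zero-mean entries on and above the diagonal and Var(w_ij) = var[i, j]."""
    c = np.outer(x, y)
    coef = c + c.T                       # w_ij (i < j) enters with weight x_i y_j + x_j y_i
    off = np.sum(np.triu(var * coef**2, 1))
    diag = np.sum(np.diag(var) * np.diag(c)**2)
    return np.sqrt(off + diag)

rng = np.random.default_rng(3)
n, reps = 100, 4000
x = np.ones(n) / np.sqrt(n)              # delocalized unit vector
y = rng.standard_normal(n)
y /= np.linalg.norm(y)                   # second unit vector
a = rng.uniform(0.3, 1.0, size=(n, n))
a = np.triu(a) + np.triu(a, 1).T         # symmetric half-widths, heterogeneous variances
var = a**2 / 3.0                         # variance of Uniform(-a, a)

s_n = bilinear_std(x, y, var)
z = np.empty(reps)
for r in range(reps):
    upper = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)) * a)
    W = upper + np.triu(upper, 1).T
    z[r] = x @ W @ y / s_n               # standardized bilinear form
```

A histogram or normal quantile plot of `z` is a natural follow-up. With strongly localized x and y the sum is dominated by a few entries and the normal approximation is expected to degrade.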

For any given unit vectors and , denote by and

the mean and variance of the random variable

(4)

respectively, where and for , , with , , and when and 0 otherwise. It is worth mentioning that the random variable given in (2.2) coincides with the one defined in (B.2) in Section B.2 of Supplementary Material, which is simply the conditional variance of random variable given in (B.2) when expressed as a sum of martingale differences with respect to a suitably defined -algebra; see Section B.2 for more technical details and the precise expressions for and given in (B.2) and (B.2), respectively.

Lemma 2.

Assume that -dimensional unit vectors x and y satisfy , , and . Then we have as , which entails that satisfies the -CLT condition with .

We see from Lemmas 1 and 2 that the -CLT condition defined in Definition 1 can indeed be satisfied under some mild regularity conditions. In particular, Definition 1 is important to our technical analysis since to establish the asymptotic normality of the spiked eigenvectors and spiked eigenvalues, we first need to expand the target to the form of with some positive integer plus some small order term, and then the asymptotic normality follows naturally if satisfies the -CLT condition. To facilitate our technical presentation, let us introduce some further notation. For any and given matrices and of appropriate dimensions, we define the function

(5)

where is some sufficiently large positive integer that will be specified later in our technical analysis. For each , any given matrices and of appropriate dimensions, and -dimensional vector u, we further define functions

(6)
(7)

where denotes the submatrix of the diagonal matrix D by removing the th row and th column,

(8)

denotes the derivative with respect to scalar or complex variable throughout the paper, and the rest of notation is the same as introduced before.

3 Asymptotic distributions of spiked eigenvectors

3.1 Technical conditions

To facilitate our technical analysis, we need some basic regularity conditions.

Condition 1.

Assume that as .

Condition 2.

There exist some positive constant and small positive constant such that and .

Condition 3.

It holds that , , , , and , where .

Conditions 1–2 are needed in all our Theorems 1–5 and imposed for our general model (1), including the specific case of sparse models. In contrast, Condition 3 is required only for Theorem 3 under some specific models with dense structures such as the stochastic block models with or without overlapping communities.

Condition 1 restricts essentially the sparsity level of the random matrix (e.g., given by a network). Note that it follows easily from that . It is a rather mild condition that can be satisfied by very sparse networks. For example, if and the other ’s are equal to zero, then we have . Many network models in the literature satisfy this condition; see, for example, Jin et al. (2017), Lei (2016), and Zhang et al. (2015).

Condition 2 requires that the spiked population eigenvalues of the mean matrix H (in the diagonal matrix D) are simple and there is enough gap between the eigenvalues. The constant can be replaced by some term and our theoretical results can still be proved with more delicate derivations. This requirement ensures that we can obtain higher order expansions of the general linear combination for each empirical eigenvector precisely. Otherwise if there exist some eigenvalues such that , then and are generally no longer identifiable so we cannot derive clear asymptotic expansions for them; see also Abbe et al. (2017) for related discussions. Condition 2 also requires a gap between and . Since parameter reflects the strength of the noise matrix W, it requires essentially the signal part H to dominate the noise part W with some asymptotic rate. Similar conditions are commonly used in the network literature; see, for instance, Abbe et al. (2017) and Jin et al. (2017).

Condition 3 restricts our attention to some specific dense network models. In particular, assumes that the eigenvalues in D share the same order. The other assumptions in Condition 3 require essentially that the minimum variance of the off-diagonal entries of cannot tend to zero too fast, which is used only to establish a more simplified theory under the more restrictive model; see Theorem 3.

3.2 Asymptotic distributions of spiked eigenvalues

We first present the asymptotic expansions and CLT for the spiked empirical eigenvalues . For each , denote by the solution to equation

(9)

when restricted to the interval , where

The following lemma characterizes the properties of the population quantities ’s defined in (9).

Lemma 3.

Equation (9) has a unique solution in the interval and thus ’s are well defined. Moreover, for each we have as .

It is seen from Lemma 3 that when the matrix size is large enough, the values of and are very close to each other. The following theorem establishes the asymptotic expansions and CLT for and reveals that is in fact its asymptotic mean.

Theorem 1.

Under Conditions 1–2, for each we have

(10)

Moreover, if and the pair of vectors satisfies the -CLT condition, then we have

(11)

Capitaine et al. (2012) and Knowles and Yin (2014) established the joint distribution of the spiked eigenvalues for the deformed Wigner matrix. In particular, Capitaine et al. (2012) assumed that and for , while Knowles and Yin (2014) assumed that for all . Both papers require the existence of self loops and focus on the scenario of . In contrast, our Theorem 1 applies both with and without self loops and allows for more general heterogeneity in the variances of the entries of the noise matrix W. Therefore, our results in Theorem 1 are more general than those in Capitaine et al. (2012) and Knowles and Yin (2014). Furthermore, when we restrict ourselves to the model settings in Capitaine et al. (2012) and Knowles and Yin (2014), our Theorem 1 requires that , which is still different from the settings investigated in Capitaine et al. (2012) and Knowles and Yin (2014).

Theorem 1 requires that satisfies the -CLT condition and . To gain some insights into these two conditions, we will provide some sufficient conditions for such assumptions. Let us consider the specific case of , that is, the generalized Wigner matrix W is nonsparse. We will show that as long as

(12)

the aforementioned two conditions in Theorem 1 hold. We first verify the -CLT condition. By Lemma 1, a sufficient condition for to satisfy the -CLT condition is that

(13)

Observe that it follows from and that

(14)

where stands for the th component of vector . The assumption in (12) together with (14) ensures (13), which consequently entails that satisfies the -CLT condition.

We next check the condition . It follows directly from (14) that this condition holds under (12). In fact, since Condition 2 guarantees that , the assumption can be very mild. In particular, for the Wigner matrix W with for all , it holds that

(15)

Thus the condition of reduces to that of , which is guaranteed to hold under Condition 2.
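The distinction between a population eigenvalue and the asymptotic mean of its empirical counterpart can be seen in a small simulation for the homoscedastic Wigner case discussed above. The sketch uses a rank-one mean d v v^T and i.i.d. Uniform(−1, 1) noise, and compares the empirical mean of the top eigenvalue with d and with the classical first-order value d + nσ²/d for this special case. That value serves only as a stand-in for the solution of equation (9), which is not reproduced here; all numerical settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, reps = 200, 60.0, 300
sigma2 = 1.0 / 3.0                        # variance of Uniform(-1, 1) noise entries
v = np.ones(n) / np.sqrt(n)               # hypothetical population eigenvector
H = d * np.outer(v, v)

top = np.empty(reps)
for r in range(reps):
    upper = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)))
    W = upper + np.triu(upper, 1).T
    top[r] = np.linalg.eigvalsh(H + W)[-1]   # largest eigenvalue of X

shifted_center = d + n * sigma2 / d       # first-order correction d + E(v^T W^2 v) / d
print(f"mean of top eigenvalue: {top.mean():.3f}")
print(f"population d: {d:.3f}, shifted center: {shifted_center:.3f}")
```

In runs of this kind the empirical mean sits near the shifted center rather than near d, consistent with the remark that the empirical eigenvalue is centered at a noise-corrected quantity.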

We would also like to point out that one potential application of the new results in Theorem 1 is determining the number of spiked eigenvalues, which in the network models reduces to determining the number of non-overlapping (or possibly overlapping) communities or clusters.

3.3 Asymptotic distributions of spiked eigenvectors

We now present the asymptotic distributions of the spiked empirical eigenvectors for . To this end, we will first establish the asymptotic expansions and CLT for the bilinear form

with , where are two arbitrary non-random unit vectors. Then by setting , we can establish the asymptotic expansions and CLT for the general linear combination . Although the limiting distribution of the bilinear form is the theoretical foundation for establishing the limiting distribution of the general linear combination , due to the technical complexities we will defer the theorems summarizing the limiting distribution of to a later technical section (i.e., Section 5), and present only the results for in this section. This should not harm the flow of the paper. Readers interested in our proofs can refer to Section 5 for more technical details; otherwise it is safe to skip that technical section. For each , let us choose the direction of such that for the theoretical derivations, which is always possible after a sign change.

Theorem 2.

Under Conditions 1–2, for each we have the following properties:

1) If the unit vector u satisfies that and , then it holds that

(16)

where the asymptotic mean has the expansion . Furthermore, if satisfies the -CLT condition, then it holds that

2) If , then it holds that

(17)

where the asymptotic mean has the expansion . Furthermore, if satisfies the -CLT condition, then it holds that

The two parts of Theorem 2 correspond to two different cases when can be of different magnitude. To understand this, note that for large enough matrix size , we have by Condition 2 and Lemma 3. In view of (17), the asymptotic variance of is equal to In contrast, in light of (2), the asymptotic variance of with is equal to Let us consider a specific case when . By Lemma 4 in Section 5, we have

This shows that the above two cases can be very different in the magnitude for the asymptotic variance of and thus should be analyzed separately.
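The difference in magnitude between the two cases can be seen in a small simulation. The sketch below uses a rank-one mean d v v^T with i.i.d. Uniform(−1, 1) noise, fixes the sign of the empirical eigenvector so that its inner product with v is nonnegative, and compares the sample variance of the projection onto v itself with that of the projection onto a unit vector u orthogonal to v. The model, the sizes, and the choice of u are illustrative assumptions, not the setting of Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, reps = 200, 100.0, 300
v = np.ones(n) / np.sqrt(n)                       # population eigenvector
u = np.zeros(n)
u[0], u[1] = 1.0, -1.0
u /= np.sqrt(2.0)                                 # unit vector orthogonal to v
H = d * np.outer(v, v)

along, across = np.empty(reps), np.empty(reps)
for r in range(reps):
    upper = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)))
    W = upper + np.triu(upper, 1).T
    v_hat = np.linalg.eigh(H + W)[1][:, -1]       # eigenvector of the largest eigenvalue
    if v_hat @ v < 0:                             # fix the sign so that v_hat^T v >= 0
        v_hat = -v_hat
    along[r] = v @ v_hat
    across[r] = u @ v_hat

print(f"var(v^T v_hat) = {along.var():.3e}")
print(f"var(u^T v_hat) = {across.var():.3e}")
```

Heuristically, the projection onto v is the cosine of the angle between the empirical and population eigenvectors, so its fluctuations are of second order in the perturbation, whereas the projection onto an orthogonal direction fluctuates at first order.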

To gain some insights into why has smaller variance, let us consider the simple case of . Then in view of our technical arguments, it holds that

(18)

where associated with the complex integrals represents the imaginary unit and the line integrals are taken over the contour that is centered at with radius . Then we can see that the population eigenvalue is enclosed by the contour . By the Taylor expansion, we can show that with significant probability,

Substituting the above expansion into (18) results in

(19)