Consider a noisy matrix modeled as
where is a low-rank deterministic matrix with fixed rank and is a real random noise matrix. We assume that
admits the singular value decomposition
where consists of the singular values of and we assume ; and are the matrices consisting of the -normalized left and right singular vectors. For the noise matrix in (1.1), we assume that the entries
’s are i.i.d. real random variables with
For simplicity, we also assume the existence of all moments, i.e., for every integer there is some constant such that
This condition can be weakened to the existence of some sufficiently high-order moment, but we do not pursue this direction here. We also remark that although we are primarily interested in the real case, our method applies equally when is a complex noise matrix.
In practice, is often called the signal matrix, which contains the information of interest. In the high-dimensional setup, when and are comparably large, we are primarily interested in the inference of or its left and right singular spaces, which are the subspaces spanned by the ’s or the ’s, respectively. Such a problem arises in many scientific applications, such as sparse PCA [57, 63], matrix denoising [23, 24], multiple signal classification (MUSIC) [32, 61], synchronization [54, 55] and multidimensional scaling [27, 51]. We call the model in (1.1) the matrix denoising model; it is also often referred to as the signal-plus-noise model in the literature. We refer to Section 1.2 for more discussion of the applications.
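As a quick numerical illustration of the signal-plus-noise model in (1.1), the following sketch generates a low-rank signal plus i.i.d. noise and inspects the sample singular values. The dimensions, the rank, the signal strengths and the $1/\sqrt{n}$ noise normalization are all illustrative choices and may differ from the conventions used in this paper.

```python
import numpy as np

# Minimal sketch of the signal-plus-noise model: S_tilde = S + X, where S has
# fixed rank r and X has i.i.d. mean-zero entries. All parameter values below
# are illustrative assumptions, not the paper's exact normalization.
rng = np.random.default_rng(0)
p, n, r = 200, 400, 2

# Low-rank deterministic signal S = sum_i d_i u_i v_i^T
d = np.array([8.0, 5.0])                            # signal strengths
U = np.linalg.qr(rng.standard_normal((p, r)))[0]    # orthonormal left vectors
V = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthonormal right vectors
S = (U * d) @ V.T

# i.i.d. noise with mean zero and variance 1/n
X = rng.standard_normal((p, n)) / np.sqrt(n)

# Observed matrix and its sample singular values
S_tilde = S + X
sing_vals = np.linalg.svd(S_tilde, compute_uv=False)
print(sing_vals[:3])  # the first r values separate from the noise bulk
```

In this run the first two singular values sit clearly above the bulk of noise singular values, which concentrate on an interval with right edge near $1 + \sqrt{p/n}$.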
We denote the singular value decomposition of by
where , and . Here , and ’s and ’s are the -normalized sample singular vectors.
In this paper, we are interested in the distributions of the principal left and right singular vectors of
and the subspaces spanned by them. The natural estimators for and are
respectively, namely, the matrices consisting of the first left and right singular vectors of . To measure the distance between and (or and ), we consider the following matrix of cosine principal angles between two subspaces (see [31, Section 6.4.3] for instance):
where ’s and ’s are the singular values of the matrices and , respectively. Therefore, an appropriate measure of the distance between the subspaces is for the left singular subspace or for the right singular subspace, where stands for the Frobenius norm. Note that and can also be written as
In this paper, we are interested in the following high-dimensional regime: for some small constant we have
Our main results concern the limiting distributions of the individual (resp. ) and (resp. ) when the signal strengths ’s are supercritical (cf. Assumption 2.1). They are stated in Theorems 2.3, 2.8, 2.9 and 2.10, after the necessary notation is introduced. In the rest of this section, we review related literature from both the theoretical and applied perspectives.
1.1. On finite rank deformation of random matrices
From the theoretical perspective, our model in (1.1) belongs to the category of fixed-rank deformations of random matrix models in Random Matrix Theory, which also includes the deformed Wigner matrix and the spiked sample covariance matrix as typical examples. A vast body of work has been devoted to this topic, and the primary interest is to investigate the limiting behavior of the extreme eigenvalues and the associated eigenvectors of the deformed models. Since the seminal work of Baik, Ben Arous and Péché, it is now well understood that the extreme eigenvalues undergo a so-called BBP transition as the strength of the deformation changes. Roughly speaking, there is a critical value such that the extreme eigenvalue of the deformed matrix sticks to the right end point of the limiting spectral distribution of the undeformed random matrix if the strength of the deformation is less than or equal to the critical value, and otherwise jumps out of the support of the limiting spectral distribution. In the latter case, we call the extreme eigenvalue an outlier, and the associated eigenvector an outlier eigenvector. Moreover, the fluctuations of the extreme eigenvalues in the different regimes (subcritical, critical and supercritical) were also identified in  for the complex spiked covariance matrix. We also refer to [5, 10, 11, 3, 19, 23, 36, 49] and the references therein for the first-order limit of the extreme eigenvalues of various fixed-rank deformation models. The fluctuations of the extreme eigenvalues of various models have been considered in [2, 3, 7, 8, 9, 21, 22, 25, 14, 15, 36, 49, 30, 50, 53, 38]. In particular, the fluctuations of the outliers were shown to be non-universal for the deformed Wigner matrices, first in  under certain special assumptions on the structure of the deformation and the distribution of the matrix entries, and then in  in full generality.
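The BBP transition described above is easy to observe numerically. The following hedged sketch uses a rank-one deformed Wigner matrix (the parameter values and the Wigner normalization are illustrative): for a subcritical strength the top eigenvalue sticks to the bulk edge $2$, while for a supercritical strength $\theta > 1$ it separates and converges to $\theta + 1/\theta$.

```python
import numpy as np

# Numerical illustration of the BBP-type transition for a rank-one deformed
# Wigner matrix M = theta * v v^T + W, where W is a symmetric Wigner matrix
# with entry variance 1/N. For theta <= 1 the top eigenvalue sticks to the
# bulk edge 2; for theta > 1 it converges to theta + 1/theta.
rng = np.random.default_rng(2)
N = 1000
W = rng.standard_normal((N, N))
W = (W + W.T) / np.sqrt(2 * N)      # symmetrized; off-diagonal variance 1/N
v = np.zeros(N); v[0] = 1.0

tops = {}
for theta in (0.5, 3.0):
    M = theta * np.outer(v, v) + W
    tops[theta] = np.linalg.eigvalsh(M)[-1]
    print(theta, tops[theta])
# theta = 0.5 (subcritical): top eigenvalue near the edge 2
# theta = 3.0 (supercritical): top eigenvalue near 3 + 1/3
```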
The study of the behavior of the extreme eigenvectors has mainly focused on the first-order limit [10, 11, 18, 23, 49]. In parallel to the results on the extreme eigenvalues, it is known that the eigenvectors are delocalized in the subcritical case and have a bias in the direction of the deformation in the supercritical case. It was recently observed in  that a deformation close to the critical regime causes a bias even for the non-outlier eigenvectors. On the level of fluctuations, the limiting behavior of the extreme eigenvectors has not yet been fully studied. By establishing a general universality result for the eigenvectors of the sample covariance matrix in the null case, the authors of  were able to show that the law of the eigenvectors of the spiked covariance matrices is asymptotically Gaussian in the subcritical regime. More specifically, the generalized components of the eigenvectors are  distributed. In the supercritical regime, under some special assumptions on the structure of the deformation and the distribution of the random matrix entries, it is shown in  that the eigenvector distribution of a generalized deformed Wigner matrix model is non-universal. In the current work, we aim at establishing the non-universality of the outlier singular vectors for the matrix denoising model under fully general assumptions on the structure of the deformation and the distribution of the random matrix . This can be regarded as an eigenvector counterpart of the result on the outlying eigenvalue distribution in .
1.2. On singular subspace inference
In the situation when is fixed, the sample eigenvectors of  are well studied [1]. When diverges with , many interesting results have been obtained under various assumptions. One line of work derives perturbation bounds for the perturbed singular vectors based on the Davis–Kahan theorem. For instance, in , the authors improve the perturbation bounds of the Davis–Kahan theorem to be nearly optimal. In , the authors study similar problems and their related statistical applications. Most recently, in the papers [28, 29, 62], the authors derive perturbation bounds assuming that the population vectors are delocalized (i.e., incoherent). The other line of work studies the asymptotic normality of the spectral projection under various regularity conditions. In such cases, the singular vectors of can be estimated using those of , and Gaussian approximation techniques can be employed. Considering Gaussian data samples, and under the assumption that the order of is much smaller than , the authors of [39, 40, 41] prove that the eigenvectors of are asymptotically normally distributed, with variance depending on the eigenvectors of . Furthermore, in , assuming that such random matrices are available, the author shows that the singular vectors of can be estimated via trace regression using the matrix nuclear norm penalized least squares estimator (NNPLS). Under the assumption that , the author shows that the principal angles of the subspace estimated using NNPLS are asymptotically normal. In , for i.i.d. sub-Gaussian samples with population covariance matrix , the authors estimate the first loading factor of using a Lasso-type de-biased estimator. Under the assumptions that and is sparse, i.e., , the authors prove that the Lasso-type de-biased estimator is asymptotically normal.
2. Main results and methodology
In this section, we state our main results, and briefly summarize our proof strategy.
For a positive integer , we denote by the set . Let be the complex upper-half plane. Further, we define the following linearization for our model
In the sequel, we will often omit and simply write and when there is no confusion.
We denote the empirical spectral distributions (ESD) of the matrices and by
and are known to satisfy the Marchenko-Pastur (MP) law . More precisely, almost surely, converges weakly to a non-random limit which has a density function given by
and has a point mass at the origin if , where and . Furthermore, the Stieltjes transform of is given by
where the square root denotes the complex square root with a branch cut on the negative real axis. Similarly, almost surely, converges weakly to a non-random limit which has a density function given by
and a point mass at the origin if . The corresponding Stieltjes transform is
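For the reader's convenience, we record the standard form of the MP law in generic notation, with aspect ratio $c \in (0,1]$; the normalization here may differ from the one used in (2.3) and (2.4):

```latex
\varrho_{\mathrm{MP}}(x) \;=\; \frac{1}{2\pi c x}\,\sqrt{(\lambda_+ - x)(x - \lambda_-)}\;\mathbf{1}_{[\lambda_-,\,\lambda_+]}(x),
\qquad \lambda_{\pm} = (1 \pm \sqrt{c})^{2},
```

with Stieltjes transform

```latex
m_{\mathrm{MP}}(z) \;=\; \int \frac{\varrho_{\mathrm{MP}}(x)}{x - z}\,\mathrm{d}x
\;=\; \frac{1 - c - z + \sqrt{(1 - c - z)^{2} - 4cz}}{2cz},
\qquad z \in \mathbb{C}^{+},
```

where the square root is chosen so that $m_{\mathrm{MP}}(z) \sim -1/z$ as $z \to \infty$.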
In this paper, the singular values of are assumed to satisfy the supercritical condition.
Assumption 2.1 (Supercritical condition).
There exists a constant such that
The first inequality above ensures that the first singular values of are outliers. The second inequality guarantees that the outliers of are well separated from each other. Both conditions can be weakened. For instance, we can allow the existence of subcritical and critical ’s if we only focus on the outlier singular vectors of . Also, the separation of the ’s by a distance of order is not necessary; in , a much weaker separation of order suffices for the discussion of the eigenvalues. But we do not pursue these directions in the current paper.
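The supercritical condition can be checked against a known first-order prediction. In one standard normalization (X with i.i.d. entries of variance $1/n$ and $c = p/n$; this choice is an assumption and may differ from the paper's), a rank-one signal of strength $d > c^{1/4}$ produces an outlier singular value near $\sqrt{(1+d^{2})(c+d^{2})}/d$, as in the work of Benaych-Georges and Nadakuditi. The following hedged sketch verifies this numerically.

```python
import numpy as np

# Hedged sanity check of the outlier location for a supercritical rank-one
# signal. The normalization (noise variance 1/n, c = p/n) and the parameter
# values are illustrative assumptions.
rng = np.random.default_rng(3)
p, n, d = 500, 1000, 2.0
c = p / n                                  # aspect ratio; d > c**0.25 here
u = np.zeros(p); u[0] = 1.0
v = np.zeros(n); v[0] = 1.0
S_tilde = d * np.outer(u, v) + rng.standard_normal((p, n)) / np.sqrt(n)

top = np.linalg.svd(S_tilde, compute_uv=False)[0]
predicted = np.sqrt((1 + d ** 2) * (c + d ** 2)) / d
print(top, predicted)                      # both clearly above the edge 1+sqrt(c)
```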
To state our results, we need more notations. First, we define
For each , we will write for short. In [23, Theorem 3.4], it has been shown that is the limit of . Further, we set
2.2. Main results
In this section, we state our main results.
For a vector and , we introduce the notation
For the right singular vectors, we have the following theorem.
Theorem 2.3 (Right singular vectors).
In , the authors obtain the non-universality of the limiting distributions of the outliers (outlying eigenvalues) of the deformed Wigner matrices. These limiting distributions admit forms similar to the limiting distributions of the outlier singular vectors for our model. One might notice that the third and fourth cumulants of the entries of the Wigner matrices are allowed to be different in . An extension along this direction is also straightforward for our result.
We discuss a few special cases of interest. For simplicity, we assume that has rank and drop all the subindices.
If the entries of are standard Gaussian random variables (i.e. ), then and thus is asymptotically distributed as
If both and are delocalized in the sense that and , then and for
. We conclude from the central limit theorem that
and therefore has asymptotically the same distribution as
The only difference from the Gaussian case is a shift caused by the non-vanishing third cumulant.
If one of and is delocalized, say , then still has the limiting distribution in (2.9). Therefore has asymptotically the same distribution as a Gaussian random variable with mean
For two vectors and , we denote
Recall from (1.5). We have the following theorem.
Theorem 2.8 (Right singular subspace).
Similarly, we set
For the left singular vectors, we have the following theorem.
Theorem 2.9 (Left singular vectors).
Next, we state the result on the asymptotic distribution of defined in (1.5).
2.3. Proof strategy
In this subsection, we briefly describe our proof strategy. We first review the method used in a related work , and then we highlight the novelty of our strategy.
As mentioned previously, in , the authors derive the distribution of the outliers (outlying eigenvalues) of fixed-rank deformations of Wigner matrices. The main technical input is the isotropic local law for Wigner matrices, which provides a precise large deviation estimate for the quadratic form for any deterministic vectors . Here is a Wigner matrix. It turns out that an outlier of the deformed Wigner matrix can also be approximated by a quadratic form of the Green function, of the form . So one can instead establish the law of the quadratic form of the Green function. In , the authors decompose the proof into three steps. First, the law is established for the GOE/GUE, i.e., the Gaussian Wigner matrices, for which the orthogonal/unitary invariance of the matrix can be used to facilitate the proof. In the second step, going beyond the Gaussian matrices, in order to capture the independence of the Gaussian part and the non-Gaussian part of the limiting distribution of the outliers, the authors construct an intermediate matrix in which most of the matrix entries are replaced by Gaussian ones, while those with coordinates corresponding to the large components of are kept generally distributed. The intermediate matrix allows one to use the nice properties of the Gaussian ensembles, such as orthogonal/unitary invariance, for the major part of the matrix, while keeping the non-Gaussianity induced by the small number of generally distributed entries. In the last step, the authors of  derive the law for the fully generally distributed Wigner matrix by a further Green function comparison with the intermediate matrix.
For our problem, similarly, we will use the isotropic law of the sample covariance matrix in [12, 37] as a main technical input. It turns out that for the singular vectors, we can approximately represent (after appropriate centralization) in terms of a quantity of the form
where is the Green function of the linearization of the sample covariance matrix and is the deterministic approximation of ; see (3.1) and (3.6) for the definitions. Here both and are deterministic fixed-rank matrices. Hence, unlike for the outlying eigenvalues or singular values, the Green function representation of the singular vectors also contains the derivative of the Green function. More importantly, instead of the three-step strategy of , here we derive the law of the above quantity directly for generally distributed matrices. Recall defined in (2.8), whose random part is proportional to , which is simply a linear combination of the entries of . Inspired by , we decompose into two parts, say and . The former contains the linear combination of the ’s over those indices corresponding to the large components and in and . The latter contains the linear combination of the remaining ’s. Note that is asymptotically normal by the CLT, since the coefficients of the ’s are small. However, may not be normal. The key idea of our strategy is to prove the following recursive estimate: for any fixed , we have
The asymptotic independence between and then follows. Hence, we obtain both the asymptotic normality and the asymptotic independence from (2.11), killing two birds with one stone. The method of using a recursive estimate to obtain large deviation bounds for the Green function, or for functionals of the Green function, has been used previously in Random Matrix Theory; we refer to , for instance. However, as far as we know, this is the first time a recursive estimate has been used to establish normality and independence simultaneously for functionals of Green functions.
Finally, we remark that the approach in this paper can also be applied to derive the distribution of the outlier eigenvectors of the spiked sample covariance matrix and the deformed Wigner matrix. We will consider these extensions in future work (cf. ).
The rest of the paper is organized as follows. In Section 3, we introduce some main technical results including the isotropic local law and also derive the Green function representation for our statistics. In Section 4, we prove Theorems 2.3 and 2.9, based on the recursive estimate in Proposition 4.2. Section 5 is then devoted to the proof of Proposition 4.2. In Section 6, we state the proof for a main technical lemma, Lemma 5.2, which is used in the proof of Proposition 4.2. In Section 7, we prove Theorems 2.8 and 2.10.
3. Technical tools and Green function representations
This section provides some basic notions and technical tools that will be needed frequently in the proofs of our theorems. The basic notions are given in Section 3.1. A main technical input for our proofs is the isotropic local law for the sample covariance matrix obtained in [12, 37]; it is stated in Section 3.2. In Section 3.3, we represent (asymptotically) the ’s and ’s, and also and (cf. (1.5)), in terms of the Green function. The discussion is based on the second author’s previous work , where the limits of and are studied. We then collect a few auxiliary lemmas in Section 3.4.
3.1. Basic notions
Here we recall the definition in (2.2). By the Schur complement formula, it is easy to derive
The Stieltjes transforms for the ESD of and are defined by
It is well known that and have nonrandom approximations and , which are the Stieltjes transforms of the MP laws defined in (2.3) and (2.4). Specifically, for any fixed , the following hold almost surely:
Furthermore, one can easily check that and satisfy the following self-consistent equations
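In a generic normalization with aspect ratio $c = p/n$ (which may differ from the conventions of (2.3) and (2.4)), these self-consistent equations take the well-known form

```latex
m_1(z) \;=\; \frac{1}{1 - c - z - c\,z\,m_1(z)},
\qquad
m_2(z) \;=\; c\,m_1(z) \;-\; \frac{1-c}{z},
```

or equivalently $c z\, m_1^{2}(z) + (z + c - 1)\, m_1(z) + 1 = 0$; the second identity reflects the fact that the two Gram matrices share the same nonzero eigenvalues, with the smaller one padded by zeros.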
We can also derive the following simple relation from the definitions
Denote in (2.5). For any we have
Furthermore, denote by . We have
In the sequel, we also need the following notion on high probability events.
Definition 3.2 (High probability event).
We say that an -dependent event holds with high probability if, for any large
for sufficiently large
We also adopt the notion of stochastic domination introduced in . It provides a convenient way of making precise statements of the form “ is bounded by up to small powers of with high probability”.
Definition 3.3 (Stochastic domination).
Let and be two families of nonnegative random variables, where is a possibly -dependent parameter set. We say that is stochastically dominated by uniformly in if for all small and large we have
for large enough . In addition, we use the notation if is stochastically dominated by uniformly in . Throughout this paper, stochastic domination will always be uniform in all parameters (mostly matrix indices and the spectral parameter ) that are not explicitly fixed.
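For concreteness, in the standard formulation found in the literature (written here in generic notation), stochastic domination reads

```latex
\mathsf{X} \prec \mathsf{Y}
\quad :\Longleftrightarrow \quad
\forall\,\varepsilon>0,\ \forall\, D>0:\quad
\sup_{u \in U^{(N)}} \mathbb{P}\Big(\mathsf{X}^{(N)}(u) > N^{\varepsilon}\,\mathsf{Y}^{(N)}(u)\Big) \;\le\; N^{-D}
\quad \text{for all } N \ge N_0(\varepsilon, D).
```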
3.2. Isotropic local laws
We will need the isotropic local law outside the spectrum of the MP law. For define the spectral domain
where is a fixed small constant. Recall the notations and defined in (3.2).