Singular vector and singular subspace distribution for the matrix denoising model

In this paper, we study the matrix denoising model Y = S + X, where S is a low-rank deterministic signal matrix and X is a random noise matrix, and both are M × n. In the regime where M and n are comparably large and the signals are supercritical, we study the fluctuations of the outlier singular vectors of Y. More specifically, we derive the limiting distribution of the angles between the principal singular vectors of Y and their deterministic counterparts, the singular vectors of S. Further, we also derive the distribution of the distance between the subspace spanned by the principal singular vectors of Y and that spanned by the singular vectors of S. It turns out that the limiting distributions depend on the structure of the singular vectors of S and the distribution of X, and thus they are non-universal.


1. Introduction

Consider a noisy M × n data matrix modeled as

(1.1) Y = S + X,

where S is a deterministic matrix with fixed rank r and X is a real M × n random noise matrix. We assume that S admits the singular value decomposition

S = U D V^* = \sum_{i=1}^{r} d_i u_i v_i^*,

where D = diag(d_1, …, d_r) consists of the singular values of S and we assume d_1 > d_2 > ⋯ > d_r > 0; U = (u_1, …, u_r) and V = (v_1, …, v_r) are the matrices consisting of the ℓ^2-normalized left and right singular vectors. For the noise matrix X in (1.1), we assume that the entries x_{ij} are i.i.d. real random variables with

(1.2) E x_{11} = 0, E x_{11}^2 = n^{-1}.

For simplicity, we also assume the existence of all moments, i.e., for every integer k ≥ 3 there is some constant C_k > 0 such that

(1.3) E |\sqrt{n} x_{11}|^k ≤ C_k.

This condition can be weakened to the existence of some sufficiently high-order moment, but we do not pursue this direction here. We remark that although we are primarily interested in the real case, our method also applies to the case when X is a complex noise matrix.

In practice, S is often called the signal matrix, which contains the information of interest. In the high-dimensional setup, when M and n are comparably large, we are primarily interested in the inference of S or its left and right singular spaces, which are the subspaces spanned by the u_i's or the v_i's, respectively. Such a problem arises in many scientific applications such as sparse PCA [57, 63], matrix denoising [23, 24], multiple signal classification (MUSIC) [32, 61], synchronization [54, 55] and multidimensional scaling [27, 51]. We call the model in (1.1) the matrix denoising model; it is also often referred to as the signal-plus-noise model in the literature. We refer to Subsection 1.2 for more on the application aspects.
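To fix ideas, the following minimal simulation sketch of the model (1.1) will be reused in later snippets; the dimensions, rank and signal strengths below are illustrative choices only, not values prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, r = 400, 800, 2                 # illustrative dimensions and rank
d = np.array([5.0, 3.0])              # illustrative (supercritical) signal strengths

# Signal S = sum_i d_i u_i v_i^T with orthonormal columns U, V.
U, _ = np.linalg.qr(rng.standard_normal((M, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
S = U @ np.diag(d) @ V.T

# Noise with i.i.d. entries of mean 0 and variance 1/n, as in (1.2).
X = rng.standard_normal((M, n)) / np.sqrt(n)

Y = S + X                             # the matrix denoising model (1.1)
```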

We denote the singular value decomposition of Y by

(1.4) Y = \hat U \hat D \hat V^*,

where \hat D consists of the singular values \hat d_1 ≥ \hat d_2 ≥ ⋯ ≥ \hat d_{M∧n} ≥ 0, \hat U = (\hat u_1, …, \hat u_M) and \hat V = (\hat v_1, …, \hat v_n). Here the \hat u_i's and \hat v_i's are the ℓ^2-normalized sample singular vectors.

In this paper, we are interested in the distributions of the principal left and right singular vectors of Y and the subspaces spanned by them. The natural estimators for U and V are

\hat U^{(r)} := (\hat u_1, …, \hat u_r) and \hat V^{(r)} := (\hat v_1, …, \hat v_r),

respectively, namely, the matrices consisting of the first r left and right singular vectors of Y. To measure the distance between U and \hat U^{(r)} (or V and \hat V^{(r)}), we consider the following matrices of the cosines of the principal angles between the two pairs of subspaces (see [31, Section 6.4.3] for instance):

cos Θ(U, \hat U^{(r)}) := diag(σ_1, …, σ_r), cos Θ(V, \hat V^{(r)}) := diag(τ_1, …, τ_r),

where the σ_i's and τ_i's are the singular values of the matrices U^* \hat U^{(r)} and V^* \hat V^{(r)}, respectively. Therefore, an appropriate measure of the distance between the subspaces is D_U := ‖sin Θ(U, \hat U^{(r)})‖_F for the left singular subspace, or D_V := ‖sin Θ(V, \hat V^{(r)})‖_F for the right singular subspace, where ‖·‖_F stands for the Frobenius norm. Note that D_U^2 and D_V^2 can also be written as

(1.5) D_U^2 = r − ‖U^* \hat U^{(r)}‖_F^2, D_V^2 = r − ‖V^* \hat V^{(r)}‖_F^2.
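The quantities above are directly computable. Here is a minimal sketch (continuing the previous snippet, and using the conventions of (1.5)) that extracts the cosines of the principal angles and the distances D_U, D_V via the SVD.

```python
def principal_angle_distance(A, B):
    """For two matrices with orthonormal columns, return the cosines of the
    principal angles between their column spans and ||sin Theta||_F."""
    sigma = np.clip(np.linalg.svd(A.T @ B, compute_uv=False), 0.0, 1.0)
    return sigma, np.sqrt(np.sum(1.0 - sigma**2))

Uhat, dhat, VhatT = np.linalg.svd(Y, full_matrices=False)
cos_U, D_U = principal_angle_distance(U, Uhat[:, :r])     # left singular subspace
cos_V, D_V = principal_angle_distance(V, VhatT[:r, :].T)  # right singular subspace
```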

In this paper, we are interested in the following high-dimensional regime: for some small constant 0 < τ < 1 we have

(1.6) τ ≤ c_n := M/n ≤ τ^{-1}.

Our main results are on the limiting distributions of the individual inner products ⟨v_i, \hat v_i⟩ (resp. ⟨u_i, \hat u_i⟩) and of D_V (resp. D_U) when the signal strengths, the d_i's, are supercritical (cf. Assumption 2.1). They are stated in Theorems 2.3, 2.8, 2.9 and 2.10, after the necessary notation is introduced. In the rest of this section, we review some related literature from both the theoretical and the applied perspectives.

1.1. On finite rank deformation of random matrices

From the theoretical perspective, our model in (1.1) falls in the category of fixed-rank deformations of random matrix models in Random Matrix Theory, which also includes the deformed Wigner matrix and the spiked sample covariance matrix as typical examples. There is a vast body of work devoted to this topic, and the primary interest is to investigate the limiting behavior of the extreme eigenvalues and the associated eigenvectors of the deformed models. Since the seminal work of Baik, Ben Arous and Péché [4], it is now well understood that the extreme eigenvalues undergo a so-called BBP transition along with the change of the strength of the deformation. Roughly speaking, there is a critical value such that the extreme eigenvalue of the deformed matrix will stick to the right end point of the limiting spectral distribution of the undeformed random matrix if the strength of the deformation is less than or equal to the critical value, and will otherwise jump out of the support of the limiting spectral distribution. In the latter case, we call the extreme eigenvalue an outlier, and the associated eigenvector an outlier eigenvector. Moreover, the fluctuations of the extreme eigenvalues in the different regimes (subcritical, critical and supercritical) were also identified in [4] for the complex spiked covariance matrix. We also refer to [3, 5, 10, 11, 19, 23, 36, 49] and the references therein for the first-order limits of the extreme eigenvalues of various fixed-rank deformation models. The fluctuations of the extreme eigenvalues of various models have been considered in [2, 3, 7, 8, 9, 14, 15, 21, 22, 25, 30, 36, 38, 49, 50, 53]. In particular, the fluctuations of the outliers are shown to be non-universal for deformed Wigner matrices, first in [21] under certain special assumptions on the structure of the deformation and the distribution of the matrix entries, and then in [36] in full generality.

The study of the behavior of the extreme eigenvectors has mainly focused on the level of the first-order limit [10, 11, 18, 23, 49]. In parallel to the results on the extreme eigenvalues, it is known that the eigenvectors are delocalized in the subcritical case and have a bias toward the direction of the deformation in the supercritical case. It was recently observed in [13] that a deformation close to the critical regime causes a bias even for the non-outlier eigenvectors. On the level of the fluctuations, the limiting behavior of the extreme eigenvectors has not been fully studied yet. By establishing a general universality result for the eigenvectors of the sample covariance matrix in the null case, the authors of [13] are able to show that the law of the eigenvectors of the spiked covariance matrices is asymptotically Gaussian in the subcritical regime; more specifically, the generalized components of the eigenvectors are asymptotically normally distributed. In the supercritical regime, under some special assumptions on the structure of the deformation and the distribution of the random matrix entries, it is shown in [20] that the eigenvector distribution of a generalized deformed Wigner matrix model is non-universal. In the current work, we aim at establishing the non-universality of the outlier singular vectors for the matrix denoising model under fully general assumptions on the structure of the deformation and the distribution of the random matrix X. This can be regarded as an eigenvector counterpart of the result on the outlier eigenvalue distribution in [36].

1.2. On singular subspace inference

From the applied perspective, our model (1.1) appears prominently in the study of signal processing [34, 47], machine learning [58, 60] and statistics [16, 17, 24, 29]. For instance, in the study of image denoising, S is treated as the true image [44], and in the problem of classification, S contains the underlying true mean vectors of the samples [16]. In both situations, we need to understand the asymptotics of the singular vectors and subspaces of S, given the observation Y. In addition, the statistics D_U and D_V defined in (1.5) can be used for the inference of the structure of the singular subspaces of S. In the high-dimensional regime (1.6), to the best of our knowledge, the distributions of D_U and D_V have not yet been studied in the literature.

In the situation when the dimension M is fixed, the sample eigenvectors of the covariance matrix are normally distributed [1]. When M diverges with n, many interesting results have been obtained under various assumptions. One line of work derives perturbation bounds for the perturbed singular vectors based on the Davis–Kahan theorem. For instance, in [48], the authors improve the perturbation bounds of the Davis–Kahan theorem to be nearly optimal. In [16], the authors study similar problems and their related statistical applications. Most recently, in the papers [28, 29, 62], the authors derive perturbation bounds assuming that the population singular vectors are delocalized (i.e., incoherent). The other line of work studies the asymptotic normality of the spectral projection under various regularity conditions. In such cases, the singular vectors of S can be estimated using those of Y, and some Gaussian approximation technique can be employed. Considering Gaussian data samples, and under the assumption that the effective rank of the population covariance matrix is of order much smaller than n, the authors of [39, 40, 41] prove that the eigenvectors of the sample covariance matrix are asymptotically normally distributed, with variances depending on the eigenvectors of the population covariance matrix. Furthermore, in [59], assuming that independent copies of such random matrices are available, the author shows that the singular vectors of S can be estimated via trace regression using the matrix nuclear norm penalized least squares estimator (NNPLS). Under suitable growth conditions, the author shows that the principal angles of the subspace estimated using NNPLS are asymptotically normal. In [33], for i.i.d. sub-Gaussian samples with population covariance matrix Σ, the authors estimate the first loading vector of Σ using a Lasso-type de-biased estimator. Under the assumption that the dimension is comparable to the sample size and the first loading vector is sparse, the authors prove that the Lasso-type de-biased estimator is asymptotically normal.

2. Main results and methodology

In this section, we state our main results, and briefly summarize our proof strategy.

2.1. Notations

For a positive integer n, we denote by [[n]] the set {1, …, n}. Let C^+ := {z ∈ C : Im z > 0} be the complex upper half-plane. Further, we define the following linearization for our model:

(2.1) H(Y) := ( 0 , Y ; Y^* , 0 ) = H(X) + H(S),

where

(2.2) H(X) := ( 0 , X ; X^* , 0 ), H(S) := ( 0 , S ; S^* , 0 ).

Here ( A , B ; C , D ) denotes the 2×2 block matrix with rows (A, B) and (C, D).

In the sequel, we will often omit the arguments and simply write, for instance, H for H(X) when there is no confusion.

We denote the empirical spectral distributions (ESD) of the matrices XX^* and X^*X by μ_{XX^*} and μ_{X^*X}, respectively; they are known to satisfy the Marchenko–Pastur (MP) law [46]. More precisely, almost surely, μ_{XX^*} converges weakly to a non-random limit μ_1 which has a density function given by

ρ_1(x) = \frac{1}{2 π c_n x} \sqrt{(λ_+ − x)(x − λ_−)}, x ∈ [λ_−, λ_+],

and has a point mass 1 − c_n^{-1} at the origin if c_n > 1, where λ_± := (1 ± \sqrt{c_n})^2. Furthermore, the Stieltjes transform of μ_1 is given by

(2.3) m_1(z) = \frac{1 − c_n − z + \sqrt{(z − 1 − c_n)^2 − 4 c_n}}{2 c_n z}, z ∈ C^+,

where the square root denotes the complex square root with a branch cut on the negative real axis. Similarly, almost surely, μ_{X^*X} converges weakly to a non-random limit μ_2 which has a density function given by

ρ_2(x) = \frac{1}{2 π x} \sqrt{(λ_+ − x)(x − λ_−)}, x ∈ [λ_−, λ_+],

and a point mass 1 − c_n at the origin if c_n < 1. The corresponding Stieltjes transform is

(2.4) m_2(z) = \frac{c_n − 1 − z + \sqrt{(z − 1 − c_n)^2 − 4 c_n}}{2 z}, z ∈ C^+.
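Since the displays (2.3) and (2.4) above are reconstructed, here is a small numerical sketch that sanity-checks them against the defining quadratic equation and the decay at infinity; the quadratic check below is branch-independent.

```python
import numpy as np

def m1(z, c):
    # Stieltjes transform (2.3) of the MP law mu_1; complex square root
    # with a branch cut on the negative real axis, as in the text.
    return (1 - c - z + np.sqrt((z - 1 - c)**2 - 4*c + 0j)) / (2*c*z)

def m2(z, c):
    # Stieltjes transform (2.4) of the MP law mu_2.
    return (c - 1 - z + np.sqrt((z - 1 - c)**2 - 4*c + 0j)) / (2*z)

c, z = 0.5, 4.0 + 0.1j           # a point away from the support [lambda_-, lambda_+]
m = m1(z, c)
assert abs(c*z*m**2 + (z + c - 1)*m + 1) < 1e-12   # quadratic equation defining m_1
assert abs(m1(1e6, c) + 1e-6) < 1e-9               # m_1(z) ~ -1/z as z -> infinity
```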

In this paper, the singular values of S are assumed to satisfy the following supercritical condition.

Assumption 2.1 (Supercritical condition).

There exists a constant δ > 0 such that

d_r ≥ c_n^{1/4} + δ and min_{1 ≤ i < j ≤ r} (d_i − d_j) ≥ δ.

Remark 2.2.

The first inequality above ensures that the first r singular values of Y are outliers. The second inequality guarantees that the outliers of Y are well separated from each other. Both conditions can be weakened. For instance, we can allow the existence of subcritical and critical d_i's if we only focus on the outlier singular vectors of Y. Also, the separation of the d_i's by a distance of order one is not necessary; in [13], a much weaker separation is shown to be enough for the discussion of the eigenvalues. But we do not pursue these directions in the current paper.

To state our results, we need more notation. First, we define

(2.5) θ(d) := \frac{\sqrt{(d^2 + 1)(d^2 + c_n)}}{d}.

For each i ∈ [[r]], we will write θ_i := θ(d_i) for short. In [23, Theorem 3.4], it has been shown that θ_i is the limit of the outlier singular value \hat d_i. Further, we set

(2.6) a_1(d) := \frac{d^4 − c_n}{d^2 (d^2 + c_n)}, a_2(d) := \frac{d^4 − c_n}{d^2 (d^2 + 1)}.

It has been proved in [23] that a_1(d_i) and a_2(d_i) are the limits of ⟨u_i, \hat u_i⟩^2 and ⟨v_i, \hat v_i⟩^2, respectively (see Lemma 3.9 below). We also denote by κ_k the k-th cumulant of the random variables \sqrt{n} x_{ij}.
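Continuing the simulation snippet from the introduction, one can check the first-order limits empirically; the sketch below assumes the expressions for θ, a_1 and a_2 reconstructed in (2.5) and (2.6) above.

```python
c = M / n
for i in range(r):
    theta_i = np.sqrt((d[i]**2 + 1) * (d[i]**2 + c)) / d[i]   # (2.5)
    a1 = (d[i]**4 - c) / (d[i]**2 * (d[i]**2 + c))            # (2.6), left vectors
    a2 = (d[i]**4 - c) / (d[i]**2 * (d[i]**2 + 1))            # (2.6), right vectors
    print(f"i={i}: dhat={dhat[i]:.3f} vs theta={theta_i:.3f}; "
          f"cos^2(u)={np.dot(U[:, i], Uhat[:, i])**2:.3f} vs a1={a1:.3f}; "
          f"cos^2(v)={np.dot(V[:, i], VhatT[i, :])**2:.3f} vs a2={a2:.3f}")
```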

2.2. Main results

In this subsection, we state our main results.

For a vector w and an index q, we introduce the notation

Set

(2.7)

and

For the right singular vectors, we have the following theorem.

Theorem 2.3 (Right singular vectors).

Assume that (1.2), (1.3), (1.6) and Assumption 2.1 hold. For i ∈ [[r]], define the random variable T_i by

(2.8)

and let Θ_i be a random variable, independent of the variable T_i defined in (2.8), with law N(0, σ_i^2), where

Then for any i ∈ [[r]] and for any bounded continuous function f, we have that

Remark 2.4.

In [36], the authors obtain the non-universality of the limiting distributions of the outliers (outlying eigenvalues) of deformed Wigner matrices. Those limiting distributions admit forms similar to the limiting distributions of the outlier singular vectors in our model. One might notice that the third and fourth cumulants of the entries of the Wigner matrices are allowed to be different in [36]; an extension along this direction is also straightforward for our result.

We discuss a few special cases of interest. For simplicity, we assume that S has rank one and drop all the subindices.

Remark 2.5.

If the entries of X are standard Gaussian random variables (i.e., \sqrt{n} x_{ij} ~ N(0, 1)), then κ_3 = κ_4 = 0, and thus the limiting distribution is purely Gaussian.

Remark 2.6.

If both u and v are delocalized in the sense that ‖u‖_∞ = o(1) and ‖v‖_∞ = o(1), then the coefficients of the linear combination of the entries of X are uniformly small. We conclude from the central limit theorem that

(2.9)

holds, and therefore the limit has asymptotically the same distribution as a Gaussian random variable. The only difference from the Gaussian case is a shift caused by the non-vanishing third cumulant.

Remark 2.7.

If one of u and v is delocalized, say v, then the corresponding linear combination still has the limiting distribution in (2.9). Therefore the limit has asymptotically the same distribution as a Gaussian random variable with mean

and variance
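The mechanism behind Remarks 2.5–2.7 can be seen in a toy experiment: a bilinear statistic \sqrt{n} u^T X v is a linear combination of the i.i.d. entries of X, so delocalized directions trigger the CLT while localized directions retain the entry distribution. The sketch below (illustrative only; it is not the statistic (2.8) itself) contrasts the two cases with Rademacher noise.

```python
import numpy as np

rng = np.random.default_rng(1)
M2, n2, trials = 300, 600, 2000
u = np.ones(M2) / np.sqrt(M2)       # delocalized unit vector
v = np.ones(n2) / np.sqrt(n2)
deloc, loc = [], []
for _ in range(trials):
    W = rng.integers(0, 2, (M2, n2)) * 2.0 - 1.0   # Rademacher entries sqrt(n) x_ij
    X2 = W / np.sqrt(n2)
    deloc.append(np.sqrt(n2) * (u @ X2 @ v))       # approximately N(0, 1) by the CLT
    loc.append(np.sqrt(n2) * X2[0, 0])             # u = e_1, v = e_1: exactly Rademacher
print(np.mean(deloc), np.std(deloc))               # close to 0 and 1
print(sorted(set(loc)))                            # {-1.0, 1.0}: manifestly non-Gaussian
```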

For two vectors w_1 and w_2, we denote

Recall D_V from (1.5). We have the following theorem.

Theorem 2.8 (Right singular subspace).

Assume that (1.2), (1.3), (1.6) and Assumption 2.1 hold. Let T := T_1 + ⋯ + T_r, where T_i is defined in (2.8). Let Θ be a random variable independent of T with law N(0, σ^2), where

Then for any bounded continuous function f, we have that

Similarly, we set

and

For the left singular vectors, we have the following theorem.

Theorem 2.9 (Left singular vectors).

Assume that (1.2), (1.3), (1.6) and Assumption 2.1 hold. For i ∈ [[r]], define the random variable \tilde T_i by

(2.10)

and let \tilde Θ_i be a random variable, independent of the variable \tilde T_i defined in (2.10), with law N(0, \tilde σ_i^2), where

Then for any i ∈ [[r]] and any bounded continuous function f, we have that

Next, we state the result on the asymptotic distribution of D_U defined in (1.5).

Theorem 2.10 (Left singular subspace).

Assume that (1.2), (1.3), (1.6) and Assumption 2.1 hold. Let \tilde T := \tilde T_1 + ⋯ + \tilde T_r, where \tilde T_i is defined in (2.10). Let \tilde Θ be a random variable independent of \tilde T with law N(0, \tilde σ^2), where

Then for any bounded continuous function f, we have that

2.3. Proof strategy

In this subsection, we briefly describe our proof strategy. We first review the method used in a related work [36], and then we highlight the novelty of our strategy.

As previously mentioned, in [36] the authors derive the distribution of the outliers (outlying eigenvalues) of fixed-rank deformations of Wigner matrices. The main technical input is the isotropic local law for Wigner matrices, which provides a precise large deviation estimate for the quadratic form ⟨w_1, G(z) w_2⟩ of the Green function G(z) := (W − z)^{-1}, for any deterministic vectors w_1 and w_2. Here W is a Wigner matrix. It turns out that an outlier of the deformed Wigner matrix can also be approximated by a quadratic form of the Green function of this type, so one can instead establish the law of the quadratic form of the Green function. In [36], the authors decompose the proof into three steps. First, the law is established for the GOE/GUE, i.e., the Gaussian Wigner matrices, for which the orthogonal/unitary invariance of the matrix can be used to facilitate the proof. In the second step, going beyond the Gaussian case, in order to capture the independence of the Gaussian part and the non-Gaussian part of the limiting distribution of the outliers, the authors construct an intermediate matrix in which most of the matrix entries are replaced by Gaussian ones, while those with coordinates corresponding to the large components of the deformation are kept generally distributed. The intermediate matrix allows one to use the nice properties of the Gaussian ensembles, such as orthogonal/unitary invariance, for the major part of the matrix, and meanwhile keeps the non-Gaussianity induced by the small number of generally distributed entries. In the last step, the authors of [36] derive the law for the fully generally distributed Wigner matrix by conducting a further Green function comparison with the intermediate matrix.

For our problem, similarly, we will use the isotropic law of the sample covariance matrix from [12, 37] as the main technical input. It turns out that for the singular vectors, we can approximately represent ⟨u_i, \hat u_i⟩ and ⟨v_i, \hat v_i⟩ (after appropriate centering) in terms of a quantity of the form

Tr A (G(z) − Π(z)) + Tr B (G'(z) − Π'(z)),

where G is the Green function of the linearization of the sample covariance matrix and Π is the deterministic approximation of G; see (3.1) and (3.6) for the definitions. Here both A and B are deterministic fixed-rank matrices. Hence, differently from the outlying eigenvalues or singular values, the Green function representation of the singular vectors also contains the derivative of the Green function. More importantly, instead of the three-step strategy of [36], here we derive the law of the above quantity directly for a generally distributed matrix. Recall T_i defined in (2.8), whose random part is proportional to ⟨u_i, X v_i⟩, which is simply a linear combination of the entries of X. Inspired by [36], we decompose this linear combination into two parts, say R_1 and R_2. The former contains the linear combination of those x_{kl}'s whose indices correspond to the large components of u_i and v_i; the latter contains the linear combination of the rest of the x_{kl}'s. Note that R_2 is asymptotically normal by the CLT, since the coefficients of its x_{kl}'s are small. However, R_1 may not be normal. The key idea of our strategy is to show the following recursive estimate: for any fixed k ∈ N, we have

(2.11)

for some positive number c_0. Choosing the test function in (2.11) to be constant, we can derive the asymptotic normality of R_2 by the recursive moment estimate; choosing the test function to be arbitrary, we can further deduce from (2.11) that the joint moments of R_1 and R_2 factorize asymptotically. The asymptotic independence between R_1 and R_2 then follows. Hence, we prove both the asymptotic normality and the asymptotic independence from (2.11), and thus we kill two birds with one stone. The method of using recursive estimates to obtain large deviation bounds for the Green function, or for functionals of the Green function, has been used previously in Random Matrix Theory; for instance, we refer to [42]. However, as far as we know, this is the first time that a recursive estimate is used to show normality and independence simultaneously for functionals of Green functions.
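For orientation, here is a schematic form of such a recursive moment estimate, written with hypothetical notation (the precise statement (2.11), given in Proposition 4.2, is more involved): for a smooth bounded test function g and the decomposition above,

E[ R_2^{2k} g(R_1) ] = (2k − 1) σ^2 E[ R_2^{2k−2} g(R_1) ] + O(n^{−c_0}).

Taking g ≡ 1 and iterating over k yields E[R_2^{2k}] → (2k − 1)!! σ^{2k}, the moments of N(0, σ^2), which gives the asymptotic normality; keeping g arbitrary shows that the joint moments factorize, which gives the asymptotic independence of R_1 and R_2.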

Finally, we remark that the approach in this paper can also be applied to derive the distributions of the outlier eigenvectors of the spiked sample covariance matrix and of the deformed Wigner matrix. We will consider these extensions in future work (cf. [6]).

2.4. Organization

The rest of the paper is organized as follows. In Section 3, we introduce the main technical inputs, including the isotropic local law, and derive the Green function representations of our statistics. In Section 4, we prove Theorems 2.3 and 2.9, based on the recursive estimate in Proposition 4.2. Section 5 is then devoted to the proof of Proposition 4.2. In Section 6, we give the proof of a key technical lemma, Lemma 5.2, which is used in the proof of Proposition 4.2. In Section 7, we prove Theorems 2.8 and 2.10.

3. Technical tools and Green function representations

This section is devoted to providing some basic notions and technical tools which will be needed frequently in our proofs of the theorems. The basic notions are given in Section 3.1. A main technical input for our proofs is the isotropic local law for the sample covariance matrix obtained in [12, 37]; it is stated in Section 3.2. In Section 3.3, we represent (asymptotically) the ⟨u_i, \hat u_i⟩'s and ⟨v_i, \hat v_i⟩'s, and also D_U and D_V (cf. (1.5)), in terms of the Green function. The discussion is based on the second author's previous work [23], where the limits of these quantities are studied. We then collect a few auxiliary lemmas in Section 3.4.

3.1. Basic notions

Our estimation relies on the local MP law [52] and its isotropic version [12, 37], which provide sharp large deviation estimates for the Green functions

G_1(z) := (XX^* − z)^{-1}, G_2(z) := (X^*X − z)^{-1}, z ∈ C^+.

Here we recall the definition of H = H(X) in (2.2). By the Schur complement, it is easy to derive that

(3.1) G(z) := (H − z)^{-1} = ( z G_1(z^2) , G_1(z^2) X ; X^* G_1(z^2) , z G_2(z^2) ).
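Since the block formula (3.1) above is reconstructed, here is a quick numerical sanity check of it (small dimensions, an arbitrary spectral parameter off the real axis).

```python
import numpy as np

rng = np.random.default_rng(2)
M3, n3, z = 5, 7, 1.5 + 0.3j
X3 = rng.standard_normal((M3, n3)) / np.sqrt(n3)

H = np.block([[np.zeros((M3, M3)), X3], [X3.T, np.zeros((n3, n3))]])
G = np.linalg.inv(H - z * np.eye(M3 + n3))

G1 = np.linalg.inv(X3 @ X3.T - z**2 * np.eye(M3))   # (X X^* - z^2)^{-1}
G2 = np.linalg.inv(X3.T @ X3 - z**2 * np.eye(n3))   # (X^* X - z^2)^{-1}
assert np.allclose(G[:M3, :M3], z * G1)             # top-left block of (3.1)
assert np.allclose(G[M3:, M3:], z * G2)             # bottom-right block
assert np.allclose(G[:M3, M3:], G1 @ X3)            # top-right block
```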

The Stieltjes transforms of the ESDs of XX^* and X^*X are defined by

(3.2) m_1^{(n)}(z) := \frac{1}{M} Tr G_1(z), m_2^{(n)}(z) := \frac{1}{n} Tr G_2(z).

It is well known that m_1^{(n)} and m_2^{(n)} have the nonrandom approximates m_1 and m_2, which are the Stieltjes transforms of the MP laws defined in (2.3) and (2.4). Specifically, for any fixed z ∈ C^+, the following hold almost surely:

m_1^{(n)}(z) → m_1(z), m_2^{(n)}(z) → m_2(z).

Furthermore, one can easily check that m_1 and m_2 satisfy the following self-consistent equations:

(3.3) m_1(z) = \frac{1}{1 − c_n − z − c_n z m_1(z)},
(3.4) m_2(z) = \frac{1}{c_n − 1 − z − z m_2(z)}.

We can also derive the following simple relation from the definitions:

(3.5) m_2(z) = c_n m_1(z) − \frac{1 − c_n}{z}.
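These identities can be verified numerically with the functions m1 and m2 from the snippet in Section 2.1; all three checks are branch-independent.

```python
# Numerical check of (3.3)-(3.5), reusing m1 and m2 from the earlier snippet.
c, z = 0.5, 4.0 + 0.1j
assert abs(m1(z, c) - 1/(1 - c - z - c*z*m1(z, c))) < 1e-12   # (3.3)
assert abs(m2(z, c) - 1/(c - 1 - z - z*m2(z, c))) < 1e-12     # (3.4)
assert abs(m2(z, c) - (c*m1(z, c) - (1 - c)/z)) < 1e-12       # (3.5)
```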

Next we summarize some basic identities in the following lemma without proof. They can be checked from (2.3) and (2.4) via elementary calculations.

Lemma 3.1.

Recall θ(d) defined in (2.5). For any d > c_n^{1/4}, we have

and

Furthermore, denote by . We have

In the sequel, we also need the following notion of high-probability events.

Definition 3.2 (High probability event).

We say that an n-dependent event Ξ ≡ Ξ(n) holds with high probability if, for any large D > 0,

P(Ξ) ≥ 1 − n^{−D}

for sufficiently large n ≥ n_0(D).

We also adopt the notion of stochastic domination introduced in [26]. It provides a convenient way of making precise statements of the form “A is bounded by B up to small powers of n with high probability”.

Definition 3.3 (Stochastic domination).

Let

A = ( A^{(n)}(u) : n ∈ N, u ∈ U^{(n)} ), B = ( B^{(n)}(u) : n ∈ N, u ∈ U^{(n)} )

be two families of nonnegative random variables, where U^{(n)} is a possibly n-dependent parameter set. We say that A is stochastically dominated by B, uniformly in u, if for all small ε > 0 and large D > 0 we have

sup_{u ∈ U^{(n)}} P( A^{(n)}(u) > n^ε B^{(n)}(u) ) ≤ n^{−D}

for large enough n ≥ n_0(ε, D). In addition, we use the notation A ≺ B if A is stochastically dominated by B, uniformly in u. Throughout this paper, the stochastic domination will always be uniform in all parameters (mostly matrix indices and the spectral parameter z) that are not explicitly fixed.

3.2. Isotropic local laws

The key ingredient in our estimation is a special case of the anisotropic local law derived in [37], which is essentially the isotropic local law previously derived in [12]. Set

(3.6) Π(z) := ( z m_1(z^2) I_M , 0 ; 0 , z m_2(z^2) I_n ).

We will need the isotropic local law outside the spectrum of the MP law. For ς > 0, define the spectral domain

(3.7)

where ς is a fixed small constant. Recall the notations m_1^{(n)} and m_2^{(n)} defined in (3.2).

Lemma 3.4 (Theorem 3.7 of [37], Theorem 3.12 of [12] and Theorem 3.1 of [52]).

Fix ς > 0. For any deterministic unit vectors w_1 and w_2, we have