Long Random Matrices and Tensor Unfolding

In this paper, we consider the singular values and singular vectors of low-rank perturbations of large rectangular random matrices, in the regime where the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove that there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and that the extreme singular values and singular vectors exhibit a BBP-type phase transition. As a main application, we investigate the tensor unfolding algorithm for the asymmetric rank-one spiked tensor model, and obtain an exact threshold, which is independent of the procedure of tensor unfolding: if the signal-to-noise ratio is above the threshold, tensor unfolding detects the signal; otherwise, it fails to capture it.


1 Introduction

High-order arrays, or tensors, have been actively used in neuroimaging analysis, topic modeling, signal processing and recommendation systems [25, 19, 26, 36, 50, 58, 55, 18, 54]. Researchers have made tremendous efforts to develop effective methods for the analysis of tensor data. The spiked tensor model, introduced in [51] by Richard and Montanari, captures a number of statistical estimation tasks in which we need to extract information from a noisy high-dimensional data tensor. We are given an order-$k$ tensor $X$ of the form

$$X = \beta\, u_1 \otimes u_2 \otimes \cdots \otimes u_k + W,$$

where $W$ is a noise tensor, $\beta$ corresponds to the signal-to-noise ratio (SNR), and $\beta\, u_1 \otimes u_2 \otimes \cdots \otimes u_k$ is a rank-one unseen signal tensor to be recovered.

When $k = 2$, the spiked tensor model reduces to the spiked matrix model of the form “signal + noise”, which has been intensively studied over the past twenty years. It is now well understood that the extreme eigenvalues of low-rank perturbations of large random matrices undergo a so-called BBP phase transition [3] as the signal-to-noise ratio varies, first discovered by Baik, Péché and the first author. There is an order-one critical signal-to-noise ratio $\beta_c$ such that below $\beta_c$ it is information-theoretically impossible to detect the spikes [49, 46, 45], and above $\beta_c$ it is possible to detect the spikes by Principal Component Analysis (PCA). A body of work has quantified the behavior of PCA in this setting [33, 3, 4, 47, 11, 2, 34, 12, 15, 43, 56, 14, 23, 38, 22]. We refer readers to the review article [35] by Johnstone and Paul for more discussion and references to this and related lines of work.

For the spiked tensor model with $k \geq 3$, just as for the spiked matrix model, there is an order-one critical signal-to-noise ratio $\beta_c$ (depending on the order $k$) such that below $\beta_c$ it is information-theoretically impossible to detect the spikes, and above $\beta_c$ the maximum likelihood estimator is a distinguishing statistic [16, 17, 39, 48, 32]. In the matrix setting, the maximum likelihood estimator is the top eigenvector, which can be computed in polynomial time by, e.g., power iteration. However, for an order-$k$ tensor with $k \geq 3$, computing the maximum likelihood estimator is NP-hard in the generic setting. In this setting, two “phase transitions” are often studied. There is the critical signal-to-noise ratio $\mathrm{SNR}_{\mathrm{stat}}$, below which it is statistically impossible to estimate the parameters. Although above the threshold $\mathrm{SNR}_{\mathrm{stat}}$ it is possible to estimate the parameters in theory, there is no known efficient (polynomial-time) algorithm that achieves recovery close to this statistical threshold. Thus the algorithms community is interested in another critical threshold $\mathrm{SNR}_{\mathrm{comp}} \geq \mathrm{SNR}_{\mathrm{stat}}$, below which it is impossible for an efficient algorithm to achieve recovery. For the spiked tensor model with $k \geq 3$, it is widely believed that there exists a computational-to-statistical gap $\mathrm{SNR}_{\mathrm{comp}} \gg \mathrm{SNR}_{\mathrm{stat}}$. We refer readers to the article [5] by Bandeira, Perry and Wein for a more detailed discussion of this phenomenon.

In the work [51] of Richard and Montanari, the algorithmic aspects of the spiked tensor model were studied. They showed that tensor power iteration and the approximate message passing (AMP) algorithm with random initialization recover the signal provided $\beta \gtrsim n^{(k-1)/2}$. Based on heuristic arguments, they predicted that the necessary and sufficient condition for power iteration and AMP to succeed is $\beta \gtrsim n^{(k-2)/2}$. This threshold was proven in [39] for AMP by Lesieur, Miolane, Lelarge, Krzakala and Zdeborová, and for power iteration by Yang, Cheng and the last two authors [31]. The same threshold was also achieved using gradient descent and Langevin dynamics, as studied by Gheissari, Jagannath and the first author [7].

In [51], Richard and Montanari also proposed a method based on tensor unfolding, which unfolds the tensor $X$ to an $n^{q} \times n^{k-q}$ matrix for some $1 \leq q \leq k-1$,

$$\mathrm{Mat}_q(X)_{ab} = X_{i_1 \cdots i_k}, \qquad a = 1 + \sum_{j=1}^{q} (i_j - 1)\, n^{j-1}, \qquad b = 1 + \sum_{j=q+1}^{k} (i_j - 1)\, n^{j-q-1}. \qquad (1.1)$$

By taking $q = \lceil k/2 \rceil$ and performing matrix PCA on $\mathrm{Mat}_q(X)$, they proved that the tensor unfolding algorithm recovers the signal provided $\beta \gtrsim n^{(\lceil k/2 \rceil - 1)/2}$, and predicted that the necessary and sufficient condition for tensor unfolding is $\beta \gtrsim n^{(k-2)/4}$.
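Concretely, the unfolding (1.1) is just a reshape of the underlying data array. The following sketch (names are ours; numpy's row-major reshape is one of many equally valid conventions for the index map) unfolds an order-$k$ tensor along its first $q$ axes:

import numpy as np

def unfold(T, q):
    # Unfold an order-k tensor of shape (n, ..., n) into an n^q x n^(k-q)
    # matrix: the first q axes index rows, the remaining axes index columns.
    rows = int(np.prod(T.shape[:q]))
    cols = int(np.prod(T.shape[q:]))
    return T.reshape(rows, cols)

# Example: a rank-one order-3 spiked tensor, unfolded with q = ceil(k/2) = 2.
n, beta = 30, 5.0
u = [np.random.randn(n) for _ in range(3)]
u = [x / np.linalg.norm(x) for x in u]
signal = beta * np.einsum('i,j,k->ijk', *u)
noise = np.random.randn(n, n, n) / np.sqrt(n)   # variance 1/n per entry (the normalization assumed in Section 3)
M = unfold(signal + noise, q=2)                 # shape (n*n, n)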

Several other, more sophisticated algorithms for the spiked tensor model have been investigated in the literature which achieve the sharp threshold $\beta \asymp n^{(k-2)/4}$: sum-of-squares algorithms [29, 28, 37], sophisticated iteration algorithms [40, 57, 27], and an averaged version of gradient descent [13] by Biroli, Cammarota and Ricci-Tersenghi. The necessity of this threshold still remains open. Its relation with the hypergraphic planted clique problem was discussed in [41, 42] by Luo and Zhang. It is proven in [29] by Hopkins, Shi and Steurer that degree-$4$ sum-of-squares relaxations fail below this threshold. The landscape complexity of the spiked tensor model was studied in [8] by Mei, Montanari, Nica and the first author. A new framework based on the Kac-Rice method, which allows one to compute the annealed complexity of the landscape, was proposed in [52] by Ros, Biroli, Cammarota and the first author; it was later used to analyze gradient-based algorithms in non-convex settings [53, 44] by Mannelli, Biroli, Cammarota, Krzakala, Urbani and Zdeborová.

In this paper, we revisit the tensor unfolding algorithm introduced by Richard and Montanari. The unfolded matrix from (1.1) is a spiked matrix model of the form “signal + noise”. However, it differs from the spiked matrix models in the random matrix literature, which require the dimensions of the matrix to be comparable, namely that the ratio of the number of rows to the number of columns converges to a fixed constant. In that setting, the singular values and singular vectors of the spiked matrix model (1.1) have been studied in [11] by Benaych-Georges and Nadakuditi.

For the unfolded matrix $\mathrm{Mat}_q(X)$, the dimensions are not comparable. As the size of the tensor goes to infinity, the ratio of the number of rows to the number of columns goes to zero or infinity, unless $k$ is even and $q = k/2$ (in which case $\mathrm{Mat}_q(X)$ is a square matrix). In this paper, we study the singular values and singular vectors of the spiked matrix model (1.1) in the case where the number of rows (columns) grows polynomially in the number of columns (rows), which we call low-rank perturbations of long random matrices. In the case $\beta = 0$, estimates on the singular values and singular vectors of long random matrices follow from [1] by Bloemendal, Erdős, Knowles, Yau and Yin.

To study low-rank perturbations of long random matrices, we use the master equations from [11], which characterize the outlier singular values of the perturbed random matrices and the associated singular vectors. To analyze the master equations, we use the estimates on singular values and Green's functions of long random matrices from [1]. Compared with the setting where the ratio of the number of rows to the number of columns converges to a fixed constant, the challenge is to obtain uniform estimates for the errors in the master equations which depend only on the number of rows. In this way, we can allow the number of columns to grow much faster than the number of rows. For low-rank perturbations of long random matrices, we prove that there exists a critical signal-to-noise ratio $\lambda_c$ (depending on the dimensions of the matrix), around which the model exhibits a BBP-type phase transition. We also obtain estimates on the singular values and singular vectors for this model. Moreover, we have precise estimates when the signal-to-noise ratio $\lambda$ is close to the threshold $\lambda_c$; in particular, our results also apply when $\lambda$ is close to $\lambda_c$ on mesoscopic scales. In an independent work [24] by Feldman, this model has been studied under different assumptions. We refer to Section 2.2 for a more detailed discussion of the differences.

Our results for low-rank perturbations of long random matrices can be used to study the unfolded matrix $\mathrm{Mat}_q(X)$. For the signal-to-noise ratio $\beta$, any $1 \leq q \leq k-1$ and any small $\varepsilon > 0$: if $\beta \geq n^{(k-2)/4+\varepsilon}$, PCA on $\mathrm{Mat}_q(X)$ detects the signal tensor; if $\beta \leq n^{(k-2)/4-\varepsilon}$, PCA on $\mathrm{Mat}_q(X)$ fails to capture the signal tensor. This matches the conjectured algorithmic threshold for the spiked tensor model from [51]. It is worth mentioning that the threshold we obtain is independent of the tensor unfolding procedure, namely, it is independent of the choice of $q$. For $q \geq 2$, a further recursive unfolding is needed to recover the individual signals $u_1, \dots, u_k$, which increases the computational cost. We propose to simply take $q = 1$ in the tensor unfolding algorithm for each coordinate axis, and unfold the tensor to an $n \times n^{k-1}$ matrix, which gives a good approximation of the individual signals $u_i$, provided $\beta \geq n^{(k-2)/4+\varepsilon}$. In the tensor literature, this algorithm is exactly the truncated higher-order singular value decomposition (HOSVD) introduced in [20] by De Lathauwer, De Moor and Vandewalle. Later, they developed the higher-order orthogonal iteration (HOOI) in [21], which uses the truncated HOSVD as initialization combined with a power iteration, to find the best low-multilinear-rank approximation of a tensor. The performance of HOOI was analyzed in [57] by Zhang and Xia for the spiked tensor model. It was proven that if the signal-to-noise ratio satisfies $\beta \geq C n^{(k-2)/4}$ for some large constant $C$, HOOI converges within a logarithmic number of iterations.

The paper is organized as follows. In Section 2, we state the main results on the singular values and vectors of low rank perturbations of long random matrices. In Section 3 we study the spiked tensor model, as an application of our results on low rank perturbations of long random matrices. The proofs of our main results are given in Section 4.

Notations. For two numbers $a, b$, we write $a \lesssim b$ if there exists a universal constant $C > 0$ such that $a \leq C b$. We write $a = \mathrm{o}(b)$, or $a \ll b$, if the ratio $a/b \to 0$ as $n$ goes to infinity. We write $a \asymp b$ if there exists a universal constant $C > 0$ such that $b/C \leq a \leq C b$. We denote the index set $[\![n]\!] = \{1, 2, \dots, n\}$. We say an event $\Omega$ holds with high probability if for any large $D > 0$, $\mathbb{P}(\Omega) \geq 1 - n^{-D}$ for $n$ large enough. We write $a \prec b$, or $a = \mathrm{O}_\prec(b)$, if $a$ is stochastically dominated by $b$, in the sense that for all large $D > 0$ and small $\varepsilon > 0$, we have $\mathbb{P}\big(a \geq n^{\varepsilon} b\big) \leq n^{-D}$ for $n$ large enough.

Acknowledgements The research of J.H. is supported by the Simons Foundation as a Junior Fellow at the Simons Society of Fellows, and NSF grant DMS-2054835.

2 Main Results

In this section, we state our main results on the singular values and singular vectors of low-rank perturbations of long random matrices. Let $A$ be an $n \times m$ random matrix, with i.i.d. random entries satisfying

Assumption 2.1.

The entries $a_{ij}$ of $A$ are i.i.d., have mean zero and variance $1/m$, and have bounded moments after rescaling: for any integer $p \geq 3$,

$$\mathbb{E}\big[\,|\sqrt{m}\, a_{ij}|^{p}\,\big] \leq C_p < \infty.$$

We introduce the aspect ratio $d = n/m$. It is well known that if $n, m \to \infty$ with their ratio converging to a constant $d \in (0, 1]$ independent of $n$, the empirical eigenvalue distribution of $AA^\top$ converges to the Marchenko–Pastur law

$$\rho_d(x)\,\mathrm{d}x = \frac{1}{2\pi d}\,\frac{\sqrt{\big[(\lambda_+ - x)(x - \lambda_-)\big]_+}}{x}\,\mathrm{d}x, \qquad \lambda_\pm = (1 \pm \sqrt{d}\,)^2, \qquad (2.1)$$

supported on $[\lambda_-, \lambda_+]$.

In this paper, we consider the case where the ratio $d = n/m$ depends on $n$, satisfying $n^{-\mathfrak{b}} \leq d \leq 1$ for some constant $\mathfrak{b} > 0$. In Section 2.1, we collect some results on the singular values of the long random matrix $A$ from [1]. Using the estimates on the singular values and singular vectors of long random matrices as input, we can study low-rank perturbations of long random matrices. We consider rank-one perturbations of long random matrices of the form

$$\widetilde{A} = \lambda\, u v^\top + A, \qquad (2.2)$$

where $\lambda \geq 0$ is the signal-to-noise ratio, $u \in \mathbb{R}^n$ and $v \in \mathbb{R}^m$ are unit vectors, and $A$ is an $n \times m$ random matrix satisfying Assumption 2.1. We state our main results on the singular values and singular vectors of the low-rank perturbation (2.2) in Section 2.2.

2.1 Singular values of Long Random Matrices

Let $A$ be an $n \times m$ random matrix with entries satisfying Assumption 2.1, and let $d = n/m$ satisfy $n^{-\mathfrak{b}} \leq d \leq 1$ for some constant $\mathfrak{b} > 0$. The eigenvalues of sample covariance matrices in this setting have been well studied in [1]. We denote the singular values of $A$ by

$$s_1(A) \geq s_2(A) \geq \cdots \geq s_n(A) \geq 0.$$

They are the square roots of the eigenvalues of the sample covariance matrix $AA^\top$. As an easy consequence of [1, Theorem 2.10], we have the following theorem on the largest singular value of $A$.

Theorem 2.2.

Under Assumption 2.1, let $d = n/m$ with $n^{-\mathfrak{b}} \leq d \leq 1$. Fix an arbitrarily small $\varepsilon > 0$. With high probability, the largest singular value of $A$ concentrates at the bulk edge $1 + \sqrt{d}$:

(2.3)

provided $n$ is large enough.

The results in [1] give estimates on each eigenvalue of the sample covariance matrix $AA^\top$ away from $0$; see Theorem 4.1. They also give estimates on the location of each singular value of $A$. In particular, they imply that the empirical singular value distribution of $A$ is close to the pushforward of the Marchenko–Pastur law (after proper normalization) by the map $x \mapsto \sqrt{x}$,

$$\widetilde{\rho}_d = (x \mapsto \sqrt{x})_*\, \rho_d.$$

We remark that $\widetilde{\rho}_d$ depends on $n$ through $d = n/m$ and is supported on $[1 - \sqrt{d},\, 1 + \sqrt{d}]$. As $d \to 0$, after shifting by $1$ and rescaling by $\sqrt{d}$, i.e. $s \mapsto (s - 1)/\sqrt{d}$, the measure $\widetilde{\rho}_d$ converges to the semicircle distribution on $[-1, 1]$:

$$\rho_{\mathrm{sc}}(x)\,\mathrm{d}x = \frac{2}{\pi}\sqrt{(1 - x^2)_+}\;\mathrm{d}x.$$

See Figure 1 for some plots of $\widetilde{\rho}_d$. One can see that the extreme singular values stick to the edges of the support of the limiting empirical measure.

Figure 1: The empirical singular value distribution of $A$, for $n$ fixed and several values of the ratio $m/n$. When $m/n$ is large, the empirical singular value distribution converges to the semicircle distribution supported on $[1 - \sqrt{d},\, 1 + \sqrt{d}]$.
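This convergence is easy to check numerically. Below is a minimal simulation sketch, assuming the variance-$1/m$ normalization of Assumption 2.1 (all names are ours):

import numpy as np

# Empirical singular values of a long random matrix A with i.i.d. entries
# of mean zero and variance 1/m.
n, m = 500, 50_000                    # aspect ratio d = n/m = 0.01
d = n / m
A = np.random.randn(n, m) / np.sqrt(m)
s = np.linalg.svd(A, compute_uv=False)

# The singular values fill [1 - sqrt(d), 1 + sqrt(d)], and the rescaled
# values (s - 1)/sqrt(d) are approximately semicircular on [-1, 1].
print(s.min(), s.max())               # ~ 1 - sqrt(d), 1 + sqrt(d)
rescaled = (s - 1.0) / np.sqrt(d)
hist, edges = np.histogram(rescaled, bins=30, range=(-1.2, 1.2), density=True)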

2.2 Low Rank Perturbations of Long Random Matrices

Let $A$ be an $n \times m$ random matrix with entries satisfying Assumption 2.1, and let $d = n/m$. Without loss of generality we can assume that $n \leq m$; otherwise, we can simply study the transpose of $A$. We allow $m$ to grow with $n$ at any polynomial rate, $m \leq n^{\mathfrak{b}}$ for an arbitrarily large constant $\mathfrak{b}$. In this regime, $A$ is a long random matrix.

In this section, we state our main results on the rank-one perturbation of long random matrices from (2.2):

$$\widetilde{A} = \lambda\, u v^\top + A, \qquad (2.4)$$

where $u \in \mathbb{R}^n$ and $v \in \mathbb{R}^m$ are unit vectors. As in Theorem 2.2, in this setting the singular values of $A$ are roughly supported on the interval $[1 - \sqrt{d},\, 1 + \sqrt{d}]$. The following theorem states that there is an exact $d$-dependent threshold $\lambda_c = d^{1/4}$: if $\lambda$ is above the threshold, $\widetilde{A}$ has an outlier singular value; if $\lambda$ is below the threshold, there are no outlier singular values, and all the singular values stick to the bulk.

Theorem 2.3.

We assume Assumption 2.1 and $n \leq m \leq n^{\mathfrak{b}}$. Let $d = n/m$, fix arbitrarily small $\varepsilon > 0$, and denote by $s_1(\widetilde{A})$ the largest singular value of $\widetilde{A}$. If $\lambda \geq (1 + \varepsilon)\, d^{1/4}$, then with high probability the largest singular value of $\widetilde{A}$ is an outlier, explicitly given by

(2.5)

If $\lambda \leq (1 - \varepsilon)\, d^{1/4}$, then with high probability $\widetilde{A}$ does not have outlier singular values, and the largest singular value satisfies

(2.6)

provided $n$ is large enough.
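For intuition, the threshold and the outlier location match the classical BBP/BGN formulas from the comparable-dimension setting [11]; at leading order (our paraphrase of (2.5) and (2.6), with all error terms omitted):

$$\lambda_c = d^{1/4}, \qquad s_1(\widetilde{A}) \approx \begin{cases} \dfrac{\sqrt{(1 + \lambda^2)(d + \lambda^2)}}{\lambda}, & \lambda > d^{1/4}, \\[6pt] 1 + \sqrt{d}, & \lambda \leq d^{1/4}. \end{cases}$$

Note that at $\lambda = d^{1/4}$ the first expression equals $1 + \sqrt{d}$, so the outlier emerges continuously from the bulk edge.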

We refer to Figure 2 for an illustration of Theorem 2.3. Theorem 2.3 also characterizes the behavior of the outlier in the critical case, when $\lambda$ is close to the threshold $\lambda_c = d^{1/4}$.

Figure 2: The empirical singular value distribution of $\widetilde{A}$, for several values of the signal-to-noise ratio $\lambda$. The predicted outlier locations, as given by formula (2.5), are marked.

We have a similar transition for the singular vectors. If $\lambda$ is above the threshold $d^{1/4}$, the left singular vector associated with the largest singular value of $\widetilde{A}$ has a large component in the signal direction $u$; if $\lambda$ is below the threshold, the projection of this singular vector on the signal direction vanishes.

Theorem 2.4.

We assume Assumption 2.1 and $n \leq m \leq n^{\mathfrak{b}}$. Let $d = n/m$, fix arbitrarily small $\varepsilon > 0$, and denote by $\widetilde{u}$ and $\widetilde{v}$ the left and right singular vectors associated with the largest singular value of $\widetilde{A}$. If $\lambda \geq (1 + \varepsilon)\, d^{1/4}$, then with high probability $\widetilde{u}$ has a large component in the signal direction $u$:

(2.7)

and a similar estimate holds for the right singular vector $\widetilde{v}$:

(2.8)

If $\lambda \leq (1 - \varepsilon)\, d^{1/4}$, then with high probability the projection of $\widetilde{u}$ on $u$ and the projection of $\widetilde{v}$ on $v$ are upper bounded as in

(2.9)

provided $n$ is large enough.
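At leading order, the formulas from the comparable-dimension setting [11] suggest the following values for the overlaps in (2.7) and (2.8) (a heuristic paraphrase, not the precise statements with error terms): for $\lambda > d^{1/4}$,

$$|\langle \widetilde{u}, u \rangle|^2 \approx 1 - \frac{d\,(1 + \lambda^2)}{\lambda^2 (\lambda^2 + d)}, \qquad |\langle \widetilde{v}, v \rangle|^2 \approx 1 - \frac{\lambda^2 + d}{\lambda^2 (1 + \lambda^2)}.$$

As $d \to 0$ with $\lambda$ fixed, these tend to $1$ and $\lambda^2/(1 + \lambda^2)$ respectively, consistent with Remark 2.5 below.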

Remark 2.5.

We remark that when the ratio $m/n$ goes to infinity with $n$, with $\lambda$ fixed, then from (2.7), $|\langle \widetilde{u}, u \rangle|^2$ converges to $1$, and from (2.8), $|\langle \widetilde{v}, v \rangle|^2$ converges to $\lambda^2/(1 + \lambda^2)$.

Remark 2.6.

In this paper, for simplicity of notation, we only consider rank-one perturbations of long random matrices. Our method can equally be used to study any finite-rank perturbation of long random matrices.

The singular values and vectors of low-rank perturbations of large rectangular random matrices have previously been studied in [11] by Benaych-Georges and Nadakuditi. Our main results, Theorems 2.3 and 2.4, generalize their results in two directions. Firstly, in [11], the ratio $d = n/m$ is assumed to converge to a constant independent of $n$. In our setting, we allow $m$ to grow polynomially in $n$. As we will see in Section 3, this is crucial for studying tensor principal component analysis. Secondly, we allow the signal-to-noise ratio $\lambda$ to be close to the threshold $\lambda_c$ (which depends on $d$), and our main results characterize the behavior of the singular values and singular vectors in this regime. In an independent work [24] by Feldman, the singular values and vectors of multi-rank perturbations of long random matrices have been studied under the assumption that either the signal vectors contain i.i.d. entries with mean zero and variance one, or the noise matrix has Gaussian entries. Both proofs use the master equations, developed in [11], which characterize the singular values and singular vectors of low-rank perturbations of rectangular random matrices; see Section 4.2. To analyze the master equations, Feldman [24] needs the signal vectors to have i.i.d. entries with mean zero and variance one. Our argument uses results from [1], which give Green's function estimates for long random matrices, and works for arbitrary (deterministic) signal vectors.

3 Tensor PCA

As an application of our main results, Theorems 2.3 and 2.4, in this section we use them to study the asymmetric rank-one spiked tensor model introduced in [51]:

$$X = \beta\, u_1 \otimes u_2 \otimes \cdots \otimes u_k + W, \qquad (3.1)$$

where

  • $X \in (\mathbb{R}^n)^{\otimes k}$ is the $k$-th order tensor observation.

  • $W$ is a noise tensor. The entries of $W$ are independent random variables with mean zero and variance $1/n$.

  • $\beta \geq 0$ is the signal-to-noise ratio.

  • $u_1, u_2, \dots, u_k \in \mathbb{R}^n$ are unknown unit vectors to be recovered.

The goal is to perform reliable estimation and inference on the unseen signal tensor $\beta\, u_1 \otimes u_2 \otimes \cdots \otimes u_k$. We remark that any rank-one signal tensor can be uniquely written in the form $\beta\, u_1 \otimes u_2 \otimes \cdots \otimes u_k$ with $\beta \geq 0$, where $u_1, \dots, u_k$ are unit vectors, up to the signs of the $u_i$. The model (3.1) is slightly more general than the asymmetric spiked tensor model in [51], which assumes that the entries of the noise tensor are i.i.d. Gaussian. In this section, we make the following assumption on the noise tensor:

Assumption 3.1.

The entries of $W$ are i.i.d., with mean zero and variance $1/n$, and have bounded moments after rescaling: for any indices $i_1, i_2, \dots, i_k$ and any integer $p \geq 3$,

$$\mathbb{E}\big[\,|\sqrt{n}\, W_{i_1 i_2 \cdots i_k}|^{p}\,\big] \leq C_p < \infty.$$

The tensor unfolding algorithm in [51] unfolds the tensor $X$ to an $n^{\lceil k/2 \rceil} \times n^{\lfloor k/2 \rfloor}$ matrix, and they proved that it detects the signal when the signal-to-noise ratio satisfies $\beta \gtrsim n^{(\lceil k/2 \rceil - 1)/2}$. The conjectured algorithmic threshold is $\beta \asymp n^{(k-2)/4}$. We recall the tensor unfolding algorithm. Take any index set $S \subseteq \{1, 2, \dots, k\}$ with $1 \leq |S| \leq k-1$, and let $q = |S|$. We denote by $\mathrm{Mat}_S(X)$ the $n^{q} \times n^{k-q}$ matrix obtained from $X$ by unfolding along the axes indexed by $S$: the indices in $S$ are flattened into the row index, and the remaining indices into the column index. More precisely, for any indices $i_1, \dots, i_k$, the entry $X_{i_1 \cdots i_k}$ is placed in the row determined by $(i_j)_{j \in S}$ and the column determined by $(i_j)_{j \notin S}$. If $S$ is a singleton, i.e. $S = \{i\}$, we simply write $\mathrm{Mat}_S(X)$ as $\mathrm{Mat}_i(X)$.

We can view $\mathrm{Mat}_S(X)$ as the sum of the unfoldings of the signal tensor and of the noise tensor. Let

$$u = \bigotimes_{i \in S} u_i, \qquad v = \bigotimes_{i \notin S} u_i.$$

Then we can rewrite $\mathrm{Mat}_S(X)$ as

$$\mathrm{Mat}_S(X) = \beta\, u v^\top + \mathrm{Mat}_S(W). \qquad (3.2)$$

To put (3.2) in the form of (2.2), we need to further normalize it as

$$\frac{1}{\sqrt{n^{\,k-q-1}}}\,\mathrm{Mat}_S(X). \qquad (3.3)$$

In this way, each entry of the normalized noise matrix has variance $1/n^{k-q}$, matching Assumption 2.1 with $m = n^{k-q}$. Thus (3.3) is a rank-one perturbation of a random matrix in the form of (2.2), with signal-to-noise ratio $\lambda = \beta/\sqrt{n^{\,k-q-1}}$, and the ratio of its dimensions grows at most polynomially in $n$. Taking the largest singular value of the normalized unfolded matrix, our main results, Theorems 2.3 and 2.4, indicate that there is a phase transition at $\beta \asymp n^{(k-2)/4}$ for the tensor unfolding (3.3).

Theorem 3.2.

We assume Assumption 3.1, and fix any index set $S \subseteq \{1, \dots, k\}$ with $1 \leq q = |S| \leq k-1$. Let $Y$ be the normalized matrix obtained from $X$ by unfolding along the axes indexed by $S$, as in (3.3), and denote the ratio of its dimensions by $d = n^{2q-k}$ (replacing $Y$ by its transpose if $q > k/2$, so that $d \leq 1$).

Fix arbitrarily small $\varepsilon > 0$, and denote by $s_1(Y)$ the largest singular value of $Y$ and by $\widetilde{u}$ the corresponding left singular vector. If $\beta \geq n^{(k-2)/4+\varepsilon}$, then with high probability the largest singular value is an outlier, explicitly given by the analogue of (2.5), and the left singular vector has a large component in the direction of $u = \bigotimes_{i \in S} u_i$:

(3.4)

If $\beta \leq n^{(k-2)/4-\varepsilon}$, then with high probability $Y$ does not have outlier singular values, the largest singular value sticks to the bulk edge as in (2.6), and the projection of $\widetilde{u}$ on $u$ is upper bounded by

(3.5)

provided $n$ is large enough.

Proof of Theorem 3.2.

Theorem 3.2 follows from Theorems 2.3 and 2.4 by taking the matrix there to be the normalized unfolded matrix $Y$, with signal-to-noise ratio $\lambda = \beta/\sqrt{n^{\,k-q-1}}$ and aspect ratio $d = n^{2q-k}$. In this way, the criteria in Theorems 2.3 and 2.4 become: if

$$\beta \geq n^{(k-2)/4+\varepsilon}, \quad\text{equivalently}\quad \lambda = \beta\, n^{-(k-q-1)/2} \geq n^{\varepsilon}\, d^{1/4} \geq (1+\varepsilon)\, d^{1/4} \text{ for } n \text{ large},$$

then $s_1(Y)$ is an outlier with high probability and the left singular vector has a large component in the $u$ direction. Otherwise, if $\beta \leq n^{(k-2)/4-\varepsilon}$, so that $\lambda \leq n^{-\varepsilon}\, d^{1/4} \leq (1-\varepsilon)\, d^{1/4}$, then $s_1(Y)$ sticks to the bulk and the projection of the top left singular vector on $u$ is small. ∎

We remark that, as indicated by Theorem 3.2, the tensor unfolding algorithms which unfold $X$ to an $n^q \times n^{k-q}$ matrix, for any choice of $q$ and index set $S$, essentially share the same threshold, i.e. $\beta \asymp n^{(k-2)/4}$, which matches the conjectured threshold; see the computation sketched below. As in (3.4), above the signal-to-noise ratio threshold, the left singular vector corresponding to the largest singular value of $Y$ is aligned with $u = \bigotimes_{i \in S} u_i$. The leading-order accuracy of the estimator in (3.4) is the same for all choices and is independent of $q$. However, if $q \geq 2$, this does not give us information about the individual signals $u_i$. A further recursive unfolding method is proposed in [51] to recover each individual signal $u_i$.
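To see heuristically why the threshold does not depend on $q$, one can track how the normalization (3.3) rescales the signal strength. A back-of-the-envelope computation under the normalizations above (a sketch, not part of the formal proof): for $q \leq k/2$,

$$\lambda = \frac{\beta}{\sqrt{n^{\,k-q-1}}}, \qquad d = \frac{n^{q}}{n^{\,k-q}} = n^{\,2q-k}, \qquad \lambda > d^{1/4} \;\Longleftrightarrow\; \beta > n^{\frac{k-q-1}{2}}\, n^{\frac{2q-k}{4}} = n^{\frac{k-2}{4}},$$

so the $q$-dependence cancels, and the threshold is $\beta \asymp n^{(k-2)/4}$ for every choice of unfolding.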

Since taking $q \geq 2$ does not change the algorithmic threshold but increases the computational cost, we propose the following simple algorithm to recover each signal $u_i$: perform the tensor unfolding algorithm for each singleton $S = \{i\}$ with $1 \leq i \leq k$, namely on $\mathrm{Mat}_i(X)$.

For each $1 \leq i \leq k$, we unfold (3.1) to an $n \times n^{k-1}$ matrix

$$Y_i = \frac{1}{\sqrt{n^{\,k-2}}}\,\mathrm{Mat}_i(X), \qquad (3.6)$$

which is (3.3) with $S = \{i\}$ and $q = 1$. In this way, (3.6) is a rank-one perturbation of a long random matrix in the form of (2.2), and the ratio of its dimensions, $n^{k-2}$, grows polynomially with $n$. We take $s_1(Y_i)$, the largest singular value of $Y_i$, and denote by

$$\widehat{\beta}_i \qquad (3.7)$$

the corresponding estimator for $\beta$, obtained by inverting the outlier location relation (see (3.12) below); the left singular vector $\widehat{u}_i$ corresponding to the largest singular value of $Y_i$ serves as the estimator for $u_i$. This gives the following simple algorithm to recover $\beta$ and $u_1, \dots, u_k$.

Input: tensor $X \in (\mathbb{R}^n)^{\otimes k}$;
for $i$ from $1$ to $k$ do
       $s_1(Y_i) \leftarrow$ largest singular value of $Y_i = \mathrm{Mat}_i(X)/\sqrt{n^{\,k-2}}$;
       $\widehat{u}_i \leftarrow$ left singular vector corresponding to the largest singular value of $Y_i$;
       $\widehat{\beta}_i \leftarrow$ estimator (3.7) computed from $s_1(Y_i)$;
end for
Result: $\widehat{\beta}_1, \dots, \widehat{\beta}_k$, $\widehat{u}_1, \dots, \widehat{u}_k$.
Algorithm 1 Tensor Unfolding
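A compact Python rendering of Algorithm 1 (an illustrative sketch: the closed-form inversion used for the $\beta$-estimator is our reading of the leading-order outlier relation in (2.5)/(3.12), and all names are ours):

import numpy as np

def tensor_unfolding(X):
    # Algorithm 1: estimate beta and the signal vectors u_i of a spiked
    # order-k tensor X of shape (n, ..., n), by unfolding along each axis.
    k, n = X.ndim, X.shape[0]
    scale = np.sqrt(float(n) ** (k - 2))   # normalization of (3.6)
    d = float(n) ** (2 - k)                # aspect ratio n / n^(k-1)
    beta_hats, u_hats = [], []
    for i in range(k):
        Yi = np.moveaxis(X, i, 0).reshape(n, -1) / scale
        U, s, _ = np.linalg.svd(Yi, full_matrices=False)
        u_hats.append(U[:, 0])             # estimator for u_i
        # Invert s1 = sqrt((1 + l^2)(d + l^2)) / l for the effective SNR l:
        # l^4 + (1 + d - s1^2) l^2 + d = 0; take the larger root in l^2.
        b = 1.0 + d - s[0] ** 2
        disc = max(b ** 2 - 4.0 * d, 0.0)
        l2 = (-b + np.sqrt(disc)) / 2.0
        beta_hats.append(np.sqrt(max(l2, 0.0)) * scale)  # estimator for beta
    return beta_hats, u_hats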

In the tensor literature, the above algorithm is exactly the truncated higher-order singular value decomposition (HOSVD) introduced in [20]. The higher-order orthogonal iteration (HOOI), which uses the truncated HOSVD as initialization combined with a power iteration, was developed in [21] to find a best low-multilinear-rank approximation of a tensor. The performance of HOOI was analyzed in [57] for the spiked tensor model. It was proven that if the signal-to-noise ratio satisfies $\beta \geq C n^{(k-2)/4}$ for some large constant $C$, HOOI converges within a logarithmic number of iterations. As an easy consequence of Theorem 3.2, we have the following theorem, which gives the exact threshold of the signal-to-noise ratio, i.e. $\beta \asymp n^{(k-2)/4}$. Above the threshold, our estimators $\widehat{\beta}_i$ and $\widehat{u}_i$ approximate the signal-to-noise ratio $\beta$ and the signal vectors $u_i$.

Theorem 3.3.

We assume Assumption 3.1. For any $1 \leq i \leq k$, we unfold $X$ to $Y_i$ as in (3.6). Let the estimator $\widehat{\beta}_i$ be as defined in (3.7), and let $\widehat{u}_i$ be the left singular vector corresponding to the largest singular value of $Y_i$.

Fix arbitrarily small $\varepsilon > 0$. If $\beta \geq n^{(k-2)/4+\varepsilon}$, then with high probability $\widehat{\beta}_i$ and $\widehat{u}_i$ approximate $\beta$ and $u_i$:

(3.8)

and

(3.9)

If $\beta \leq n^{(k-2)/4-\varepsilon}$, then with high probability the projection of $\widehat{u}_i$ on $u_i$ vanishes as $\beta$ decreases:

(3.10)

provided $n$ is large enough.

Remark 3.4.

Given the unfolded matrix $Y_i$, its largest singular value and the corresponding left singular vector can be computed by first forming $Y_i Y_i^\top$, then computing the largest eigenvalue and the corresponding eigenvector by power iteration. Forming $Y_i Y_i^\top$ costs $\mathrm{O}(n^{k+1})$ operations, and each power iteration step costs $\mathrm{O}(n^2)$, where the number of iterations can be taken logarithmic in $n$. Therefore, the estimators $\widehat{\beta}_i$ and $\widehat{u}_i$ can be computed with time complexity $\mathrm{O}(n^{k+1})$. To recover the signals $u_i$ for each $1 \leq i \leq k$, we repeat the above tensor unfolding algorithm $k$ times, obtaining $\widehat{\beta}_i$ and $\widehat{u}_i$ for each $i$. The total time complexity is $\mathrm{O}(k\, n^{k+1})$.
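A minimal sketch of this computation (the helper name is ours; a number of iterations of order $\log n$ suffices in the regime of Remark 3.4):

import numpy as np

def top_left_singular_pair(Y, num_iters=100, seed=0):
    # Power iteration on B = Y Y^T to obtain the top left singular pair of Y.
    rng = np.random.default_rng(seed)
    B = Y @ Y.T                    # n x n; forming B costs O(n^2 m)
    w = rng.standard_normal(B.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(num_iters):     # each step costs O(n^2)
        w = B @ w
        w /= np.linalg.norm(w)
    s1 = np.sqrt(w @ B @ w)        # top singular value of Y
    return s1, w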

Proof.

The claims (3.9) and (3.10) follow directly from (3.4) and (3.5) by taking $S = \{i\}$ and $q = 1$. In the following, we prove (3.8). Fix arbitrarily small $\varepsilon > 0$. For $\beta \geq n^{(k-2)/4+\varepsilon}$, with high probability the largest singular value of $Y_i$ is given by (2.5):

(3.11)

where $\lambda = \beta/\sqrt{n^{\,k-2}}$. Our estimator $\widehat{\beta}_i$, as defined in (3.7), is chosen as the solution of

(3.12)

By taking the difference of (3.11) and (3.12) and rearranging, we get

(3.13)

We notice that, above the threshold, the outlier location is strictly increasing in $\lambda$, with derivative bounded away from zero. Thus (3.13) implies the estimate (3.8) with high probability, provided $n$ is large enough. This finishes the proof of (3.8). ∎

We numerically verify Theorem 3.2. We fix the tensor order $k$ and dimension $n$, and vary the signal-to-noise ratio $\beta$ around the threshold $n^{(k-2)/4}$. We sample the signals $u_1, \dots, u_k$ as normalized Gaussian vectors, and the noise tensor $W$ with independent Gaussian entries. In the left panel of Figure 3, we plot the largest singular value of the unfolded matrix together with our theoretical prediction (2.5). In the right panel of Figure 3, we plot $\beta$ and our estimator $\widehat{\beta}$ as in (3.7). The estimator provides a good approximation of $\beta$ provided $\beta$ is above the threshold. In Figure 4 we plot the overlap $|\langle \widehat{u}_1, u_1 \rangle|^2$, where the estimator $\widehat{u}_1$ is the left singular vector corresponding to the largest singular value of the unfolded matrix. Our theoretical prediction (blue curve), as in (3.9), matches the simulation well for $\beta$ above the threshold. For $\beta$ below the threshold, our estimator behaves as poorly as a random guess, i.e. taking $\widehat{u}_1$ to be a random Gaussian vector (green curve). For $\beta$ in a small neighborhood of the threshold, we do not have a sharp estimate of the overlap, but only the upper bound (3.10). In the second panel of Figure 4, we zoom in around the threshold; the red curve, corresponding to the bound (3.10), provides a good upper bound of the overlap.

Figure 3: With $\beta$ varying around the threshold, averaging over independent trials, the left panel plots the largest singular value of the unfolded matrix and our theoretical prediction; the right panel plots $\beta$ and the estimator $\widehat{\beta}$.
Figure 4: The overlap $|\langle \widehat{u}_1, u_1 \rangle|^2$ as a function of $\beta$, averaged over independent trials.

4 Low rank Perturbations of Long Random Matrices

In this section, we prove our main results as stated in Section 2. The proof of Theorem 2.2 is given in Section 4.1, where we also collect the isotropic local law for long random matrices from [1]. In Section 4.2, we derive a master equation which characterizes the outlier singular values of the perturbed matrix $\widetilde{A}$. Master equations of this type have been used intensively to study low-rank perturbations of random matrices, for both singular values and eigenvalues; see [10, 11, 9, 30]. The proofs of Theorems 2.3 and 2.4 are given in Sections 4.3 and 4.4 respectively.
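For context, we recall the leading-order form of the master equation in the comparable-dimension setting, as we read it from [11] (the finite-$n$ version with error terms, adapted to long matrices, is derived in Section 4.2): a value $z$ outside the bulk is an outlier singular value of $\widetilde{A} = \lambda u v^\top + A$ precisely when

$$\frac{1}{\lambda^2} = \varphi(z)\,\widetilde{\varphi}(z), \qquad \varphi(z) = \int \frac{z}{z^2 - t^2}\,\mathrm{d}\mu(t), \qquad \widetilde{\varphi}(z) = d\,\varphi(z) + \frac{1 - d}{z},$$

where $\mu$ denotes the limiting singular value distribution of $A$. Solving this equation in the Marchenko–Pastur case gives the outlier location $\sqrt{(1 + \lambda^2)(d + \lambda^2)}/\lambda$ quoted after Theorem 2.3.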

4.1 Long Random Matrices

Let $A$ be an $n \times m$ random matrix with entries satisfying Assumption 2.1, and let $d = n/m$ with $n^{-\mathfrak{b}} \leq d \leq 1$. In this section, we recall some results on the sample covariance matrices in this setting from [1]. It turns out that in this setting the correct normalization is to study $\sqrt{m/n}\, AA^\top$, which corresponds to the standard normalization of sample covariance matrices in [1], with entries of variance $(nm)^{-1/2}$.

We denote the following $n$-dependent Marchenko–Pastur law, corresponding to the ratio $d = n/m$, by

$$\widehat{\rho}_d(x)\,\mathrm{d}x = \frac{1}{2\pi\sqrt{d}}\,\frac{\sqrt{\big[(x - \gamma_-)(\gamma_+ - x)\big]_+}}{x}\,\mathrm{d}x, \qquad \gamma_\pm = \sqrt{d} + \frac{1}{\sqrt{d}} \pm 2, \qquad (4.1)$$

and its quantiles $\gamma_1 \geq \gamma_2 \geq \cdots \geq \gamma_n$ by

$$\int_{\gamma_i}^{\gamma_+} \widehat{\rho}_d(x)\,\mathrm{d}x = \frac{i}{n}, \qquad 1 \leq i \leq n. \qquad (4.2)$$

The normalization in (4.1) is different from that in (2.1), which corresponds to the sample covariance matrix $AA^\top$ itself. We remark that both the Marchenko–Pastur law (4.1) and its quantiles (4.2) depend on $n$ through $d$. We recall the following eigenvalue rigidity result from [1, Theorem 2.10].

Theorem 4.1 (Eigenvalue Rigidity).

Under Assumption 2.1, let $d = n/m$ with $n^{-\mathfrak{b}} \leq d \leq 1$. The eigenvalues $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_n$ of $\sqrt{m/n}\, AA^\top$ are close to the quantiles of the Marchenko–Pastur law (4.1): fix any $\delta > 0$ and arbitrarily small $\varepsilon > 0$; with high probability it holds that

$$|\mu_i - \gamma_i| \leq n^{\varepsilon}\,\big(\min\{i,\, n + 1 - i\}\big)^{-1/3}\, n^{-2/3},$$

uniformly for all $i$ with $\gamma_i \geq \delta$. If in addition $d \leq n^{-c}$ for some constant $c > 0$, then with high probability, we also have