1 Introduction
High-order arrays, or tensors, have been actively considered in neuroimaging analysis, topic modeling, signal processing and recommendation systems [25, 19, 26, 36, 50, 58, 55, 18, 54]. Researchers have made tremendous efforts to develop effective methods for the analysis of tensor data. The spiked tensor model, introduced in [51] by Richard and Montanari, captures a number of statistical estimation tasks in which we need to extract information from a noisy high-dimensional data tensor. We are given a $k$-th order tensor $T$ in the following form
$$T = \beta\, v_1 \otimes v_2 \otimes \cdots \otimes v_k + W,$$
where $W$ is a noise tensor, $\beta$ corresponds to the signal-to-noise ratio (SNR), and $\beta\, v_1 \otimes v_2 \otimes \cdots \otimes v_k$ is a rank-one unseen signal tensor to be recovered.
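As a concrete illustration, the observation model can be sampled numerically. The sketch below is illustrative only: it assumes i.i.d. standard Gaussian noise and Gaussian signal vectors normalized to unit length, and the function name `spiked_tensor` is not from the paper.

```python
import numpy as np

def spiked_tensor(n, k, beta, seed=None):
    """Sample T = beta * v1 (x) ... (x) vk + W with unit signal vectors
    and i.i.d. standard Gaussian noise (an assumed normalization)."""
    rng = np.random.default_rng(seed)
    vs = [rng.standard_normal(n) for _ in range(k)]
    vs = [v / np.linalg.norm(v) for v in vs]
    signal = vs[0]
    for v in vs[1:]:
        signal = np.multiply.outer(signal, v)  # build v1 (x) ... (x) vk
    W = rng.standard_normal((n,) * k)
    return beta * signal + W, vs

T, vs = spiked_tensor(n=15, k=3, beta=30.0, seed=0)
```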
When $k=2$, the spiked tensor model reduces to the spiked matrix model of the form “signal $+$ noise”, which has been intensively studied over the past twenty years. It is now well-understood that the extreme eigenvalues of low-rank perturbations of large random matrices undergo a so-called BBP phase transition
[3] as the signal-to-noise ratio varies, first discovered by Baik, Péché and the first author. There is an order-one critical signal-to-noise ratio such that below it, detecting the spikes is information-theoretically impossible [49, 46, 45], while above it, the spikes can be detected by Principal Component Analysis (PCA). A body of work has quantified the behavior of PCA in this setting
[33, 3, 4, 47, 11, 2, 34, 12, 15, 43, 56, 14, 23, 38, 22]. We refer readers to the review article [35] by Johnstone and Paul for more discussion and references to this and related lines of work.

For the spiked tensor model with $k\ge 3$, just as for the spiked matrix model, there is an order-one critical signal-to-noise ratio (depending on the order $k$) below which it is information-theoretically impossible to detect the spikes, and above which the maximum likelihood estimator is a distinguishing statistic [16, 17, 39, 48, 32]
. In the matrix setting the maximum likelihood estimator is the top eigenvector, which can be computed in polynomial time by, e.g., power iteration. However, for tensors of order $k\ge 3$, computing the maximum likelihood estimator is NP-hard in generic settings. In this setting, two “phase transitions” are often studied. There is a critical signal-to-noise ratio $\mathrm{SNR}_{\mathrm{stat}}$, below which it is statistically impossible to estimate the parameters. Although above the threshold $\mathrm{SNR}_{\mathrm{stat}}$ it is possible to estimate the parameters in theory, there is no known efficient (polynomial-time) algorithm that achieves recovery close to the statistical threshold $\mathrm{SNR}_{\mathrm{stat}}$. Thus the algorithmic literature is interested in another critical threshold $\mathrm{SNR}_{\mathrm{comp}}\ge\mathrm{SNR}_{\mathrm{stat}}$, below which it is impossible for an efficient algorithm to achieve recovery. For the spiked tensor model with $k\ge 3$, it is widely believed that there exists a computational-to-statistical gap $\mathrm{SNR}_{\mathrm{comp}}\gg\mathrm{SNR}_{\mathrm{stat}}$. We refer readers to the article [5] by Bandeira, Perry and Wein for a more detailed discussion of this phenomenon.

In the work [51] of Richard and Montanari, the algorithmic aspects of the spiked tensor model were studied. They showed that tensor power iteration and the approximate message passing (AMP) algorithm with random initialization recover the signal provided the signal-to-noise ratio is sufficiently large. Based on heuristic arguments, they predicted the necessary and sufficient condition for power iteration and AMP to succeed
. This threshold was proven in [39] for AMP by Lesieur, Miolane, Lelarge, Krzakala and Zdeborová, and for power iteration by Yang, Cheng and the last two authors [31]. The same threshold was also achieved using gradient descent and Langevin dynamics, as studied by Gheissari, Jagannath and the first author [7]. In [51], Richard and Montanari also proposed a method based on tensor unfolding, which unfolds the tensor $T$ to an $n^q\times n^{k-q}$ matrix for some $1\le q\le k-1$,
(1.1) 
By taking $q=\lceil k/2\rceil$ and performing matrix PCA on the unfolded matrix, they proved that the tensor unfolding algorithm recovers the signal provided the signal-to-noise ratio is sufficiently large, and predicted that the necessary and sufficient condition for tensor unfolding is $\beta\gg n^{k/4}$.
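The unfolding map itself is just a reshape of the underlying array. A minimal NumPy sketch (the helper name `unfold` and its axis-ordering convention are assumptions, not the paper's notation):

```python
import numpy as np

def unfold(T, axes):
    """Unfold a k-th order tensor with equal side length n into an
    n^q x n^(k-q) matrix whose rows are indexed by `axes`, q = len(axes)."""
    k = T.ndim
    n = T.shape[0]
    rest = [a for a in range(k) if a not in axes]
    return np.transpose(T, list(axes) + rest).reshape(n ** len(axes), -1)

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4, 4, 4))
M = unfold(T, [0, 1])  # the square unfolding, q = k/2
top_sv = np.linalg.svd(M, compute_uv=False)[0]
```

Matrix PCA on the unfolded tensor then amounts to extracting the top singular value and singular vectors of `M`.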
Several other, more sophisticated algorithms for the spiked tensor model have been investigated in the literature which achieve the sharp threshold $n^{k/4}$: sum-of-squares algorithms [29, 28, 37], sophisticated iteration algorithms [40, 57, 27], and an averaged version of gradient descent [13] by Biroli, Cammarota and Ricci-Tersenghi. The necessity part of this threshold still remains open. Its relation with the hypergraphic planted clique problem was discussed in [41, 42] by Luo and Zhang. It is proven in [29] by Hopkins, Shi and Steurer that low-degree sum-of-squares relaxations fail below this threshold. The landscape complexity of the spiked tensor model was studied in [8] by Mei, Montanari, Nica and the first author. A new framework based on the Kac–Rice method, which allows one to compute the annealed complexity of the landscape, was proposed in [52] by Ros, Biroli, Cammarota and the first author; it was later used to analyze gradient-based algorithms in nonconvex settings [53, 44] by Mannelli, Biroli, Cammarota, Krzakala, Urbani, and Zdeborová.
In this paper, we revisit the tensor unfolding algorithm introduced by Richard and Montanari. The unfolded matrix from (1.1) is a spiked matrix model of the form “signal $+$ noise”. However, it is different from the spiked matrix models in the random matrix literature, which require the dimensions of the matrix to be comparable, namely that the ratio of the number of rows to the number of columns converges to a fixed constant. In that setting, the singular values and singular vectors of the spiked matrix model (1.1) have been studied in [11] by Benaych-Georges and Nadakuditi.

For the unfolded matrix, the dimensions are not comparable. As the size of the tensor goes to infinity, the ratio of the number of rows to the number of columns goes to zero or infinity, unless $k=2q$ (in which case the unfolded matrix is square). In this paper, we study the singular values and singular vectors of the spiked matrix model (1.1) in the case where the number of rows (columns) grows polynomially in the number of columns (rows), which we call low-rank perturbations of long random matrices. In the unperturbed case, the estimates of singular values and singular vectors for long random matrices follow from [1] by Bloemendal, Erdős, Knowles, Yau, and Yin.
To study low-rank perturbations of long random matrices, we use the master equations from [11], which characterize the outlier singular values of the perturbed random matrices and the associated singular vectors. To analyze the master equations, we use the estimates of singular values and Green's functions of long random matrices from [1]. Compared with the setting where the ratio of the number of rows to the number of columns converges to a fixed constant, the challenge is to obtain uniform estimates for the errors in the master equations which depend only on the number of rows. In this way, we can allow the number of columns to grow much faster than the number of rows. For low-rank perturbations of long random matrices, we prove that there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), at which the model exhibits a BBP-type phase transition. We also obtain estimates of the singular values and singular vectors for this model. Moreover, we have precise estimates when the signal-to-noise ratio is close to the threshold; in particular, our results also apply when the signal-to-noise ratio approaches the threshold at mesoscopic scales. In an independent work [24] by Feldman, this model has been studied under different assumptions. We refer to Section 2.2 for a more detailed discussion of the differences.

Our results for low-rank perturbations of long random matrices can be used to study the unfolded matrix. For any $1\le q\le k-1$ and any small $\varepsilon>0$, if $\beta\ge n^{k/4+\varepsilon}$, PCA on the unfolded matrix detects the signal tensor; if $\beta\le n^{k/4-\varepsilon}$, PCA on the unfolded matrix fails to capture the signal tensor. This matches the conjectured algorithmic threshold for the spiked tensor model from [51]. It is worth mentioning that the threshold we get is independent of the tensor unfolding procedure, namely, it is independent of the choice of $q$. For $q\ge 2$, a further recursive unfolding is needed to recover the individual signals $v_i$, which increases the computational cost. We propose to simply take $q=1$ in the tensor unfolding algorithm for each coordinate axis, and unfold the tensor to an $n\times n^{k-1}$ matrix, which gives a good approximation of each individual signal $v_i$, provided the signal-to-noise ratio exceeds the threshold $n^{k/4}$
. In the tensor literature, this algorithm is exactly the truncated higher-order singular value decomposition (HOSVD) introduced in
[20] by De Lathauwer, De Moor and Vandewalle. Later, they developed the higher-order orthogonal iteration (HOOI) in [21], which uses the truncated HOSVD as initialization combined with a power iteration to find the best low-multilinear-rank approximation of a tensor. The performance of HOOI was analyzed in [57] by Zhang and Xia for the spiked tensor model. It was proven that if the signal-to-noise ratio satisfies $\beta\ge Cn^{k/4}$ for some large constant $C$, HOOI converges within a logarithmic number of iterations.

The paper is organized as follows. In Section 2, we state the main results on the singular values and vectors of low-rank perturbations of long random matrices. In Section 3 we study the spiked tensor model, as an application of our results on low-rank perturbations of long random matrices. The proofs of our main results are given in Section 4.
Notations For two quantities $a$ and $b$, we write $a\lesssim b$ if there exists a universal constant $C>0$ such that $a\le Cb$. We write $a=\mathrm{o}(b)$, or $a\ll b$, if the ratio $a/b$ tends to zero as $n$ goes to infinity. We write $a\asymp b$ if there exists a universal constant $C>0$ such that $b/C\le a\le Cb$. We denote the index set $[n]=\{1,2,\dots,n\}$. We say an event $\Omega$ holds with high probability if for any large $D>0$, $\mathbb P(\Omega)\ge 1-n^{-D}$ for $n$ large enough. We write $A\prec B$, or $A=\mathrm O_\prec(B)$, if $A$ is stochastically dominated by $B$ in the sense that for any small $\varepsilon>0$ and large $D>0$, we have
$$\mathbb P\big(A\ge n^{\varepsilon}B\big)\le n^{-D},$$
for large enough $n$.
Acknowledgements The research of J.H. is supported by the Simons Foundation as a Junior Fellow at the Simons Society of Fellows, and by NSF grant DMS-2054835.
2 Main Results
In this section, we state our main results on the singular values and singular vectors of low-rank perturbations of long random matrices. Let $X$ be an $n\times m$ random matrix, with i.i.d. random entries satisfying
Assumption 2.1.
We introduce the aspect ratio $d=n/m$. It is well known that for $n,m\to\infty$, if their ratio converges to a constant independent of $n$, the empirical eigenvalue distribution of the sample covariance matrix converges to the Marchenko–Pastur law
(2.1) 
supported on $[(1-\sqrt d)^2,\,(1+\sqrt d)^2]$.
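The support of the limiting eigenvalue distribution can be checked numerically. The sketch below assumes the normalization $XX^\top/m$ for the sample covariance matrix, for which the Marchenko–Pastur support is $[(1-\sqrt d)^2,(1+\sqrt d)^2]$:

```python
import numpy as np

# Numerical check of the Marchenko-Pastur support: for an n x m Gaussian
# matrix X with aspect ratio d = n/m, the eigenvalues of the sample
# covariance matrix X X^T / m fill [(1 - sqrt(d))^2, (1 + sqrt(d))^2].
rng = np.random.default_rng(2)
n, m = 200, 800
d = n / m
X = rng.standard_normal((n, m))
evals = np.linalg.eigvalsh(X @ X.T / m)
left_edge = (1 - np.sqrt(d)) ** 2
right_edge = (1 + np.sqrt(d)) ** 2
```

The extreme eigenvalues concentrate near the two edges, with fluctuations vanishing as the dimensions grow.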
In this paper, we consider the case where the ratio $d=n/m$ depends on $n$, with $m$ growing at most polynomially in $n$. In Section 2.1, we collect some results on singular values of the long random matrix $X$ from [1]. Using the estimates of the singular values and singular vectors of long random matrices as input, we can study low-rank perturbations of long random matrices. We consider rank-one perturbations of long random matrices of the form,
(2.2) 
where $u\in\mathbb R^n$ and $v\in\mathbb R^m$ are unit vectors, and $X$ is an $n\times m$ random matrix satisfying Assumption 2.1. We state our main results on the singular values and singular vectors of low-rank perturbations of long random matrices (2.2) in Section 2.2.
2.1 Singular values of Long Random Matrices
Let $X$ be an $n\times m$ random matrix, with entries satisfying Assumption 2.1, where $m$ grows at most polynomially in $n$. The eigenvalues of sample covariance matrices in this setting have been well studied in [1]. We denote the singular values of $X$ as $s_1\ge s_2\ge\cdots\ge s_n\ge 0$. They are the square roots of the eigenvalues of the sample covariance matrix $XX^\top$. As an easy consequence of [1, Theorem 2.10], we have the following theorem on the largest singular value of $X$.
Theorem 2.2.
Under Assumption 2.1, let $d=n/m$ with $n\le m$. Fix an arbitrarily small $\varepsilon>0$; with high probability the largest singular value $s_1$ of $X$ satisfies
(2.3) 
provided $n$ is large enough.
The results in [1] give estimates on each eigenvalue of the sample covariance matrix bounded away from zero; see Theorem 4.1. They also give the locations of the individual singular values of $X$. In particular, they imply that the empirical singular value distribution of $X$ is close to the pushforward of the Marchenko–Pastur law (after proper normalization) by the map $x\mapsto\sqrt x$,
We remark that this pushforward depends on $n$ through the ratio $d$ and is supported on $[1-\sqrt d,\,1+\sqrt d]$ (after normalization). As $d\to 0$, after shifting by $1$ and rescaling by $\sqrt d$, it converges to the semicircle distribution.
See Figure 1 for some plots. One can see that the extreme singular values stick to the edges of the support of the limiting empirical measure.
2.2 Low Rank Perturbations of Long Random Matrices
Let $X$ be an $n\times m$ random matrix, with entries satisfying Assumption 2.1, and let $d=n/m$. Without loss of generality we can assume $n\le m$; otherwise, we can simply study the transpose of $X$. We allow $m$ to grow with $n$ at any polynomial rate. In this regime, $X$ is a long random matrix.
In this section, we state our main results on the rank one perturbation of long random matrices from (2.2):
(2.4) 
where $u\in\mathbb R^n$ and $v\in\mathbb R^m$ are unit vectors. As in Theorem 2.2, in this setting the singular values of the unperturbed matrix are roughly supported on an interval determined by $d$. The following theorem states that there is an exact $d$-dependent threshold for the signal-to-noise ratio: above the threshold, the perturbed matrix has an outlier singular value; below the threshold, there are no outlier singular values, and all the singular values stick to the bulk.
Theorem 2.3.
We assume Assumption 2.1 and $n\le m$. Let $d=n/m$, fix an arbitrarily small $\varepsilon>0$, and denote by $s_1$ the largest singular value of the perturbed matrix. If the signal-to-noise ratio is above the critical threshold, then with high probability the largest singular value is an outlier, explicitly given by
(2.5) 
If the signal-to-noise ratio is below the critical threshold, then with high probability the perturbed matrix does not have outlier singular values, and the largest singular value satisfies
(2.6) 
provided $n$ is large enough.
We refer to Figure 2 for an illustration of Theorem 2.3. Theorem 2.3 also characterizes the behavior of the outlier in the critical case, when the signal-to-noise ratio is close to the threshold.
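The dichotomy of Theorem 2.3 can be observed in simulation. The sketch below uses an illustrative normalization $M=\beta\,uv^\top + X/\sqrt m$ with Gaussian noise, for which the bulk edge of the singular values is $1+\sqrt{n/m}$; the two values of $\beta$ are arbitrary sub- and super-critical choices:

```python
import numpy as np

# BBP-type dichotomy in simulation: the top singular value of
# M = beta * u v^T + X / sqrt(m) separates from the bulk edge
# 1 + sqrt(n/m) only when beta is large enough.
rng = np.random.default_rng(2)
n, m = 100, 2000
u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(m); v /= np.linalg.norm(v)
X = rng.standard_normal((n, m)) / np.sqrt(m)

def top_singular(beta):
    return np.linalg.svd(beta * np.outer(u, v) + X, compute_uv=False)[0]

bulk_edge = 1 + np.sqrt(n / m)
s_sub, s_super = top_singular(0.1), top_singular(3.0)
```

For the sub-critical value the largest singular value sticks to the bulk edge, while for the super-critical value it is a clear outlier.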
We have a similar transition for the singular vectors. If the signal-to-noise ratio is above the threshold, the left singular vector associated with the largest singular value of the perturbed matrix has a large component in the signal direction; if it is below the threshold, the projection of this singular vector on the signal direction vanishes.
Theorem 2.4.
We assume Assumption 2.1 and $n\le m$. Let $d=n/m$, fix an arbitrarily small $\varepsilon>0$, and denote by $s_1$ the largest singular value of the perturbed matrix. If the signal-to-noise ratio is above the critical threshold, then with high probability the left singular vector $u_1$ associated with the largest singular value has a large component in the signal direction $u$:
(2.7) 
Similar estimates hold for the right singular vector $v_1$ associated with the largest singular value:
(2.8) 
If the signal-to-noise ratio is below the critical threshold, then with high probability the projections of $u_1$ on $u$ and of $v_1$ on $v$ are bounded above by
(2.9) 
provided $n$ is large enough.
Remark 2.5.
Remark 2.6.
In this paper, for simplicity of notation, we only consider rank-one perturbations of long random matrices. Our method can equally be used to study any finite-rank perturbation of long random matrices.
The singular values and vectors of low-rank perturbations of large rectangular random matrices have previously been studied in [11] by Benaych-Georges and Nadakuditi. Our main results, Theorems 2.3 and 2.4, generalize their results in two directions. Firstly, in [11], the ratio of the dimensions is assumed to converge to a constant independent of $n$. In our setting, we allow the number of columns to grow polynomially in the number of rows. As we will see in Section 3, this is crucial for the study of tensor principal component analysis. Secondly, we allow the signal-to-noise ratio to be close to the threshold (depending on $d$), and our main results characterize the behavior of singular values and singular vectors in this regime. In an independent work [24] by Feldman, the singular values and vectors of multi-rank perturbations of long random matrices have been studied under the assumption that either the signal vectors contain i.i.d. entries with mean zero and variance one, or the noise matrix has Gaussian entries. Both proofs use the master equations, developed in [11], which characterize the singular values and singular vectors of low-rank perturbations of rectangular random matrices; see Section 4.2. To analyze the master equations, Feldman [24] needs the signal vectors to have i.i.d. entries with mean zero and variance one. Our argument uses results from [1], which give Green's function estimates for long random matrices, and works for any (deterministic) signal vectors.
3 Tensor PCA
As an application of our main results, Theorems 2.3 and 2.4, in this section we study the asymmetric rank-one spiked tensor model introduced in [51]:
(3.1) 
where

$T$ is the $k$-th order tensor observation,

$\beta$ is the signal-to-noise ratio,

$v_1, v_2, \dots, v_k$ are unknown unit vectors to be recovered.
The goal is to perform reliable estimation and inference on the unseen signal tensor $\beta\, v_1\otimes v_2\otimes\cdots\otimes v_k$. We remark that any rank-one tensor can be uniquely written as $\beta\, v_1\otimes v_2\otimes\cdots\otimes v_k$, where $\beta\ge0$ and $v_1,\dots,v_k$ are unit vectors (up to signs). The model (3.1) is slightly more general than the asymmetric spiked tensor model in [51]. In this section, we make the following assumption on the noise tensor:
Assumption 3.1.
The entries of $W$ are i.i.d., with mean zero and variance one: for any indices $1\le i_1,i_2,\dots,i_k\le n$, and any integer $p\ge 1$,
The tensor unfolding algorithm in [51] unfolds the tensor $T$ to a matrix, and they proved that it detects the signal when the signal-to-noise ratio is sufficiently large. The conjectured algorithmic threshold is $n^{k/4}$. We recall the tensor unfolding algorithm. Take any index set $S\subset\{1,2,\dots,k\}$ with $|S|=q$. Given the tensor $T$, we denote by $\mathrm{Mat}_S(T)$ the $n^{q}\times n^{k-q}$ matrix obtained from $T$ by unfolding along the axes indexed by $S$. More precisely, for any indices $i_1,i_2,\dots,i_k$, the row index of $\mathrm{Mat}_S(T)$ is determined by $(i_a)_{a\in S}$ and the column index by $(i_a)_{a\notin S}$.
If $S$ is a singleton, i.e. $S=\{i\}$, we will simply write $\mathrm{Mat}_S(T)$ as $\mathrm{Mat}_i(T)$.
We can view $\mathrm{Mat}_S(T)$ as the sum of the unfoldings of the signal tensor and of the noise tensor. Let
Then we can rewrite $\mathrm{Mat}_S(T)$ as
(3.2) 
To put (3.2) into the form of (2.2), we further normalize it as
(3.3) 
In this way the noise entries are normalized to match (2.2), so (3.3) is a rank-one perturbation of a random matrix in the form of (2.2), with an appropriately rescaled signal-to-noise ratio, and the ratio of the number of columns to the number of rows grows at most polynomially in $n$. Considering the largest singular value of the normalized unfolded matrix, our main results, Theorems 2.3 and 2.4, indicate that there is a phase transition at $\beta\asymp n^{k/4}$ for the tensor unfolding (3.3).
Theorem 3.2.
We assume Assumption 3.1, and fix any index set $S\subset\{1,2,\dots,k\}$ with $|S|=q$. Let $\mathrm{Mat}_S(T)$ be the normalized matrix obtained from $T$ by unfolding along the axes indexed by $S$, as in (3.3), and denote by $d$ the ratio of its dimensions.
Fix an arbitrarily small $\varepsilon>0$, and denote by $s_1$ the largest singular value of the normalized unfolded matrix and by $u_1$ the corresponding left singular vector. If $\beta\ge n^{k/4+\varepsilon}$, then with high probability the largest singular value is an outlier, explicitly given by
and the left singular vector has a large component in the signal direction:
(3.4) 
If $\beta\le n^{k/4-\varepsilon}$, then with high probability there are no outlier singular values, the largest singular value satisfies
and the projection of $u_1$ on the signal direction is bounded above by
(3.5) 
provided $n$ is large enough.
Proof of Theorem 3.2.
Theorem 3.2 follows from Theorems 2.3 and 2.4 by taking the matrix there to be the normalized unfolded matrix (3.3), with the signal-to-noise ratio rescaled accordingly. In this way the criteria in Theorems 2.3 and 2.4 become the following: if
then the largest singular value is an outlier with high probability and the left singular vector has a large component in the signal direction. Otherwise, if
the largest singular value sticks to the bulk and the projection of the left singular vector on the signal direction is small. ∎
We remark that, as indicated by Theorem 3.2, the tensor unfolding algorithms which unfold $T$ to an $n^q\times n^{k-q}$ matrix, for any choice of $q$ and index set $S$, essentially share the same threshold, i.e. $\beta\asymp n^{k/4}$, which matches the conjectured threshold. As in (3.4), above the signal-to-noise ratio threshold, the left singular vector corresponding to the largest singular value of the unfolded matrix is aligned with the unfolded signal. The leading-order accuracy of the estimator in (3.4) is the same for all choices, independent of $q$. However, if $q\ge2$, this does not give us information about the individual signals $v_i$. A further recursive unfolding method was proposed in [51] to recover each individual signal $v_i$.
Since taking $q\ge2$ does not change the algorithmic threshold but increases the computational cost, we propose the following simple algorithm to recover each signal $v_i$: perform the tensor unfolding algorithm for each singleton index set $S=\{i\}$, namely $q=1$.
For each $1\le i\le k$, we unfold (3.1) to an $n\times n^{k-1}$ matrix
(3.6) 
which is (3.3) with $S=\{i\}$ and $q=1$. In this way (3.6) is a rank-one perturbation of a long random matrix in the form of (2.2), with the ratio of the number of columns to the number of rows growing polynomially in $n$. We take the largest singular value of $\mathrm{Mat}_i(T)$, and denote
(3.7) 
as the estimator for $\beta$, and the left singular vector corresponding to the largest singular value of $\mathrm{Mat}_i(T)$ as the estimator for $v_i$. This gives the following simple algorithm to recover $\beta$ and the signals $v_i$.
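The recovery procedure just described can be sketched as follows. The choices of $n$, $k$, $\beta$ are illustrative, and a full SVD is used in place of power iteration for brevity:

```python
import numpy as np

# Sketch of the recovery algorithm above: for each axis i, unfold the
# tensor to an n x n^(k-1) matrix and take the top left singular vector
# as the estimate of v_i.
rng = np.random.default_rng(3)
n, k, beta = 50, 3, 80.0
vs = [rng.standard_normal(n) for _ in range(k)]
vs = [v / np.linalg.norm(v) for v in vs]
signal = vs[0]
for v in vs[1:]:
    signal = np.multiply.outer(signal, v)
T = beta * signal + rng.standard_normal((n,) * k)

estimates = []
for i in range(k):
    Mi = np.moveaxis(T, i, 0).reshape(n, -1)  # n x n^(k-1) unfolding
    U, _, _ = np.linalg.svd(Mi, full_matrices=False)
    estimates.append(U[:, 0])

# Overlaps |<estimate, v_i>| are close to 1 above the threshold n^(k/4).
overlaps = [abs(est @ v) for est, v in zip(estimates, vs)]
```

Here $\beta=80$ is well above the threshold $n^{k/4}=50^{3/4}\approx 18.8$, so each overlap is close to one, up to the sign ambiguity of singular vectors.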
In the tensor literature, the above algorithm is exactly the truncated higher-order singular value decomposition (HOSVD) introduced in [20]. The higher-order orthogonal iteration (HOOI), which uses the truncated HOSVD as initialization combined with a power iteration, was developed in [21] to find a best low-multilinear-rank approximation of a tensor. The performance of HOOI was analyzed in [57] for the spiked tensor model. It was proven that if the signal-to-noise ratio satisfies $\beta\ge Cn^{k/4}$ for some large constant $C$, HOOI converges within a logarithmic number of iterations. As an easy consequence of Theorem 3.2, we have the following theorem, which gives the exact threshold for the signal-to-noise ratio, i.e. $\beta\asymp n^{k/4}$. Above the threshold, our estimators approximate the signal-to-noise ratio $\beta$ and the signal vectors $v_i$.
Theorem 3.3.
We assume Assumption 3.1. For each $1\le i\le k$, we unfold $T$ to $\mathrm{Mat}_i(T)$ as in (3.6). Let the estimator of $\beta$ be as defined in (3.7), and let the estimator of $v_i$ be the left singular vector corresponding to the largest singular value of $\mathrm{Mat}_i(T)$.
Fix an arbitrarily small $\varepsilon>0$. If $\beta\ge n^{k/4+\varepsilon}$, then with high probability the estimators approximate $\beta$ and $v_i$:
(3.8) 
and
(3.9) 
If $\beta\le n^{k/4-\varepsilon}$, then with high probability the projection of the estimator on $v_i$ vanishes, in the sense that
(3.10) 
provided $n$ is large enough.
Remark 3.4.
Given the unfolded matrix $\mathrm{Mat}_i(T)$, the largest singular value and its left singular vector can be computed by first forming the product $\mathrm{Mat}_i(T)\mathrm{Mat}_i(T)^\top$, then computing its largest eigenvalue and the corresponding eigenvector by power iteration. Forming the product costs $\mathrm O(n^{k+1})$ operations, and each iteration costs $\mathrm O(n^2)$, where the number of iterations can be taken logarithmic in $n$. To recover the signals $v_i$ for each $1\le i\le k$, we repeat the above tensor unfolding algorithm $k$ times, obtaining the estimators for each $i$.
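The power-iteration step in this complexity count can be sketched as follows (the helper name, iteration count, and test matrix are illustrative):

```python
import numpy as np

def top_left_singular_vector(M, iters=200, seed=None):
    """Top left singular vector of M via power iteration on B = M M^T.
    Forming B costs O(n^2 m); each iteration then costs only O(n^2)."""
    rng = np.random.default_rng(seed)
    B = M @ M.T
    x = rng.standard_normal(B.shape[0])
    for _ in range(iters):
        x = B @ x
        x /= np.linalg.norm(x)
    return x

rng = np.random.default_rng(4)
# Rank-one signal along the first coordinate direction plus small noise.
M = 3.0 * np.outer(np.eye(4)[0], np.ones(50)) \
    + 0.01 * rng.standard_normal((4, 50))
x = top_left_singular_vector(M, seed=5)
```

Since the spectral gap of $B$ is of order one above the threshold, the number of iterations needed for convergence is only logarithmic in the dimension.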
Proof.
The claims (3.9) and (3.10) follow directly from (3.4) and (3.5) by taking $S=\{i\}$ and $q=1$. In the following we prove (3.8). Fix an arbitrarily small $\varepsilon>0$. For $\beta\ge n^{k/4+\varepsilon}$, with high probability, the largest singular value of $\mathrm{Mat}_i(T)$ is given by (2.5)
(3.11) 
where the error term is as in (2.5). Our estimator, as defined in (3.7), is chosen as the solution of
(3.12) 
By taking the difference of (3.11) and (3.12) and rearranging, we get
(3.13) 
Thus (3.13) implies
with high probability, provided $n$ is large enough. This finishes the proof of (3.8). ∎
We numerically verify Theorem 3.2. We sample the signals as normalized Gaussian vectors, and the noise tensor with independent Gaussian entries. In the left panel of Figure 3, we plot the largest singular value of the unfolded matrix together with our theoretical prediction (2.5). In the right panel of Figure 3 we plot $\beta$ and our estimator from (3.7). The estimator provides a good approximation of $\beta$ provided the signal-to-noise ratio is above the threshold. In Figure 4 we plot the overlap between the signal and its estimator, where the estimator is given by the left singular vector corresponding to the largest singular value of the unfolded matrix. Our theoretical prediction (blue curve), as in (3.9), matches the simulation well above the threshold. Below the threshold, our estimator behaves as poorly as a random guess, i.e. taking a random Gaussian vector (green curve). For signal-to-noise ratios in a small neighborhood of the threshold, we do not have a sharp estimate of the overlap, only the upper bound (3.10). In the second panel of Figure 4, we zoom in around the threshold; the red curve, corresponding to the bound (3.10), provides a good upper bound on the overlap.
4 Low rank Perturbations of Long Random Matrices
In this section we prove our main results stated in Section 2. The proof of Theorem 2.2 is given in Section 4.1, where we also collect the isotropic local law for long random matrices from [1]. In Section 4.2, we derive a master equation which characterizes the outliers of the perturbed matrix. The master equation has been used intensively to study low-rank perturbations of random matrices, for both singular values and eigenvalues; see [10, 11, 9, 30]. The proofs of Theorems 2.3 and 2.4 are given in Sections 4.3 and 4.4 respectively.
4.1 Long Random Matrices
Let $X$ be an $n\times m$ random matrix, with entries satisfying Assumption 2.1, and let $d=n/m$ with $n\le m$. In this section we recall from [1] some results on sample covariance matrices in this setting. It turns out that in this setting the correct normalization is to study the standard sample covariance matrix.
We denote the following $n$-dependent Marchenko–Pastur law corresponding to the ratio $d$,
(4.1) 
and its quantiles as
(4.2) 
The normalization in (4.1) is different from that in (2.1), which corresponds to the sample covariance matrix. We remark that both the Marchenko–Pastur law and its quantiles depend on $n$ through the ratio $d$. We recall the following eigenvalue rigidity result from [1, Theorem 2.10].