Spectral Method for Phase Retrieval: an Expectation Propagation Perspective

by   Junjie Ma, et al.

Phase retrieval refers to the problem of recovering a signal x_∈C^n from its phaseless measurements y_i=|a_i^Hx_|, where {a_i}_i=1^m are the measurement vectors. Many popular phase retrieval algorithms are based on the following two-step procedure: (i) initialize the algorithm based on a spectral method, (ii) refine the initial estimate by a local search algorithm (e.g., gradient descent). The quality of the spectral initialization step can have a major impact on the performance of the overall algorithm. In this paper, we focus on the model where the measurement matrix A=[a_1,...,a_m]^H has orthonormal columns, and study the spectral initialization under the asymptotic setting m,n→∞ with m/n→δ∈(1,∞). We use the expectation propagation framework to characterize the performance of spectral initialization for Haar distributed matrices. Our numerical results confirm that the predictions of the EP method are accurate for not-only Haar distributed matrices, but also for realistic Fourier based models (e.g. the coded diffraction model). The main findings of this paper are the following: (1) There exists a threshold on δ (denoted as δ_weak) below which the spectral method cannot produce a meaningful estimate. We show that δ_weak=2 for the column-orthonormal model. In contrast, previous results by Mondelli and Montanari show that δ_weak=1 for the i.i.d. Gaussian model. (2) The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently introduced by Luo, Alghamdi and Lu.



There are no comments yet.



Rigorous Analysis of Spectral Methods for Random Orthogonal Matrices

Phase retrieval refers to algorithmic methods for recovering a signal fr...

Solving phase retrieval with random initial guess is nearly as good as by spectral initialization

The problem of recovering a signal 𝐱∈ℝ^n from a set of magnitude measure...

Linear Spectral Estimators and an Application to Phase Retrieval

Phase retrieval refers to the problem of recovering real- or complex-val...

Convolutional Phase Retrieval via Gradient Descent

We study the convolutional phase retrieval problem, which considers reco...

Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation

We study a spectral initialization method that serves a key role in rece...

Optimal Spectral Initialization for Signal Recovery with Applications to Phase Retrieval

We present the optimal design of a spectral method widely used to initia...

Randomly Initialized Alternating Least Squares: Fast Convergence for Matrix Sensing

We consider the problem of reconstructing rank-one matrices from random ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

In many scientific and engineering applications, it is expensive or even impossible to measure the phase of a signal due to physical limitations [1]. Phase retrieval refers to algorithmic methods for reconstructing signals from magnitude-only measurements. The measuring process for the phase retrieval problem can be modeled as


where () is the measurement matrix, is the signal to be recovered, and denotes the th entry of the vector . Phase retrieval has important applications in areas ranging from X-ray crystallography, astronomical imaging, and many others [1].

Phase retrieval has attracted a lot of research interests since the work of Candes et al [2], where it is proved that a convex optimization based algorithm can provably recover the signal under certain randomness assumptions on . However, the high computational complexity of the PhaseLift algorithm in [2] prohibits its practical applications. More recently, a lot of algorithms were proposed as low-cost iterative solvers for the following nonconvex optimization problem (or its variants) [3, 4, 5, 6]:


These algorithms typically initialize their estimates using a spectral method [7] and then refine the estimates via alternative minimization [7], gradient descend (and variants) [3, 4, 5, 6] or approximate message passing [8]. Since the problem in (2) is nonconvex, the initialization step plays a crucial role for many of these algorithms.

Spectral methods are widely used for initializing local search algorithms in many signal processing applications. In the context of phase retrieval, spectral initialization was first proposed in [7] and later studied in [4, 5, 9, 10, 11]

. To be specific, a spectral initial estimate is given by the leading eigenvector (after proper scaling) of the following data matrix



where denotes a diagonal matrix formed by and

is a nonlinear processing function. A natural measure of the quality of the spectral initialization is the cosine similarity

[9] between the estimate and the true signal vector:


where denotes the spectral estimate. The performance of the spectral initialization highly depends on the processing function . Popular choices of include the “trimming” function proposed in [4] (the name follows [9]), the “subset method” in [5], proposed by Mondelli and Montanari in [10], and recently proposed in [11]:


In the above expressions, and are tunable thresholds, and denotes an indicator function that equals one when the condition is satisfied and zero otherwise.

The asymptotic performance of the spectral method was studied in [9] under the assumption that contains i.i.d. Gaussian entries. The results of [9]

unveil a phase transition phenomenon in the regime where

with fixed. Specifically, there exists a threshold on the measurement ratio : below this threshold the cosine similarity converges to zero (meaning that is not a meaningful estimate) and above it is strictly positive. Later, Mondelli and Montanari showed in [12] that defined in (5) minimizes the above-mentioned reconstruction threshold. Following [12], we will call the minimum threshold (over all possible ) the weak threshold. It is proved in [12] that the weak thresholds are and , respectively, for real and complex valued models. Further, [12] showed that these thresholds are also information-theoretically optimal for weak recovery. Notice that minimizes the reconstruction threshold, but does not necessarily maximize when is larger than the weak threshold. The latter criterion is more relevant in practice, since intuitively speaking a larger implies better initialization, and hence the overall algorithm is more likely to succeed. This problem was recently studied by Luo, Alghamdi and Lu in [11], where it was shown that the function defined in (5) uniformly maximizes for an arbitrary (and hence also achieves the weak threshold).

Notice that the analyses of the spectral method in [9, 12, 11] are based on the assumption that

has i.i.d. Gaussian elements. This assumption is key to the random matrix theory (RMT) tools used in

[9]. However, the measurement matrix for many (if not all) important phase retrieval applications is a Fourier matrix[13]. In certain applications, it is possible to randomize the measuring mechanism by adding random masks [14]. Such models are usually referred to as coded diffraction patterns (CDP) models [14]. In the CDP model, can be expressed as



is a square discrete Fourier transform (DFT) matrix, and

represents the effect of random masks. For the CDP model, has orthonormal columns, namely,


In this paper, we assume

to be an isotropically random unitary matrix (or simply Haar distributed). We study the performance of the spectral method and derive a formula to predict the cosine similarity

. We conjecture that our prediction is asymptotically exact as with fixed. Based on this conjecture, we are able to show the following.

  • There exists a threshold on (denoted as ) below which the spectral method cannot produce a meaningful estimate. We show that for the column-orthonormal model. In contrast, previous results by Mondelli and Montanari show that the i.i.d. Gaussian model.

  • The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently derived by Luo, Alghamdi and Lu.

Our asymptotic predictions are derived by analyzing an expectation propagation (EP) [15] type message passing algorithm [16, 17] that aims to find the leading eigenvector. A key tool in our analysis is a state evolution (SE) technique that has been extensively studied for the compressed sensing problem [17, 18, 19, 20, 21, 22]

. Several arguments about the connection between the message passing algorithm and the spectral estimator is heuristic, and thus the results in this paper are not rigorous yet. Nevertheless, numerical results suggest that our analysis accurately predicts the actual performance of the spectral initialization under the practical CDP model. This is perhaps a surprising phenomenon, considering the fact that the sensing matrix in (

6) is still quite structured although the matrices introduce certain randomness.

We believe that our prediction could be made rigorous by using RMT tools [9]. It is also expected that the same results can be obtained by using the replica method [23, 24, 25], which is a non-rigorous, but powerful tool from statistical physics. Compared with the replica method, our method seems to be technically simpler and more flexible (e.g., we might be able to handle the case where the signal is known to be positive).

Notations: denotes the conjugate of a complex number . We use bold lower-case and upper case letters for vectors and matrices respectively. For a matrix , and denote the transpose of a matrix and its Hermitian respectively. denotes a diagonal matrix with being the diagonal entries. Circularly-symmetric Gaussian , where is the mean vector and the covariance matrix. For , define . For a Hermitian matrix , and

denote the largest and smallest eigenvalues of

respectively. For a non-Hermitian matrix , denotes the eigenvalue of that has the largest magnitude. denotes the trace of a matrix.

denotes convergence in probability.

Ii Asymptotic Analysis of the Spectral Method

This sections presents the main results of this paper. Our results have not been fully proved, we call them “claims” throughout this section to avoid confusion. The rationales for our claims will be discussed in Section IV.

Ii-a Assumptions

In this paper, we make the following assumption on the sensing matrix .

Assumption 1.

The sensing matrix () is sampled uniformly at random from column orthogonal matrices satisfying .

Assumption 1 introduces certain randomness assumption about the measurement matrix . However, in practice, the measurement matrix is usually a Fourier matrix or some variants of it. One particular example is the coded diffraction patterns (CDP) model in (6). For this model, the only freedom one has is the “random masks” , and the overall matrix is still quite structured. In this regard, it is perhaps quite surprising that the predictions developed under Assumption 1 are very accurate even for the CDP model. Please refer to Section V for our numerical results.

We further make the following assumption about the processing function .

Assumption 2.

The processing function satisfies , where .

As pointed out in [9], the boundedness of in Assumption 2-(I) is the key to the success of the truncated spectral initializer proposed in [4]. (Following [9], we will call it the trimming method in this paper.) Notice that we assume without any loss of generality. To see this, consider the following modification of :

where . It is easy to see that

where we used . Clearly, the top eigenvector of is the same as that of , where for the latter matrix we have .

Ii-B Asymptotic analysis

Let be the principal eigenvector of the following data matrix:


where Following [9], we use the squared cosine similarity defined below to measure the accuracy of :


Our goal is to understand how behaves when with a fixed ratio . It seems possible to solve this problem by adapting the tools developed in [9]. Yet, we take a different approach which we believe to be technically simpler and more flexible. Specifically, we will derive a message passing algorithm that can converge to the spectral estimator in a certain limit. Central to our analysis is a deterministic recursion, called state evolution (SE), that characterizes the asymptotic behavior of the message passing algorithm. By analyzing the stationary points of the SE, we obtain certain predictions about the spectral estimator. This approach has been adopted in [26] to analyze the asymptotic performance of the LASSO estimator and in [27] for the nonnegative PCA estimator, based on the approximate message passing (AMP) algorithm[28, 29]

. However, the SE analysis of AMP does not apply to the partial orthogonal matrix model considered in this paper.

Different from [27], our analysis is based on a variant of the expectation propagation (EP) algorithm [15, 16], called PCA-EP in this paper. Different from AMP, such EP-style algorithms could be analyzed via SE for a wider class of measurement matrices, including the Haar model considered here[17, 19, 18, 30, 20, 21, 31, 22]. The derivations of PCA-EP and its SE analysis will be introduced in Section IV.

Our characterization of involves the following functions (for ):


where and is a shorthand for


We note that , and all depend on the processing function . However, to simplify notation, we will not make this dependency explicit. Claim 1 below summarizes our asymptotic characterization of the cosine similarity .

Claim 1 (Cosine similarity).





where . Let


Then, as with a fixed ratio , we have


where with a slight abuse of notations,


and is a solution to


Claim 1, which is reminiscent of Theorem 1 in [9], reveals a phase transition behavior of : is strictly positive when and is zero otherwise. In the former case, the spectral estimator is positively correlated with the true signal and hence provides useful information about ; whereas in the latter case , meaning that the spectral estimator is asymptotically orthogonal to the true signal and hence performs no better than a random guess. As discussed before, Claim 1 is derived from the analysis of an EP algorithm, and is not proved rigorously yet. The rationales behind this claim will be discussed in Section IV.

Remark 1.

For notational simplicity, we will write and as and , respectively.

Some remarks about the phase transition condition are given in Lemma 1 below. Item (i) guarantees the uniqueness of the solution to (17). Item (ii) is the actual intuition that leads to the conjectured phase transition condition. The proof of Lemma 1 can be found in Appendix A.

Lemma 1.

(i) Eq. (17) has a unique solution if , where is defined in (14). (ii) if and only if there exists a such that

The latter statement is equivalent to

Fig. 1: Examples of , and . Left: with . Middle: with . In both cases, we further scaled so that . Right: .

Fig. 1 plots , and for the trimming function defined in (5a), and the function defined in (5d). Different truncation threshold are considered in the first two figures. For the first and third figures, and have a unique nonzero crossing point in , which we denote as . Further, for , and for (for the first figure). Then by Lemma 1-(ii), a phase transition happens when , or equivalently

The spectral estimator is positively correlated with the signal when , and perpendicular to it otherwise. Now consider the figure in the middle panel. Note that only depends on through , and it is straightforward to check that , and do not depend on for such . Hence, there is no solution to for any . According to Lemma 1 and Claim 1, we have . The situation for is shown in the figure on the right panel Fig. 1. It is straightforward to see that in this case.

Iii Optimal Design of

Claim 1 characterizes the condition under which the leading eigenvector is positively correlated with the signal vector for a given . We would like to understand how does affect the performance of the spectral estimator. We first introduce the following definitions:




Following [12], the latter threshold is referred to as the weak threshold. We will answer the following questions:

  • What is for the spectral method under a partial orthogonal model?

  • For a given , what is the optimal that maximizes ?

These questions have been addressed for an i.i.d. Gaussian measurement model. In [12], Mondelli and Montanari proved that for this model. They also proposed a function (i.e., in (5)) that achieves the weak threshold. Very recently, [11] proved that given in (5) maximizes for any fixed , and is therefore uniformly optimal. For the partial orthogonal measurement model, the above problems could also be addressed based on our asymptotic characterization of the spectral initialization. It it perhaps surprising that the same function is also uniformly optimal under the partial orthogonal sensing model considered in this paper. Note that although is optimal for both the i.i.d. Gaussian and the partial orthogonal models, the performances of the spectral method are different: for the former model , whereas for the latter model . Theorem 1 below, whose proof can be found in Appendix B, summarizes our findings.

Theorem 1 (Optimality of ).

Suppose that Claim 1 is correct, then for the partial orthogonal measurement model. Further, for any , in (5) maximizes and is given by

where is the unique solution to (with ). Finally, is an increasing function of .

Remark 2.

The function that can achieve the weak threshold is not unique. For instance, the following function also achieves the weak threshold:

It is straightforward to show that for any . Further, some straightforward calculations show that

Hence, by Lemma 1 and Claim 1, the cosine similarity is strictly positive for any .

Iv An EP-based Algorithm for the Spectral Method

In this section, we present the rationale behind Claim 1. Our reasoning is based on analyzing an expectation propagation (EP) [15, 16, 30, 22] based message passing algorithm that aims to find the leading eigenvector of . Since our algorithm intends to solve the eigenvector problem, we will call it PCA-EP throughout this paper. The key to our analysis is a state evolution (SE) tool for such EP-based algorithm, first conjectured in [17, 19] (for a partial DFT sensing matrix) and [18] (for generic unitarily-invariant sensing matrices), and later proved in [20, 21] (for general unitarily-invariant matrices). For approximate message passing (AMP) algorithms [28], state evolution (SE) has proved to be a powerful tool for performance analysis in the high dimensional limit [26, 27, 32, 33].

Iv-a The Pca-Ep algorithm

The leading eigenvector of is the solution to the following optimization problem:


where . The normalization (instead of the commonly-used constraint ) is introduced for convenience. PCA-EP is an EP-type message passing algorithm that aims to solve (20). Starting from an initial estimate , PCA-EP proceeds as follows (for ):

where is a diagonal matrix defined as
In (21b), is a parameter that can be tuned. At every iteration of PCA-EP , the estimate for the leading eigenvector is given by

The derivations of PCA-EP can be found in Appendix C. Before we proceed, we would like to mention a couple of points:

  • The PCA-EP algorithm is a tool for performance analysis, not an actual numerical method for finding the leading eigenvector. Further, our analysis is purely based on the iteration defined in (21), and the heuristic behind (21) is irrelevant for our analysis.

  • The PCA-EP iteration has a parameter: , which does not change across iterations. To calibrate PCA-EP with the eigenvector problem, we need to choose the value of carefully. We will discuss the details in Section IV.

  • From this representation, (21a) can be viewed as a power method applied to . This observation is crucial for our analysis. We used the notation to emphasize the impact of .

  • The two matrices involved in satisfy the following “zero-trace” property (referred to as “divergence-free” in [18]):


    This zero-trace property is the key to the correctness of the state evolution (SE) characterization.

Iv-B State evolution analysis

State evolution (SE) was first introduced in [28, 29] to analyze the dynamic behavior of AMP. However, the original SE technique for AMP only works when the sensing matrix has i.i.d. entries. Similar to AMP, PCA-EP can also be described by certain SE recursions, but the SE for PCA-EP works for a wider class of sensing matrices (specifically, unitarily-invariant [18, 20, 21]) that include the random partial orthogonal matrix considered in this paper.

Denote . Assume that the initial estimate , where is independent of . Then, intuitively speaking, PCA-EP has an appealing property that in (21a) is approximately


where and are the variables defined in (24), and is an iid Gaussian vector. Due to this property, the distribution of is fully specified by and . Further, for a given and a fixed value of , the sequences and can be predicted via the following two-dimensional map:


where the functions , and are defined in (10). To gain some intuition on the SE, we present a heuristic way of deriving the SE in Appendix D. We believe that it is possible to make the SE prediction rigorous using the conditioning technique developed in [20, 21]. However, since the link between the PCA-EP and the spectral estimator is already non-rigorous, we did not make such effort. A precise statement about the SE predictions is summarized by Claim 2 below.

Claim 2 (SE prediction).

Consider the PCA-EP algorithm in (21). Assume that the initial estimate where is independent of . Then, almost surely, the following hold for :

where and are defined in (24). Furthermore, almost surely we have


where is defined in (21d).

Iv-C Connection between Pca-Ep and the PCA problem

Lemma 2 below shows that any nonzero stationary point of PCA-EP in (21) is an eigenvector of the matrix . This is the our motivation for analyzing the performance of the spectral method through PCA-EP.

Lemma 2.

Consider the PCA-EP algorithm in (21) with . Let be an arbitrary stationary point of PCA-EP . Suppose that , then is an eigen-pair of , where the eigenvalue is given by


We introduce the following auxiliary variable:


When is a stationary point of (21a), we have


Rearranging terms, we have

Multiplying from both sides of the above equation yields

After simple calculations, we finally obtain


In the above, we have identified an eigen-pair for the matrix . To complete our proof, we note from (21a) and (27) that and so



is also an eigenvector. ∎

Remark 3.

Notice that the eigenvalue identified in (26) is closely related to the function in (13). In fact, the only difference between (26) and (13) is that the normalized trace in (26) is replaced by , where . Under certain mild regularity conditions on

, it is straightforward to use the weak law of large numbers to prove that


Lemma 2 shows that the stationary points of the PCA-EP algorithm are eigenvectors of . Since the asymptotic performance of PCA-EP can be characterized via the SE platform, it is conceivable that the fixed points of the SE describe the asymptotic performance of the spectral estimator. However, the answers to the following questions are still unclear:

  • Even though is an eigenvector of , does it correspond to the largest eigenvalue?

  • The eigenvalue in (26) depends on , which looks like a free parameter. How should be chosen?

In the following section, we will discuss these issues and provide some heuristic arguments for Claim 1.

Iv-D Heuristics about Claim 1

The SE equations in (24) have two sets of solutions (in terms of , and ):

Uninformative solution: (31a)
Informative solution: (31c)
Remark 4.

Both solutions in (31) do not have a constraint on the norm of the estimate (which corresponds to a constraint on ). This is because we ignored the norm constraint in deriving our PCA-EP algorithm in Appendix C-A. If our derivations explicitly take into account the norm constraint, we would get an additional equation that can specify the individual values of and . However, only the ratio matters for our phase transition analysis. Hence, we have ignored the constraint on the norm of the estimate.

Remark 5.

is also a valid fixed point to (24). However, this solution corresponds to the all-zero vector and is not of interest for our purpose.

For the uninformative solution, we have and hence according to (25). On the other hand, the informative solution (if exists) corresponds to an estimate that is positively correlated with the signal, namely, . Recall from (21) that is a parameter of the PCA-EP algorithm. Claim 1 is obtained based on the following heuristic argument: if and only if there exists a so that the informative solution is valid. More specifically, there exists a so that the following conditions hold:


Further, we have proved that (see Lemma 1) the above condition is equivalent to the phase transition presented in Claim 1. Note that (and so ) is an implicit condition in our derivations of the SE. Hence, for a valid informative solution, we must have .

We now provide some heuristic rationales about Claim 1. Our first observation is that if the informative solution exists and we set in PCA-EP properly, then and neither tend to zero or infinity. A precise statement is given in Lemma 3 below. The proof is straightforward and hence omitted.

Lemma 3.

Suppose that , where is defined in (14). Consider the PCA-EP algorithm with set to , where is the unique solution to

Let and . Then, and generated by (24) converge and

From Claim 2, we have almost surely (see Lemma 3)


Namely, the norm does not vanish or explode, under a particular limit order. Now, suppose that for a finite and deterministic problem instance we still have


for some constant . Recall that PCA-EP can be viewed as a power iteration applied to the matrix (see (21)). Hence, (34) implies

where denotes the spectral radius of . Based on these heuristic arguments, we conjecture that almost surely. Further, we show in Appendices D-C and D-D that PCA-EP algorithm converges in this case. Clearly, this implies that is an eigen-pair of , where is the limit of the estimate . Combining the above arguments, we have the following conjecture


If (35) is indeed correct, then the following lemma shows that the largest eigenvalue of is . A proof of the lemma can be found in Appendix E-A.

Lemma 4.

Consider the matrix defined in (21a), where . Let be the eigenvalue of that has the largest magnitude and is the corresponding eigenvector. If and , then


Recall from Lemma 2 that if PCA-EP converges, then the stationary estimate is an eigenvector of , but it is unclear whether it is the leading eigenvector. Our heuristic arguments in this section and Lemma 4 imply that PCA-EP indeed converges to the leading eigenvector of . Therefore, we conjecture that the fixed points of the SE characterize the asymptotic behavior of the spectral estimator.

Fig. 2: Left: Histogram of the eigenvalues of . Right: Scatter plot of the eigenvalues of . is set to the unique solution to . . . . The signal

is randomly generated from an i.i.d. zero Gaussian distribution.

Fig. 3: Scatter plot of the eigenvalues of in the complex plane. are sampled from an i.i.d. Gaussian distribution. , . Left: , Middle: , Right: with .

The above argument is not rigorous. However, our numerical results suggest that the conclusions are correct. An example is shown in Fig. 2. Here, the figure on the right plots the eigenvalues of a random realization of with . We can see that an outlying eigenvalue pops out of the bulk part of the spectrum. Also, the outlying eigenvalue is close to one. Fig. 3 further plots the eigenvalues of for three other choices of . We see that all of the results seem to support our conjecture: there is an outlying eigenvalue (at one on the real line), although the shape of the bulk parts depends on the specific choice of .

Iv-E On the domain of , , , and

In the previous discussions, we assumed that , , and are all defined on . This assumption was implicitly used to derive the PCA-EP algorithm. Specifically, the Gaussian pdf for obtaining (78) in the derivation of PCA-EP is not well defined if . Nevertheless, the final PCA-EP algorithm in (21) is well defined, as long as is well defined. We have assumed . Let us further assume that is bounded from below:

where . Under this assumption, is well defined as long as . On the other hand, we only focused on the domain (or ) in our previous discussions. In particular, we conjectured that if and only if there exists an informative solution (see (31)) for . A natural question is what if the SE equations in (31) do not have informative solutions for , but do have such a solution for ? To be specific, suppose that satisfies the following conditions:

Then, one might ask what happens if we consider a PCA-EP algorithm by setting to such a ? It turns out that, based on arguments similar to those presented in Section IV-D, we can obtain


Namely, our method can provide a conjecture about the minimum eigenvalue of . To see this, we note that using exactly the same arguments as those in Section IV-D, we claim (heuristically) that


Lemma 5 below further establishes a connection between the extremal eigenvalues of and . Its proof is postponed to Appendix E-B.

Lemma 5.

Consider the matrix