In many scientific and engineering applications, it is expensive or even impossible to measure the phase of a signal due to physical limitations. Phase retrieval refers to algorithmic methods for reconstructing signals from magnitude-only measurements. The measurement process for the phase retrieval problem can be modeled as
where $\bm{A} \in \mathbb{C}^{m\times n}$ is the measurement matrix, $\bm{x} \in \mathbb{C}^{n}$ is the signal to be recovered, and $y_i$ denotes the $i$th entry of the vector $\bm{y}$. Phase retrieval has important applications in areas ranging from X-ray crystallography to astronomical imaging, among many others.
Phase retrieval has attracted a lot of research interest since the work of Candès et al., where it was proved that a convex optimization based algorithm can provably recover the signal under certain randomness assumptions on $\bm{A}$. However, the high computational complexity of the PhaseLift algorithm prohibits its use in many practical applications. More recently, many algorithms have been proposed as low-cost iterative solvers for the following nonconvex optimization problem (or its variants) [3, 4, 5, 6]:
These algorithms typically initialize their estimates using a spectral method and then refine the estimates via alternating minimization, gradient descent (and its variants) [3, 4, 5, 6], or approximate message passing. Since the problem in (2) is nonconvex, the initialization step plays a crucial role for many of these algorithms.
Spectral methods are widely used for initializing local search algorithms in many signal processing applications. In the context of phase retrieval, spectral initialization was first proposed in  and later studied in [4, 5, 9, 10, 11]. To be specific, a spectral initial estimate is given by the leading eigenvector (after proper scaling) of the following data matrix:
where $\mathrm{diag}\{\cdot\}$ denotes a diagonal matrix formed by the entries of its argument, and $\mathcal{T}$ is a nonlinear processing function. A natural measure of the quality of the spectral initialization is the cosine similarity between the estimate and the true signal vector:
where $\hat{\bm{x}}$ denotes the spectral estimate. The performance of the spectral initialization depends highly on the processing function $\mathcal{T}$. Popular choices of $\mathcal{T}$ include the “trimming” function proposed in  (the name follows ), the “subset method” in , the function proposed by Mondelli and Montanari in , and the function recently proposed in :
In the above expressions, the truncation thresholds are tunable parameters, and $\mathbb{1}\{\cdot\}$ denotes an indicator function that equals one when its condition is satisfied and zero otherwise.
The analyses in  unveil a phase transition phenomenon in the regime where $m, n \to \infty$ with $\delta = m/n$ fixed. Specifically, there exists a threshold on the measurement ratio $\delta$: below this threshold the cosine similarity converges to zero (meaning that the spectral estimate is not meaningful), and above it the cosine similarity is strictly positive. Later, Mondelli and Montanari showed in  that the function defined in (5) minimizes the above-mentioned reconstruction threshold. Following , we will call the minimum threshold (over all possible processing functions) the weak threshold. It is proved in  that the weak thresholds are and , respectively, for the real and complex valued models. Further,  showed that these thresholds are also information-theoretically optimal for weak recovery. Notice that the function in (5) minimizes the reconstruction threshold, but does not necessarily maximize the cosine similarity when $\delta$ is larger than the weak threshold. The latter criterion is more relevant in practice, since, intuitively speaking, a larger cosine similarity implies a better initialization, and hence the overall algorithm is more likely to succeed. This problem was recently studied by Luo, Alghamdi and Lu in , where it was shown that the function defined in (5) uniformly maximizes the cosine similarity for an arbitrary $\delta$ (and hence also achieves the weak threshold).
The analyses in these works assume that $\bm{A}$ has i.i.d. Gaussian elements. This assumption is key to the random matrix theory (RMT) tools used therein. However, the measurement matrix in many (if not all) important phase retrieval applications is a Fourier matrix. In certain applications, it is possible to randomize the measurement mechanism by adding random masks. Such models are usually referred to as coded diffraction patterns (CDP) models. In the CDP model, $\bm{A}$ can be expressed as
where $\bm{F}$ is a square discrete Fourier transform (DFT) matrix and the diagonal matrices represent the effect of the random masks. For the CDP model, $\bm{A}$ has orthonormal columns, namely,
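A minimal construction of such a CDP sensing matrix, stacking several randomly phase-masked copies of the unitary DFT and normalizing so that the columns stay orthonormal, might look as follows (the number of masks `L` and the uniform-phase masks are illustrative choices):

```python
import numpy as np

def cdp_matrix(n, L, rng):
    # Stack L masked DFT blocks, A = [F P_1; ...; F P_L] / sqrt(L), where
    # F is the unitary n-point DFT matrix and each P_l is a diagonal matrix
    # of i.i.d. uniform phases (one common choice of random mask).
    F = np.fft.fft(np.eye(n), axis=0) / np.sqrt(n)    # unitary DFT matrix
    blocks = []
    for _ in range(L):
        phases = np.exp(2j * np.pi * rng.random(n))   # random phase mask
        blocks.append(F * phases[None, :])            # F @ diag(phases)
    return np.vstack(blocks) / np.sqrt(L)

rng = np.random.default_rng(1)
A = cdp_matrix(64, 4, rng)
gram = A.conj().T @ A          # should be (numerically) the identity
```

Because each mask has unit-modulus entries, every block satisfies $(\bm{F}\bm{P})^{\mathsf{H}}(\bm{F}\bm{P}) = \bm{I}$, and the $1/\sqrt{L}$ normalization makes the stacked matrix column-orthonormal.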
In this paper, we assume $\bm{A}$ to be an isotropically random unitary matrix (or simply Haar distributed). We study the performance of the spectral method and derive a formula to predict the cosine similarity. We conjecture that our prediction is asymptotically exact as $m, n \to \infty$ with $\delta = m/n$ fixed. Based on this conjecture, we are able to show the following.
There exists a threshold on $\delta$ below which the spectral method cannot produce a meaningful estimate. We derive this threshold for the column-orthonormal model; it differs from the threshold that Mondelli and Montanari previously established for the i.i.d. Gaussian model.
The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently derived by Luo, Alghamdi and Lu.
Our asymptotic predictions are derived by analyzing an expectation propagation (EP) type message passing algorithm [16, 17] that aims to find the leading eigenvector. A key tool in our analysis is a state evolution (SE) technique that has been extensively studied for the compressed sensing problem [17, 18, 19, 20, 21, 22]. Several arguments about the connection between the message passing algorithm and the spectral estimator are heuristic, and thus the results in this paper are not yet rigorous. Nevertheless, numerical results suggest that our analysis accurately predicts the actual performance of the spectral initialization under the practical CDP model. This is perhaps surprising, considering that the sensing matrix in (6) is still quite structured, although the random masks introduce a certain amount of randomness.
We believe that our prediction could be made rigorous by using RMT tools . It is also expected that the same results can be obtained by using the replica method [23, 24, 25], which is a non-rigorous, but powerful tool from statistical physics. Compared with the replica method, our method seems to be technically simpler and more flexible (e.g., we might be able to handle the case where the signal is known to be positive).
Notation: $\bar{a}$ denotes the conjugate of a complex number $a$. We use bold lower-case and upper-case letters for vectors and matrices, respectively. For a matrix $\bm{A}$, $\bm{A}^{\mathsf{T}}$ and $\bm{A}^{\mathsf{H}}$ denote its transpose and its Hermitian (conjugate) transpose, respectively. $\mathrm{diag}\{a_1,\ldots,a_m\}$ denotes a diagonal matrix with $a_1,\ldots,a_m$ as its diagonal entries. $\mathcal{CN}(\bm{\mu},\bm{\Sigma})$ denotes a circularly-symmetric Gaussian distribution, where $\bm{\mu}$ is the mean vector and $\bm{\Sigma}$ is the covariance matrix. For , define . For a Hermitian matrix $\bm{H}$, $\lambda_{\max}(\bm{H})$ and $\lambda_{\min}(\bm{H})$ denote its largest and smallest eigenvalues, respectively. For a non-Hermitian matrix $\bm{M}$, $|\lambda(\bm{M})|_{\max}$ denotes the eigenvalue of $\bm{M}$ that has the largest magnitude. $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. $\overset{p}{\to}$ denotes convergence in probability.
II Asymptotic Analysis of the Spectral Method
This section presents the main results of this paper. Since our results have not been fully proved, we call them “claims” throughout this section to avoid confusion. The rationale for our claims will be discussed in Section IV.
In this paper, we make the following assumption on the sensing matrix $\bm{A}$.
The sensing matrix $\bm{A} \in \mathbb{C}^{m\times n}$ ($m \ge n$) is sampled uniformly at random from the set of column-orthogonal matrices satisfying $\bm{A}^{\mathsf{H}}\bm{A} = \bm{I}_n$.
Assumption 1 introduces a certain randomness assumption on the measurement matrix. However, in practice, the measurement matrix is usually a Fourier matrix or some variant of it. One particular example is the coded diffraction patterns (CDP) model in (6). For this model, the only freedom one has is in the choice of the random masks, and the overall matrix is still quite structured. In this regard, it is perhaps quite surprising that the predictions developed under Assumption 1 are very accurate even for the CDP model. Please refer to Section V for our numerical results.
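For numerical experiments, a matrix satisfying Assumption 1 can be sampled by orthonormalizing a Gaussian matrix. The sketch below uses the standard QR-plus-phase-correction recipe; the phase fix-up is needed because the Q factor of a plain QR decomposition is not exactly Haar distributed:

```python
import numpy as np

def haar_column_orthonormal(m, n, rng):
    # Sample an m x n matrix with Haar-distributed orthonormal columns:
    # take the QR decomposition of an i.i.d. complex Gaussian matrix and
    # absorb the phases of R's diagonal into Q (Mezzadri's recipe).
    G = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))[None, :]

rng = np.random.default_rng(2)
A = haar_column_orthonormal(200, 50, rng)
gram = A.conj().T @ A          # orthonormal columns: gram = I
```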
We further make the following assumption about the processing function .
The processing function satisfies , where .
As pointed out in , the boundedness of $\mathcal{T}$ in Assumption 2-(I) is key to the success of the truncated spectral initializer proposed in . (Following , we will call it the trimming method in this paper.) Notice that the assumption holds without any loss of generality. To see this, consider the following modification of $\mathcal{T}$:
where . It is easy to see that
where we used $\bm{A}^{\mathsf{H}}\bm{A} = \bm{I}$. Clearly, the top eigenvector of the modified data matrix is the same as that of the original one, since the two matrices differ only by a multiple of the identity, and the modified processing function satisfies the assumption.
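The shift argument above is easy to check numerically: for a column-orthonormal matrix, subtracting a constant from the processing function shifts the whole data matrix by a multiple of the identity, leaving the eigenvectors untouched. A small sketch (with a random stand-in for the processed measurements):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 128, 32
A, _ = np.linalg.qr(rng.standard_normal((m, n)))  # column-orthonormal A
t = rng.random(m) + 0.5        # stand-in for T(y): bounded, positive
c = t.min()                    # shift so the modified function has infimum 0

D1 = A.T @ (t[:, None] * A)            # original data matrix
D2 = A.T @ ((t - c)[:, None] * A)      # shifted data matrix
# Since A^T A = I, D2 = D1 - c*I: eigenvalues shift by c, eigenvectors stay.
w1, V1 = np.linalg.eigh(D1)
w2, V2 = np.linalg.eigh(D2)
align = abs(np.dot(V1[:, -1], V2[:, -1]))   # |cos| between top eigenvectors
```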
II-B Asymptotic Analysis
Let $\hat{\bm{x}}$ be the principal eigenvector of the following data matrix:
Following , we use the squared cosine similarity defined below to measure the accuracy of $\hat{\bm{x}}$:
Our goal is to understand how the cosine similarity behaves as $m, n \to \infty$ with a fixed ratio $\delta = m/n$. It seems possible to solve this problem by adapting the tools developed in . Yet, we take a different approach, which we believe to be technically simpler and more flexible. Specifically, we will derive a message passing algorithm that converges to the spectral estimator in a certain limit. Central to our analysis is a deterministic recursion, called state evolution (SE), that characterizes the asymptotic behavior of the message passing algorithm. By analyzing the stationary points of the SE, we obtain predictions about the spectral estimator. This approach has been adopted in  to analyze the asymptotic performance of the LASSO estimator, and in  for the nonnegative PCA estimator, based on the approximate message passing (AMP) algorithm [28, 29]. However, the SE analysis of AMP does not apply to the partial orthogonal matrix model considered in this paper.
Different from , our analysis is based on a variant of the expectation propagation (EP) algorithm [15, 16], which we call PCA-EP in this paper. Unlike AMP, such EP-style algorithms can be analyzed via SE for a wider class of measurement matrices, including the Haar model considered here [17, 19, 18, 30, 20, 21, 31, 22]. The derivation of PCA-EP and its SE analysis will be introduced in Section IV.
Our characterization of the cosine similarity involves the following functions (for ):
where and is a shorthand for
We note that these functions all depend on the processing function $\mathcal{T}$. However, to simplify the notation, we will not make this dependence explicit. Claim 1 below summarizes our asymptotic characterization of the cosine similarity.
Claim 1 (Cosine similarity).
where . Let
Then, as $m, n \to \infty$ with a fixed ratio $\delta = m/n$, we have
where, with a slight abuse of notation,
and is a solution to
Claim 1, which is reminiscent of Theorem 1 in , reveals a phase transition behavior of the cosine similarity: it is strictly positive when $\delta$ exceeds the threshold and is zero otherwise. In the former case, the spectral estimator is positively correlated with the true signal and hence provides useful information about it; in the latter case, the spectral estimator is asymptotically orthogonal to the true signal and hence performs no better than a random guess. As discussed before, Claim 1 is derived from the analysis of an EP algorithm and has not yet been proved rigorously. The rationale behind this claim will be discussed in Section IV.
For notational simplicity, we will write and as and , respectively.
Some remarks about the phase transition condition are given in Lemma 1 below. Item (i) guarantees the uniqueness of the solution to (17). Item (ii) captures the intuition that leads to the conjectured phase transition condition. The proof of Lemma 1 can be found in Appendix A.
Fig. 1 plots , and for the trimming function defined in (5a), and the function defined in (5d). Different truncation thresholds are considered in the first two figures. For the first and third figures, and have a unique nonzero crossing point in , which we denote as . Further, for , and for (for the first figure). Then, by Lemma 1-(ii), a phase transition happens when , or equivalently
The spectral estimator is positively correlated with the signal when , and perpendicular to it otherwise. Now consider the figure in the middle panel. Note that only depends on through , and it is straightforward to check that , and do not depend on for such . Hence, there is no solution to for any . According to Lemma 1 and Claim 1, we have . The situation for is shown in the right panel of Fig. 1. It is straightforward to see that in this case.
III Optimal Design of the Processing Function
Claim 1 characterizes the condition under which the leading eigenvector is positively correlated with the signal vector for a given processing function. We would like to understand how the choice of $\mathcal{T}$ affects the performance of the spectral estimator. We first introduce the following definitions:
Following , the latter threshold is referred to as the weak threshold. We will answer the following questions:
What is the weak threshold for the spectral method under a partial orthogonal model?
For a given $\delta$, what is the optimal $\mathcal{T}$ that maximizes the cosine similarity?
These questions have been addressed for the i.i.d. Gaussian measurement model. In , Mondelli and Montanari derived the weak threshold for this model. They also proposed a function (i.e., in (5)) that achieves it. Very recently,  proved that the function given in (5) maximizes the cosine similarity for any fixed $\delta$, and is therefore uniformly optimal. For the partial orthogonal measurement model, the above problems can also be addressed based on our asymptotic characterization of the spectral initialization. It is perhaps surprising that the same function is also uniformly optimal under the partial orthogonal sensing model considered in this paper. Note that although this function is optimal for both the i.i.d. Gaussian and the partial orthogonal models, the performance of the spectral method differs between the two models. Theorem 1 below, whose proof can be found in Appendix B, summarizes our findings.
Theorem 1 (Optimality of ).
Suppose that Claim 1 is correct; then for the partial orthogonal measurement model. Further, for any , the function in (5) maximizes and is given by
where is the unique solution to (with ). Finally, is an increasing function of .
The processing function that achieves the weak threshold is not unique. For instance, the following function also achieves it:
It is straightforward to show that for any . Further, a direct calculation shows that
IV An EP-Based Algorithm for the Spectral Method
In this section, we present the rationale behind Claim 1. Our reasoning is based on analyzing an expectation propagation (EP) [15, 16, 30, 22] based message passing algorithm that aims to find the leading eigenvector of the data matrix. Since our algorithm is intended to solve an eigenvector problem, we call it PCA-EP throughout this paper. The key to our analysis is a state evolution (SE) tool for such EP-based algorithms, first conjectured in [17, 19] (for a partial DFT sensing matrix) and  (for generic unitarily-invariant sensing matrices), and later proved in [20, 21] (for general unitarily-invariant matrices). For approximate message passing (AMP) algorithms, SE has proved to be a powerful tool for performance analysis in the high-dimensional limit [26, 27, 32, 33].
IV-A The PCA-EP Algorithm
The leading eigenvector of is the solution to the following optimization problem:
where . The normalization (instead of the commonly-used constraint ) is introduced for convenience. PCA-EP is an EP-type message passing algorithm that aims to solve (20). Starting from an initial estimate , PCA-EP proceeds as follows (for ):
where is a diagonal matrix defined as
In (21b), is a parameter that can be tuned. At every iteration of PCA-EP, the estimate of the leading eigenvector is given by
The derivation of PCA-EP can be found in Appendix C. Before we proceed, we would like to mention a couple of points:
The PCA-EP iteration involves a single parameter, $\mu$, which does not change across iterations. To calibrate PCA-EP with the eigenvector problem, we need to choose the value of $\mu$ carefully. We will discuss the details later in this section.
From this representation, (21a) can be viewed as a power method applied to a fixed matrix parameterized by $\mu$. This observation is crucial for our analysis; the notation emphasizes the dependence on $\mu$.
The two matrices involved in satisfy the following “zero-trace” property (referred to as “divergence-free” in ):
This zero-trace property is the key to the correctness of the state evolution (SE) characterization.
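Since the PCA-EP update is, per the remarks above, a power iteration applied to a fixed non-Hermitian matrix, the mechanics can be illustrated generically. The matrix below is an arbitrary planted-rank-one example (not the actual matrix from (21a)); the iteration recovers the eigenvector whose eigenvalue has the largest magnitude:

```python
import numpy as np

def power_iteration(M, z0, iters=500):
    # Plain power method z_{t+1} = M z_t / ||M z_t||; converges to the
    # eigenvector of the largest-magnitude eigenvalue when that eigenvalue
    # is simple and strictly dominant.
    z = z0 / np.linalg.norm(z0)
    for _ in range(iters):
        z = M @ z
        z /= np.linalg.norm(z)
    return z

rng = np.random.default_rng(4)
n = 30
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
# Small non-Hermitian bulk plus a planted dominant direction (outlier ~ 2).
M = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n) + 2.0 * np.outer(u, u)

z = power_iteration(M, rng.standard_normal(n))
w, V = np.linalg.eig(M)
v_top = V[:, np.argmax(np.abs(w))].real   # dominant eigenvector (real here)
```

Here `abs(np.dot(z, v_top))` is close to one: the iterate aligns with the dominant eigenvector up to sign.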
IV-B State Evolution Analysis
State evolution (SE) was first introduced in [28, 29] to analyze the dynamical behavior of AMP. However, the original SE technique for AMP only works when the sensing matrix has i.i.d. entries. Similar to AMP, PCA-EP can also be described by certain SE recursions, but the SE for PCA-EP works for a wider class of sensing matrices (specifically, unitarily-invariant ones [18, 20, 21]), which includes the random partial orthogonal matrices considered in this paper.
Denote . Assume that the initial estimate , where is independent of . Then, intuitively speaking, PCA-EP has an appealing property that in (21a) is approximately
where and are the variables defined in (24), and is an i.i.d. Gaussian vector. Due to this property, the distribution of is fully specified by and . Further, for a given and a fixed value of , the sequences and can be predicted via the following two-dimensional map:
where the functions , and are defined in (10). To gain some intuition about the SE, we present a heuristic derivation in Appendix D. We believe that it is possible to make the SE prediction rigorous using the conditioning technique developed in [20, 21]. However, since the link between PCA-EP and the spectral estimator is itself non-rigorous, we did not pursue this. A precise statement of the SE predictions is given in Claim 2 below.
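In practice, an SE recursion like (24) is simply iterated to a fixed point. The two-dimensional map below is a placeholder contraction standing in for the actual maps (which depend on the processing function, the sampling ratio, and the tunable parameter, and are not reproduced here); the sketch only illustrates the mechanics of tracking the pair of SE state variables:

```python
import numpy as np

def se_update(alpha, sigma2):
    # Placeholder SE map; the true two-dimensional map from (24) depends on
    # the processing function, the sampling ratio, and the tunable parameter.
    alpha_new = 0.9 * alpha / np.sqrt(alpha ** 2 + sigma2)
    sigma2_new = 0.5 * sigma2 + 0.1
    return alpha_new, sigma2_new

def iterate_se(alpha0=1.0, sigma20=1.0, tol=1e-12, max_iter=10_000):
    # Run the SE recursion until successive iterates stop moving.
    alpha, sigma2 = alpha0, sigma20
    for _ in range(max_iter):
        a, s = se_update(alpha, sigma2)
        if abs(a - alpha) + abs(s - sigma2) < tol:
            return a, s
        alpha, sigma2 = a, s
    return alpha, sigma2

alpha_star, sigma2_star = iterate_se()
```

For this toy map the fixed point can be checked by hand: the second coordinate solves s = 0.5 s + 0.1, i.e. s = 0.2, and the first then solves a = 0.9 a / sqrt(a^2 + 0.2).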
IV-C Connection Between PCA-EP and the PCA Problem
Lemma 2 below shows that any nonzero stationary point of PCA-EP in (21) is an eigenvector of the matrix . This is our motivation for analyzing the performance of the spectral method through PCA-EP.
Consider the PCA-EP algorithm in (21) with . Let be an arbitrary stationary point of PCA-EP. Suppose that ; then is an eigen-pair of , where the eigenvalue is given by
We introduce the following auxiliary variable:
When is a stationary point of (21a), we have
Rearranging terms, we have
Multiplying from both sides of the above equation yields
After simple calculations, we finally obtain
which shows that the stationary point is an eigenvector. ∎
Notice that the eigenvalue identified in (26) is closely related to the function in (13). In fact, the only difference between (26) and (13) is that the normalized trace in (26) is replaced by , where . Under certain mild regularity conditions on , it is straightforward to use the weak law of large numbers to prove that
Lemma 2 shows that the stationary points of the PCA-EP algorithm are eigenvectors of . Since the asymptotic performance of PCA-EP can be characterized via the SE platform, it is conceivable that the fixed points of the SE describe the asymptotic performance of the spectral estimator. However, the answers to the following questions are still unclear:
Even though is an eigenvector of , does it correspond to the largest eigenvalue?
The eigenvalue in (26) depends on , which looks like a free parameter. How should be chosen?
In the following section, we will discuss these issues and provide some heuristic arguments for Claim 1.
IV-D Heuristics About Claim 1
The SE equations in (24) have two sets of solutions (in terms of , and ):
Neither solution in (31) constrains the norm of the estimate (which corresponds to a constraint on ). This is because we ignored the norm constraint when deriving our PCA-EP algorithm in Appendix C-A. If our derivation explicitly took the norm constraint into account, we would obtain an additional equation specifying the individual values of and . However, only the ratio matters for our phase transition analysis. Hence, we have ignored the constraint on the norm of the estimate.
is also a valid fixed point of (24). However, this solution corresponds to the all-zero vector and is not of interest for our purposes.
For the uninformative solution, we have and hence according to (25). On the other hand, the informative solution (if it exists) corresponds to an estimate that is positively correlated with the signal, namely, . Recall from (21) that is a parameter of the PCA-EP algorithm. Claim 1 is obtained based on the following heuristic argument: the cosine similarity is strictly positive if and only if there exists a parameter value for which the informative solution is valid. More specifically, there exists a value so that the following conditions hold:
Further, we have proved (see Lemma 1) that the above condition is equivalent to the phase transition condition presented in Claim 1. Note that (and so ) is an implicit condition in our derivation of the SE. Hence, for a valid informative solution, we must have .
We now provide some heuristic rationale for Claim 1. Our first observation is that if the informative solution exists and we set in PCA-EP properly, then and neither tends to zero nor to infinity. A precise statement is given in Lemma 3 below. The proof is straightforward and hence omitted.
Namely, the norm does not vanish or explode, under a particular limit order. Now, suppose that for a finite and deterministic problem instance we still have
where denotes the spectral radius of . Based on these heuristic arguments, we conjecture that almost surely. Further, we show in Appendices D-C and D-D that the PCA-EP algorithm converges in this case. Clearly, this implies that is an eigen-pair of , where is the limit of the estimates. Combining the above arguments, we have the following conjecture.
Consider the matrix defined in (21a), where . Let be the eigenvalue of with the largest magnitude, and let be the corresponding eigenvector. If and , then
Recall from Lemma 2 that if PCA-EP converges, then the stationary estimate is an eigenvector of , but it is unclear whether it is the leading eigenvector. Our heuristic arguments in this section and Lemma 4 imply that PCA-EP indeed converges to the leading eigenvector of . Therefore, we conjecture that the fixed points of the SE characterize the asymptotic behavior of the spectral estimator.
The above argument is not rigorous. However, our numerical results suggest that the conclusions are correct. An example is shown in Fig. 2. Here, the figure on the right plots the eigenvalues of a random realization of with . We can see that an outlying eigenvalue pops out of the bulk of the spectrum. Moreover, the outlying eigenvalue is close to one. Fig. 3 further plots the eigenvalues of for three other choices of . All of the results seem to support our conjecture: there is an outlying eigenvalue (at one on the real line), although the shape of the bulk depends on the specific choice of .
IV-E On the domain of , , , and
In the previous discussions, we assumed that , , and are all defined on . This assumption was implicitly used to derive the PCA-EP algorithm. Specifically, the Gaussian pdf for obtaining (78) in the derivation of PCA-EP is not well defined if . Nevertheless, the final PCA-EP algorithm in (21) is well defined, as long as is well defined. We have assumed . Let us further assume that is bounded from below:
where . Under this assumption, is well defined as long as . On the other hand, we have only focused on the domain (or ) in our previous discussions. In particular, we conjectured that if and only if there exists an informative solution (see (31)) for . A natural question is: what if the SE equations in (31) do not have an informative solution for , but do have one for ? To be specific, suppose that satisfies the following conditions:
Then, one might ask what happens if we consider a PCA-EP algorithm by setting to such a ? It turns out that, based on arguments similar to those presented in Section IV-D, we can obtain
Namely, our method can also provide a conjecture about the minimum eigenvalue of . To see this, note that, using exactly the same arguments as in Section IV-D, we can claim (heuristically) that
Consider the matrix