Phase retrieval has been widely studied in machine learning, signal processing, and optimization. The goal of phase retrieval is to recover a signal $x$ given observations of the amplitudes of its linear measurements:
\[
b_i = |\langle a_i, x \rangle|, \qquad i = 1, \dots, m, \tag{1.1}
\]
where $a_i \in \mathbb{R}^n$ or $\mathbb{C}^n$ are known measurement vectors, $b_i \ge 0$ are the observations, and $x$ is the unknown variable we wish to recover. A well-studied form of the phase retrieval problem is
\[
b_i = |\langle a_i, x \rangle|^2, \qquad i = 1, \dots, m, \tag{1.2}
\]
where the $b_i$ now represent the squared magnitudes of the observations. The phase retrieval problem is known to be NP-hard. Recent work on the phase retrieval problem [15, 24, 20, 11, 10] focuses on the real phase retrieval problem, where it is assumed that $a_i \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$ for each $i$. This is the line of inquiry we follow. In the following discussion, the rows of the matrix $A \in \mathbb{R}^{m \times n}$ are the vectors $a_i^T$.
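To fix ideas, the real measurement model can be sketched numerically. The dimensions, the signal `x_star`, and all names below are our own illustrative choices, not from the paper.

```python
import math
import random

random.seed(0)
n, m = 3, 8                      # illustrative sizes (our choice)
x_star = [1.0, -2.0, 0.5]        # hypothetical ground-truth signal

# rows a_i of A drawn i.i.d. standard Gaussian
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

def measure(A, x):
    # amplitude observations b_i = |a_i^T x| (real phase retrieval, (1.1))
    return [abs(sum(a * v for a, v in zip(row, x))) for row in A]

b = measure(A, x_star)
# the global sign is unidentifiable: -x_star yields the same observations
assert measure(A, [-v for v in x_star]) == b
```

The final assertion exhibits the sign ambiguity that makes phase retrieval harder than ordinary linear regression.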
The two most popular approaches to the real phase retrieval problem are semidefinite programming relaxations [2, 10, 12, 17, 21, 26, 31] and convex-composite optimization [5, 20, 24]. These approaches formulate the real phase retrieval problem as an optimization problem of the form
\[
\min_{x \in \mathbb{R}^n} \; \rho\bigl(|Ax| - b\bigr) \quad \text{or} \quad \min_{x \in \mathbb{R}^n} \; \rho\bigl((Ax)^2 - b\bigr), \tag{1.3}
\]
where $\rho$ is chosen to be either the $\ell_1$ norm or the square of the $\ell_2$ norm, and, for any vector $y \in \mathbb{R}^m$, $|y|$ and $y^2$ are vectors in $\mathbb{R}^m$ whose components are the absolute values and squares of those in $y$. The objective in (1.3) is a composition of a convex function with a smooth function, and is called convex-composite. This structure plays a key role in both the optimality conditions and algorithm development for (1.3).
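Concretely, the $\ell_1$ instance of such an objective takes only a few lines to evaluate. The function name `f_p` and the $1/m$ scaling below are our notational choices for illustration.

```python
def f_p(A, b, x, p=1):
    # (1/m) * sum_i | |a_i^T x|^p - b_i |: the l1 norm composed with
    # the measurement residual, an objective of the form (1.3)
    m = len(A)
    total = 0.0
    for row, bi in zip(A, b):
        inner = sum(a * v for a, v in zip(row, x))
        total += abs(abs(inner) ** p - bi)
    return total / m

# at an exact solution the residual vanishes, so the objective is 0
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [2.0, -1.0]
b = [abs(sum(a * v for a, v in zip(row, x_true))) for row in A]
assert f_p(A, b, x_true, p=1) == 0.0
```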
In the noiseless case, when there exists a vector $\bar{x}$ such that $b = |A\bar{x}|$ (or $b = (A\bar{x})^2$), a gradient-based method called Wirtinger Flow (WF) was introduced to solve the smooth problem
\[
\min_{x \in \mathbb{R}^n} \; \frac{1}{4m} \sum_{i=1}^{m} \bigl( (a_i^T x)^2 - b_i \bigr)^2 .
\]
WF admits a linear convergence rate when properly initialized. Further work along this line includes the Truncated Wirtinger Flow (TWF). TWF requires $O(n)$ measurements, as opposed to the $O(n \log n)$ measurements required by WF, to obtain a linear rate. A similar subgradient-based approach has been used to minimize the nonsmooth objective in the noiseless case.
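A minimal sketch of one WF-style gradient step on the smooth squared-residual objective, assuming its standard real-case form with squared-magnitude observations; the step size and all names are illustrative choices of ours.

```python
def wf_step(A, b_sq, x, step=0.01):
    # one gradient step on (1/(4m)) * sum_i ((a_i^T x)^2 - b_i)^2;
    # the gradient is (1/m) * sum_i ((a_i^T x)^2 - b_i) * (a_i^T x) * a_i
    m, n = len(A), len(x)
    g = [0.0] * n
    for row, bi in zip(A, b_sq):
        r = sum(a * v for a, v in zip(row, x))
        coef = (r * r - bi) * r / m
        for j in range(n):
            g[j] += coef * row[j]
    return [v - step * gj for v, gj in zip(x, g)]

# at an exact solution the gradient vanishes and the iterate is fixed
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
x_true = [1.0, -1.0]
b_sq = [sum(a * v for a, v in zip(row, x_true)) ** 2 for row in A]
assert wf_step(A, b_sq, x_true) == x_true
```

The fixed-point check reflects why initialization matters: the method only converges linearly from a sufficiently good starting point.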
Contributions. In this paper we address two forms of the robust phase retrieval problem, where the optimization objective takes the form
\[
f_p(x) := \frac{1}{m} \bigl\| \, |Ax|^p - b \, \bigr\|_1, \qquad p \in \{1, 2\}, \tag{1.4}
\]
and it is assumed that the matrix $A$ satisfies the following Gaussian assumption:

\textbf{Assumption G.} The entries of $A \in \mathbb{R}^{m \times n}$ are i.i.d. standard Gaussian random variables.
Our goal is to establish a robust phase retrieval counterpart to the seminal work of Candes and Tao on compressed sensing ($\ell_1$ regression).
Compressed sensing problems take the form
\[
\min_{x} \; \|x\|_1 \quad \text{subject to} \quad Ax = b, \tag{1.5}
\]
where $b = A\bar{e}$ for a sparse vector $\bar{e}$. This problem is known to be equivalent to the $\ell_1$ linear regression problem
\[
\min_{\beta} \; \|y - X\beta\|_1, \tag{1.6}
\]
where $y = X\bar{\beta} + \bar{e}$ and $AX = 0$ (e.g., the columns of $X$ form a basis of $\ker(A)$). It is shown in the compressed sensing literature that there is a universal constant $\rho^* \in (0, 1)$ such that, under suitable conditions on $A$ (e.g., Assumption G), if the noise $\bar{e}$ satisfies $\|\bar{e}\|_0 \le \rho^* m$, then $\bar{\beta}$ is the unique solution to (1.6), with high probability. We prove similar exact recovery results for the two robust phase retrieval problems (1.4). In particular, we show that, with high probability, $\pm\bar{x}$ are the unique minimizers when the residual is sufficiently sparse (Theorem 3). In this situation, the solution sets of the robust problem and of the phase retrieval problem coincide up to sign.
Thus, the phase retrieval problem can be solved via the robust phase retrieval problem (1.4) whenever there exists an $\bar{x}$ for which the noise is sufficiently sparse.
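The exact-recovery phenomenon behind (1.6) can be seen in the simplest one-dimensional case, where $\ell_1$ regression reduces to minimizing a piecewise-linear function over its breakpoints. The construction below is a toy illustration of ours, not the paper's.

```python
def l1_regression_1d(x, y):
    # exact 1-D l1 regression: min_beta sum_i |y_i - x_i * beta|.
    # The objective is piecewise linear, so a minimizer is attained
    # at a breakpoint beta = y_i / x_i (assumes all x_i != 0).
    cands = [yi / xi for xi, yi in zip(x, y)]
    obj = lambda beta: sum(abs(yi - xi * beta) for xi, yi in zip(x, y))
    return min(cands, key=obj)

# sparse corruption: one of five measurements is grossly wrong, yet
# the l1 fit recovers beta* = 2 exactly (the exact-recovery flavor of (1.6))
x = [1.0, 2.0, -1.0, 3.0, 0.5]
beta_star = 2.0
y = [xi * beta_star for xi in x]
y[1] += 10.0          # sparse noise on a single observation
assert l1_regression_1d(x, y) == beta_star
```

A least-squares fit on the same data would be pulled away from $\beta^* = 2$ by the corrupted observation; the $\ell_1$ fit ignores it entirely.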
A key structural requirement used in compressed sensing is the Restricted Isometry Property (RIP). We also make use of an RIP property in the $p = 2$ case. However, the $p = 1$ case requires a new property, which we call the $p$-Absolute Growth Property ($p$-AGP) (see Definition 2). When $p = 2$, RIP implies 2-AGP, and the $p$-AGP holds under Assumption G with high probability (see Lemmas 4.1 and 4.2). A second key property, which mimics the so-called Null Space Property (NSP) in compressed sensing [18, 19, 23, 25], is also introduced. We call this the $p$-Absolute Range Property ($p$-ARP) (see Definition 2). In compressed sensing, it is known that if $A$ satisfies RIP with a sufficiently small parameter, then $A$ satisfies NSP of the corresponding order. Correspondingly, we show that the $p$-AGP implies the $p$-ARP with high probability under Assumption G (see Lemmas 4.1 and 4.2).
There are separate classes of methods for solving (1.4) for $p = 1$ and $p = 2$. When $p = 1$, one can apply a smoothing method to the absolute value function [1, 27], or use other relaxation techniques that preserve the nonsmooth objective but introduce auxiliary variables. When $p = 2$, the solution methods typically exploit the convex-composite structure of the objective. These methods rely on two key conditions on the objective $f$: weak convexity (i.e., $f + \frac{\mu}{2}\|\cdot\|_2^2$ is convex for some $\mu > 0$) and sharpness (i.e., $f(x) - \min f \ge \alpha \, \mathrm{dist}(x, \mathcal{X}^*)$ for some $\alpha > 0$, where $\mathcal{X}^*$ is the set of minimizers of $f$). Under these two properties, Duchi and Ruan, Drusvyatskiy, Davis and Paquette, and Charisopoulos et al. establish convergence and iteration complexity results for prox-linear and subgradient algorithms. Recent work has also considered gradient-based methods for the problem when the noise is sparse. To establish locally linear convergence of their algorithms, these works impose additional conditions on the measurements and on the fraction of corrupted entries.
Conditions for the weak convexity of the $p = 2$ objective follow from results in [24, 20] under assumptions weaker than Assumption G. In the noiseless case, the sharpness of this objective also follows from results in [24, 20]. In the noisy case, sharpness is established in [24, 20] under similar assumptions on the sparsity of the noise.
We establish sharpness for both $p = 1$ and $p = 2$ under Assumption G, uniformly over all possible supports of the sparse noise. Our result for the $p = 2$ case has a similar flavor to those in [24, 15], but more closely parallels the result of Candes and Tao in the compressed sensing case. When $p = 1$, our result has no precedent in the literature and requires a new approach: the $p = 1$ objective is not weakly convex since it is not even subdifferentially regular.
This paper is organized as follows. In Section 2, we introduce the new properties $p$-ARP and $p$-AGP and provide a detailed description of how our program of proof parallels the program used in compressed sensing. In Section 3, we show that if $A$ satisfies the $p$-ARP and the residual is sufficiently sparse, then the global minimizers are as claimed, under Assumption G. In Section 4, we show that Assumption G implies that the $p$-AGP, and hence the $p$-ARP, holds with high probability. In the last section we show that the objective is sharp with respect to its solution set, with high probability.
Lower case letters (e.g., $x$, $y$) denote vectors, while $x_i$ denotes the $i$th component of the vector $x$. The symbols $c, c_1, c_2, \dots$ denote universal constants. $\|x\|_2$ and $\|x\|_1$ denote the Euclidean and $\ell_1$ norms of a vector $x$, while $\|x\|_0$ denotes the `$\ell_0$-norm', the number of nonzero entries of $x$. For a matrix $A$, $\|A\|_F$ denotes the Frobenius norm and $\|A\|_2$ denotes the operator norm. When $x$ is a vector, $|x|$ and $x^2$ denote the vectors whose components are $|x_i|$ and $x_i^2$. For a vector $x$ and $s \in \{1, \dots, n\}$, $x_{[s]}$ is defined to be the vector in $\mathbb{R}^n$ whose $i$th entry is $x_i$ if $x_i$ is among the $s$ largest entries of $x$ in magnitude and $0$ elsewhere. We say a vector $x$ is $s$-sparse if $\|x\|_0 \le s$.
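The truncation operator from the notation above can be sketched as follows; the function name `keep_largest` is ours.

```python
def keep_largest(x, s):
    # x_[s]: keep the s entries of largest magnitude, zero out the rest
    # (ties are broken by index order in this sketch)
    idx = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s]
    keep = set(idx)
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

v = [3.0, -1.0, 0.0, 4.0, -2.0]
assert keep_largest(v, 2) == [3.0, 0.0, 0.0, 4.0, 0.0]
# the result is s-sparse in the sense of the l0 "norm"
assert sum(1 for a in keep_largest(v, 2) if a != 0.0) <= 2
```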
2 The Roadmap
It is shown in [23, 25] that every $s$-sparse signal $\bar{x}$ is the unique minimizer of the compressed sensing problem (1.5) with $b = A\bar{x}$ if and only if $A$ satisfies the NSP of order $s$. The NSP of order $s$ is implied by the Restricted Isometry Property (RIP) for a sufficiently small RIP parameter $\delta$, where a matrix $A$ is said to satisfy RIP of order $s$ with constant $\delta \in (0, 1)$ if
\[
(1 - \delta)\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1 + \delta)\|x\|_2^2 \qquad \text{for all } s\text{-sparse } x. \tag{2.2}
\]
It is known that RIP is satisfied under many distributional hypotheses on the matrix $A$; for example, random matrices with i.i.d. Gaussian or Bernoulli entries are known to satisfy RIP with high probability when $m \ge c\, s \log(n/s)$ for a universal constant $c$ [3, 13, 14, 29]. Recapping, the general pattern of proof for establishing that a sufficiently sparse $\bar{x}$ is the unique minimizer of problem (1.5) under distributional assumptions on $A$ is given in the following program:
We extend this program to the class of robust phase retrieval problems
\[
\min_{x \in \mathbb{R}^n} \; f_p(x) = \frac{1}{m} \bigl\| \, |Ax|^p - b \, \bigr\|_1, \tag{2.3}
\]
for $p = 1, 2$, to show that, under Assumption G, when the residuals are sufficiently sparse, the vectors $\pm\bar{x}$ are the global minimizers of the real robust phase retrieval problems (2.3) with high probability. In our program, we replace NSP and RIP with new properties called the $p$-Absolute Range Property ($p$-ARP) and the $p$-Absolute Growth Property ($p$-AGP), respectively.
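Before turning to the new properties, it is worth recalling numerically how the RIP ingredient behaves for Gaussian matrices. Exact verification of RIP is computationally hard, so the snippet below is only a Monte-Carlo probe; the sizes and the $1/\sqrt{m}$ scaling are our illustrative choices.

```python
import math
import random

random.seed(1)

def rip_ratio_range(A, s, trials=100):
    # sample random s-sparse unit vectors x and record ||Ax||_2^2;
    # for unit x this is the RIP ratio ||Ax||^2 / ||x||^2.
    # A sanity check only, not a certificate of RIP.
    n = len(A[0])
    ratios = []
    for _ in range(trials):
        support = random.sample(range(n), s)
        x = [0.0] * n
        for i in support:
            x[i] = random.gauss(0.0, 1.0)
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
        ratios.append(sum(v * v for v in Ax))
    return min(ratios), max(ratios)

n, m, s = 20, 400, 2              # illustrative sizes (our choice)
# Gaussian matrix scaled by 1/sqrt(m) so that E ||Ax||^2 = ||x||^2
A = [[random.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)]
     for _ in range(m)]
lo, hi = rip_ratio_range(A, s)
# with m much larger than s*log(n/s), the ratios concentrate near 1
assert 0.5 < lo <= hi < 1.5
```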
[p-Absolute Range Property (p-ARP)]
For $p \in \{1, 2\}$, we say that $A$ satisfies the $p$-Absolute Range Property of order for if, for any and for any with ,
In order for Definition 2 to make sense, the number of measurements must be significantly larger than the order. This is illustrated by the following example.
For , an example in which ARP does not hold for any order is for any . An example in which ARP of order holds is for any .
where the columns of form a basis of .
[p-Absolute Growth Property (p-AGP)] For $p \in \{1, 2\}$, we say that the matrix $A$ satisfies the $p$-Absolute Growth Property if there exist constants and a mapping such that
The mapping is introduced to accommodate the fact that the robust phase retrieval problem cannot have a unique solution, since if $x$ solves (2.3) then so does $-x$. For this reason, (2.5) implies that if , then . In what follows, we take
The relationship between RIP and the $p$-AGP can now be seen by comparing (2.2) with (2.5). A fundamental (and essential) difference is that RIP for compressed sensing applies to any selection of columns of $A$, where the number of selected columns is small since it determines the sparsity of the solution. On the other hand, our $p$-AGP applies to the rows of $A$ corresponding to the zero entries of the sparse residual vector.
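The sign ambiguity just discussed is typically handled by measuring error as the distance to the pair $\{\bar{x}, -\bar{x}\}$; a small helper (our naming, a sketch) makes this concrete.

```python
import math

def dist_up_to_sign(x, y):
    # distance from x to the pair {y, -y}: min(||x - y||_2, ||x + y||_2);
    # the natural error measure given the global sign ambiguity
    d_minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    d_plus = math.sqrt(sum((a + b) ** 2 for a, b in zip(x, y)))
    return min(d_minus, d_plus)

x_bar = [1.0, -2.0]
# a sign flip of the signal counts as an exact match
assert dist_up_to_sign([-1.0, 2.0], x_bar) == 0.0
```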
We can now more precisely describe how our program of proof parallels the one used for compressed sensing.
3 Global minimization under p-ARP
In this section we parallel the discussion given in the compressed sensing setting, with NSP replaced by $p$-ARP. We begin by introducing a measure of residual sparsity. For a vector $x$, let be the set of indices corresponding to the largest entries (in magnitude) of the residual vector, and define
Note that if and only if . Let , and . If the matrix satisfies p-ARP of order for , then
for all . In either case 1 or 2 above, let be the set of indices of the largest entries in . Then
By (3.3), we know
The main result of this section now follows. Let , , and suppose is such that is sparse. Suppose the assumptions of Lemma 3 hold. Then is a global minimizer of the robust phase retrieval problem (2.3). Moreover, for any ,
If is another global minimizer, then . If it is further assumed that the entries of $A$ are i.i.d. standard Gaussians and , then, with probability 1, is the unique solution of (2.3) up to multiplication by $-1$. By Lemma 3, since ,
and so for all , i.e., is a global minimizer. Again by Lemma 3,
In the next section we show that under Assumption G, p-ARP of order holds for a sufficiently small constant , with high probability.
4 Assumption G $\Rightarrow$ p-AGP $\Rightarrow$ p-ARP
In this section we use the Gaussian Assumption G on the matrix to show that p-AGP holds for with high probability, and that p-AGP implies p-ARP of order with high probability for a constant . The cases and are treated separately since different techniques are required.
[Assumption G $\Rightarrow$ 2-AGP (RIP)] [17, Lemma 1] Under Assumption G, there exist universal constants such that for , if , then with probability at least ,
for all symmetric rank-2 matrices which implies 2-AGP with , and .
[Assumption G $\Rightarrow$ 2-AGP $\Rightarrow$ 2-ARP] Under Assumption G, there exist universal constants such that if and satisfies G, then
with probability at least . Consequently, 2-ARP holds for with high probability for $m$ sufficiently large. We first derive conditions on so that exists. To this end, let be given. Let $T$ be any subset of indices and denote by $A_T$ the sub-matrix of $A$ whose rows correspond to the indices in $T$. With this notation, we have . Note also that the entries of the matrix $A_T$ satisfy G. By Lemma 4.1, there exist universal constants such that if , then, for and each subset $T$ with ,
fails to hold with probability no greater than , that is, 2-AGP holds for . Since there are
such subsets, the event
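For reference, the number of index subsets entering this union bound is controlled by the standard binomial estimate (a textbook bound, not specific to this paper):
\[
\binom{m}{k} \;\le\; \left(\frac{em}{k}\right)^{k} \;=\; \exp\!\Bigl(k \log \frac{em}{k}\Bigr),
\]
so a per-subset failure probability of order $e^{-cm}$ survives the union over all subsets of size $k$ whenever $k \log(em/k)$ is a sufficiently small multiple of $m$, i.e., when the sparsity level is a sufficiently small fraction of $m$.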
Choose so that . Then, for all , . Thus, if event occurs, we have
where the first inequality follows from (4.1) applied to the first term and (4.2) applied to the second, and the second inequality follows by (4.2). Consequently, as long as is chosen so that , the conclusion follows. This can be accomplished by choosing so that (or equivalently, ) and then choosing .
This case requires a series of four technical lemmas in order to establish the main results. We list these lemmas below; their proofs are given in the appendix (Section 7).
Under Assumption G, there exist universal constants such that for sufficiently small, if , then with probability at least ,
Under Assumption G, there exist universal constants such that for sufficiently small, if , then with probability at least ,
For , if (i.e. ), then
We first show that if the matrix satisfies Assumption G, then it satisfies the 1-AGP with high probability.
[Assumption G $\Rightarrow$ 1-AGP] Under Assumption G, there exist universal constants such that for sufficiently small, if , then with probability at least ,
where is defined in (2.6), and . Consequently, the 1-AGP holds with high probability for sufficiently large. By the two preceding lemmas, there exist universal constants such that for sufficiently small, if , then with probability at least , (4.4) and (7.6) hold. Since we can replace by if necessary, we may assume without loss of generality that .
For the left hand inequality of (4.8), we consider two cases: (1) , and (2) .
Assume . We have
where the first inequality is by the Cauchy–Schwarz inequality applied to the vectors with and , the second inequality is by Lemma 4.2 and Lemma 4.2, the third inequality is by Lemma 4.2, the fourth inequality is by Lemma 4.2, and the last inequality is by . When , one can show by direct computation that