 # On the Global Minimizers of Real Robust Phase Retrieval with Sparse Noise

We study a class of real robust phase retrieval problems under a Gaussian assumption on the coding matrix when the received signal is sparsely corrupted by noise. The goal is to establish conditions on the sparsity under which the input vector can be exactly recovered. The recovery problem is formulated as the minimization of the ℓ_1 norm of the residual. The main contribution is a robust phase retrieval counterpart to the seminal paper by Candes and Tao on compressed sensing (ℓ_1 regression) [Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203-4215, 2005]. Our analysis depends on a key new property on the coding matrix which we call the Absolute Range Property (ARP). This property is an analogue to the Null Space Property (NSP) in compressed sensing. When the residuals are computed using squared magnitudes, we show that ARP follows from a standard Restricted Isometry Property (RIP). However, when the residuals are computed using absolute magnitudes, a new and very different kind of RIP or growth property is required. We conclude by showing that the robust phase retrieval objectives are sharp with respect to their minimizers with high probability.


## 1 Introduction

Phase retrieval has been widely studied in machine learning, signal processing and optimization. The goal of phase retrieval is to recover a signal $x$ from observations of the amplitudes of its linear measurements:

$$|\langle a_i, x\rangle| = b_i, \quad 1 \le i \le m, \tag{1.1}$$

where $a_i \in \mathbb{R}^n$ or $\mathbb{C}^n$, the $b_i$ are observations, and $x$ is an unknown variable we wish to recover (e.g. see ). A well studied form of the phase retrieval problem is

$$|\langle a_i, x\rangle|^2 = b_i, \quad 1 \le i \le m, \tag{1.2}$$

where the $b_i$ now represent the squared magnitudes of the observations. It is shown in  that the phase retrieval problem is NP-hard. Recent work on the phase retrieval problem [15, 24, 20, 11, 10] focuses on the real phase retrieval problem where it is assumed that $a_i \in \mathbb{R}^n$ for each $i$. This is the line of inquiry we follow. In the following discussion the rows of the matrix $A \in \mathbb{R}^{m \times n}$ are the vectors $a_i^T$.

The two most popular approaches to the real phase retrieval problem are through semidefinite programming relaxations [2, 10, 12, 17, 21, 26, 31] and convex-composite optimization [5, 20, 24]. These approaches formulate the real phase retrieval problem as an optimization problem of the form

$$\min_x \rho(|Ax|^2 - b), \tag{1.3}$$

where $\rho$ is chosen to be either the $\ell_1$ norm or the square of the $\ell_2$ norm, and, for any vector $z \in \mathbb{R}^m$, $|z|$ and $|z|^2$ are the vectors in $\mathbb{R}^m$ whose components are the absolute values and squares of those in $z$. The objective in (1.3) is the composition of a convex function with a smooth function, and is called convex-composite. This structure plays a key role in both optimality conditions and algorithm development for (1.3) .

In the noiseless case, when there exists a vector $x_*$ such that $|Ax_*|^2 = b$ (or, $|Ax_*| = b$), a gradient based method called Wirtinger Flow (WF) was introduced by  to solve the smooth problem

$$\min_x \big\| |Ax|^2 - b \big\|_2^2.$$

WF admits a linear convergence rate when properly initialized. Further work along this line includes the Truncated Wirtinger Flow (TWF), e.g., see . Truncated Wirtinger Flow requires $m \ge Cn$ measurements as opposed to the $m \ge Cn \log n$ measurements in WF to obtain a linear rate. A similar approach using sub-gradient is used to minimize in  for the noiseless case.

Contributions. In this paper we address two forms of the robust phase retrieval problem, where the optimization objective takes the form

$$\min_x f_p(x) := \big\| |Ax|^p - b \big\|_1 \quad \text{for } p = 1, 2, \tag{1.4}$$

and it is assumed that the matrix $A$ satisfies the following Gaussian assumption:

 G: The entries of $A$ are i.i.d. standard Gaussians $N(0,1)$.

Our goal is to establish a robust phase retrieval counterpart to the seminal paper by Candes and Tao on compressed sensing ($\ell_1$ regression) .
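To make the setting of (1.4) concrete, the following minimal sketch builds a small instance under Assumption G and evaluates $f_p$. The helper names (`gaussian_matrix`, `measurements`, `f_p`, `sparse_corrupted_b`), the problem sizes, and the noise scale are illustrative assumptions and do not come from the paper.

```python
import random

def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1) entries (Assumption G)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

def measurements(A, x, p):
    """Componentwise |Ax|^p for p in {1, 2}."""
    return [abs(sum(a * xj for a, xj in zip(row, x))) ** p for row in A]

def f_p(A, b, x, p):
    """Robust objective f_p(x) = || |Ax|^p - b ||_1 from (1.4)."""
    return sum(abs(v - bi) for v, bi in zip(measurements(A, x, p), b))

def sparse_corrupted_b(A, x_star, p, num_corrupt, rng, scale=10.0):
    """b = |A x_star|^p with num_corrupt randomly chosen entries perturbed."""
    b = measurements(A, x_star, p)
    for i in rng.sample(range(len(b)), num_corrupt):
        b[i] += scale * rng.gauss(0.0, 1.0)  # hypothetical noise model
    return b

rng = random.Random(0)
m, n = 200, 5  # illustrative sizes
A = gaussian_matrix(m, n, rng)
x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
for p in (1, 2):
    b = sparse_corrupted_b(A, x_star, p, num_corrupt=10, rng=rng)
    x_other = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Objective at the planted signal versus at an unrelated random point.
    print(p, f_p(A, b, x_star, p), f_p(A, b, x_other, p))
```

Because $f_p$ depends on $x$ only through $|Ax|$, it takes the same value at $x$ and $-x$, which is why recovery can only be expected up to sign.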

Compressed sensing problems  take the form

$$\min_y \|y\|_1 \ \text{ such that } \ \Phi y = c, \tag{1.5}$$

where . This problem is known to be equivalent to the linear regression problem

$$\min_x \|Ax - b\|_1, \tag{1.6}$$

where $\Phi A = 0$ and $c = \Phi b$ (e.g., the columns of $A$ form a basis of $\mathrm{Null}(\Phi)$). In  it is shown that there is a universal constant $s \in (0,1)$ such that, under suitable conditions on $A$ (e.g., Assumption G), if $x_*$ satisfies $\|Ax_* - b\|_0 \le sm$, then $x_*$ is the unique solution to (1.6), with high probability. We prove similar exact recovery results for the two robust phase retrieval problems (1.4). In particular, we show that $\{x_*, -x_*\} = \operatorname{argmin}_x f_p(x)$ with high probability when $\big\| |Ax_*|^p - b \big\|_0 \le sm$ (Theorem 3). In this situation, the solution sets of the $\ell_1$ and the $\ell_0$ phase retrieval problems coincide, that is,

$$\{x_*, -x_*\} = \operatorname{argmin}_x \big\| |Ax|^p - b \big\|_0. \tag{1.7}$$

Thus, the $\ell_0$ phase retrieval problem can be solved by the $\ell_1$ phase retrieval problem (1.4) when there exists an $x_*$ with sufficiently sparse noise.

A key underlying structural requirement used by  is the Restricted Isometry Property (RIP). We also make use of RIP in the $p=2$ case. However, in the $p=1$ case a new property, which we call the p-Absolute Growth Property (p-AGP) (see Definition 2), is required. When $p=2$, RIP implies 2-AGP. The p-AGP holds under Assumption G with high probability (see Lemmas 4.1 and 4.2). A second key property, which mimics the so-called Null Space Property (NSP) in compressed sensing [18, 19, 23, 25], is also introduced. We call this the p-Absolute Range Property (p-ARP) (see Definition 2). In , it is shown that, for problem (1.5), if $\Phi$ satisfies RIP with parameter , then $\Phi$ satisfies NSP of order . Correspondingly, we show that p-AGP implies p-ARP with high probability under Assumption G (see Lemmas 4.1 and 4.2).

There are separate classes of methods for solving (1.4) for $p=1$ and $p=2$. When $p=1$, one can apply a smoothing method to the absolute value function [1, 27], or use other relaxation techniques that preserve the nonsmooth objective but introduce auxiliary variables . When $p=2$, the solution methods typically exploit the convex-composite structure of the objective . These methods rely on two key conditions on the function $f_2$: weak convexity (i.e., $f_2 + \frac{\rho}{2}\|\cdot\|^2$ is convex for some $\rho > 0$) and sharpness (i.e., $f_2(x) - \min f_2 \ge \mu\, \mathrm{dist}(x, \mathcal{X})$ for some $\mu > 0$, where $\mathcal{X}$ is the set of minimizers of $f_2$). Under these two properties, Duchi and Ruan , Drusvyatskiy, Davis and Paquette  and Charisopoulos, et al. establish convergence and iteration complexity results for prox-linear and subgradient algorithms. Recently  and  considered gradient-based methods for the problem when the noise is sparse for some . To establish locally linear convergence of their algorithms the authors of  require that the measurements satisfy for , while the authors of  require that for some . The results in  and  require for some and for some sufficiently small.

Conditions for the weak convexity of $f_2$ follow from results in [24, 20] under assumptions weaker than Assumption G. In the noiseless case, the sharpness of $f_2$ also follows from results in [24, 20]. In the noisy case, sharpness is established in [24, 20] under assumptions on the sparsity of the noise.

We establish sharpness for both $f_1$ and $f_2$ under Assumption G uniformly for all possible supports of the sparse noise. Our result for the $p=2$ case has a similar flavor to those in [24, 15], but more closely parallels the result of Candes and Tao in the compressed sensing case. When $p=1$, our result has no precedent in the literature and requires a new approach. The function $f_1$ is not weakly convex since it is not even subdifferentially regular .

This paper is organized as follows. In Section 2, we introduce the new properties p-ARP and p-AGP and provide a detailed description of how our program of proof parallels the program used in compressed sensing. In Section 3, we show that if $A$ satisfies p-ARP and the residual $|Ax_*|^p - b$ is sufficiently sparse, then $x_*$ and $-x_*$ are global minimizers of $f_p$, and they are the only global minimizers under Assumption G. In Section 4, we show that, under Assumption G, p-AGP implies p-ARP with high probability. In the last section we show that $f_p$ is sharp with respect to its set of minimizers, with high probability.

### 1.1 Notation

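Lower case letters (e.g., $x$, $y$) denote vectors, while $x_i$ denotes the $i$th component of the vector $x$. Symbols such as $C$ and $c_1$ denote universal constants. $\|x\|$ and $\|x\|_1$ denote the Euclidean and $\ell_1$ norms of the vector $x$, while $\|x\|_0$ denotes the $\ell_0$ ‘norm’, i.e., the number of nonzero entries of $x$. For a matrix $M$, $\|M\|_F$ denotes the Frobenius norm and $\|M\|$ denotes the operator norm. When is a vector, and . For a vector $z \in \mathbb{R}^m$ and $T \subseteq [m] := \{1, \dots, m\}$, $z_T$ is defined to be the vector in $\mathbb{R}^m$ whose $i$th entry is $z_i$ if $i \in T$ and $0$ elsewhere. We say a vector $z$ is $L$-sparse if $\|z\|_0 \le L$.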

Recall from the compressed sensing literature [18, 19] that a matrix $\Phi$ satisfies the Null Space Property (NSP) of order $L$ at $\psi \in (0,1)$ if

$$\|y_T\|_1 \le \psi \|y_{T^c}\|_1 \quad \forall\, y \in \mathrm{Null}(\Phi) \text{ and } |T| \le L. \tag{2.1}$$

It is shown in [23, 25] that every $L$-sparse signal $y_*$ is the unique minimizer of the compressed sensing problem (1.5) with $c = \Phi y_*$ if and only if $\Phi$ satisfies NSP of order $L$ for some $\psi \in (0,1)$. NSP of order $L$ is implied by the Restricted Isometry Property (RIP) for a sufficiently small RIP parameter  , where a matrix $\Phi$ is said to satisfy RIP with constant $\delta_L$ if

$$(1 - \delta_L)\|y\|_2^2 \le \|\Phi y\|_2^2 \le (1 + \delta_L)\|y\|_2^2 \quad \forall\, L\text{-sparse vectors } y \in \mathbb{R}^m. \tag{2.2}$$

It is known that RIP is satisfied under many distributional hypotheses on the matrix $\Phi$. For example, random matrices with i.i.d. Gaussian or Bernoulli entries are known to satisfy RIP with high probability when the sparsity level is sufficiently small relative to the dimensions of the matrix [3, 13, 14, 29]. Recapping, the general pattern of the proof for establishing that a sufficiently sparse $y_*$ is the unique minimizer of problem (1.5) using distributional assumptions on $\Phi$ is given in the following program:

$$\text{(CS)} \qquad \text{Distributional Assumptions} \ \Longrightarrow \ \text{RIP} \ \Longrightarrow \ \text{NSP} \ \overset{[23,\,25]}{\Longleftrightarrow} \ y_* \text{ minimizes (1.5)}.$$

We extend this program to the class of robust phase retrieval problems

$$\min_x f_p(x) := \big\| |Ax|^p - b \big\|_1 \tag{2.3}$$

for $p = 1, 2$, to show that, under Assumption G, and when the residuals $|Ax_*|^p - b$ are sufficiently sparse, the vectors $\pm x_*$ are the global minimizers of the real robust phase retrieval problems (2.3) with high probability. In our program, we substitute NSP and RIP with new properties called the p-Absolute Range Property (p-ARP) and the p-Absolute Growth Property (p-AGP), respectively.

###### Definition (p-Absolute Range Property (p-ARP))

For $p \in \{1, 2\}$, we say $A \in \mathbb{R}^{m \times n}$ satisfies the p-Absolute Range Property of order $L_p$ for $\psi_p \in (0,1)$ if

$$\big\| (|Ax|^p - |Ay|^p)_T \big\|_1 \le \psi_p \big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1 \quad \forall\, x, y \in \mathbb{R}^n \text{ and } T \subseteq [m] \text{ with } |T| \le L_p. \tag{2.4}$$

In order for Definition 2 to make sense, must be significantly larger than . This is illustrated by the following example.

###### Example

For , an example in which ARP does not hold for any order is for any . An example in which ARP of order holds is for any .

The connection between p-ARP and NSP is seen by observing the parallels between (2.4) and the fact that $\Phi$ satisfies NSP of order $L$ for $\psi$ (see (2.1)) if

$$\|(Ax - Ay)_T\|_1 \le \psi \|(Ax - Ay)_{T^c}\|_1 \quad \forall\, x, y \in \mathbb{R}^n \text{ and } T \subseteq [m] \text{ with } |T| \le L,$$

where the columns of $A$ form a basis of $\mathrm{Null}(\Phi)$.

###### Definition (p-Absolute Growth Property (p-AGP))

For $p \in \{1, 2\}$, we say that the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the p-Absolute Growth Property if there exist constants $0 < \mu_1 \le \mu_2$ and a mapping $\phi_p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ such that

$$\mu_1 \phi_p(x, y) \le \frac{1}{m} \big\| |Ax|^p - |Ay|^p \big\|_1 \le \mu_2 \phi_p(x, y) \quad \forall\, x, y \in \mathbb{R}^n. \tag{2.5}$$

The mapping $\phi_p$ is introduced to accommodate the fact that the robust phase retrieval problem cannot have unique solutions since if $x$ solves (2.3) then so does $-x$. For this reason, (2.5) implies that if $x = \pm y$, then $\phi_p(x, y) = 0$. In what follows, we take

$$\phi_2(x, y) := \big\| xx^T - yy^T \big\|_F \quad \text{and} \quad \phi_1(x, y) := \min\{\|x + y\|, \|x - y\|\} \quad \forall\, x, y \in \mathbb{R}^n. \tag{2.6}$$

The relationship between RIP and p-AGP is now seen by comparing (2.2) with (2.5). A fundamental (and essential) difference is that RIP for compressed sensing applies to any selection of $L$ columns from $\Phi$, where $L$ is considered to be small since it determines the sparsity of the solution. On the other hand, our p-AGP applies to the rows of $A$ corresponding to the zero entries in the sparse residual vector $|Ax_*|^p - b$.
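As a rough numerical illustration of (2.5) and (2.6), the sketch below evaluates the ratio $\frac{1}{m}\| |Ax|^p - |Ay|^p \|_1 / \phi_p(x, y)$ for one Gaussian matrix and one pair of vectors. The function names, the sizes, and the use of the identity $\|xx^T - yy^T\|_F^2 = \|x\|^4 + \|y\|^4 - 2\langle x, y\rangle^2$ are choices made here for illustration; a single pair says nothing about the uniform bound over all $x, y$.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def phi1(x, y):
    """phi_1(x, y) = min(||x + y||, ||x - y||) from (2.6)."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return min(norm(plus), norm(minus))

def phi2(x, y):
    """phi_2(x, y) = ||x x^T - y y^T||_F, computed via
    ||x||^4 + ||y||^4 - 2 <x, y>^2 (the squared Frobenius norm)."""
    val = dot(x, x) ** 2 + dot(y, y) ** 2 - 2.0 * dot(x, y) ** 2
    return math.sqrt(max(val, 0.0))

def agp_ratio(A, x, y, p):
    """Middle term of (2.5) divided by phi_p(x, y)."""
    m = len(A)
    middle = sum(abs(abs(dot(r, x)) ** p - abs(dot(r, y)) ** p) for r in A) / m
    phi = phi1(x, y) if p == 1 else phi2(x, y)
    return middle / phi

rng = random.Random(0)
m, n = 2000, 3  # illustrative sizes
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
for p in (1, 2):
    # Under p-AGP this ratio stays between mu_1 and mu_2.
    print(p, agp_ratio(A, x, y, p))
```

Both $\phi_1$ and $\phi_2$ vanish exactly when $x = \pm y$, matching the sign ambiguity discussed above.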

We can now more precisely describe how our program of proof parallels the one used for compressed sensing.

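1. $p = 2$: $\ \text{G} \ \overset{\text{Lem 4.1}}{\Longrightarrow} \ (\text{RIP} \Rightarrow \text{2-AGP}) \ \overset{\text{Lem 4.1}}{\Longrightarrow} \ \text{2-ARP} \ \overset{\text{Thm 3}}{\Longrightarrow} \ x_* \text{ minimizes } f_2(x)$

2. $p = 1$: $\ \text{G} \ \overset{\text{Lem 4.2}}{\Longrightarrow} \ \text{1-AGP} \ \overset{\text{Lem 4.2}}{\Longrightarrow} \ \text{1-ARP} \ \overset{\text{Thm 3}}{\Longrightarrow} \ x_* \text{ minimizes } f_1(x)$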

## 3 Global minimization under p-ARP

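In this section we parallel the discussion given in  with NSP replaced by p-ARP. We begin by introducing a measure of residual sparsity. For a vector $x \in \mathbb{R}^n$, let $T$ be the set of indices corresponding to the $L$ largest entries (in magnitude) of the residual vector $|Ax|^p - b$ and define

$$\sigma_p^L(x) := \big\| (|Ax|^p - b)_{T^c} \big\|_1.$$

Note that $\sigma_p^L(x) = 0$ if and only if $\big\| |Ax|^p - b \big\|_0 \le L$.

###### Lemma

Let $p \in \{1, 2\}$ and $\psi \in (0,1)$. If the matrix $A$ satisfies p-ARP of order $L$ for $\psi$, then

$$\big\| |Ax|^p - |Ay|^p \big\|_1 \le \frac{1+\psi}{1-\psi} \Big( \big\| |Ax|^p - b \big\|_1 - \big\| |Ay|^p - b \big\|_1 + 2\sigma_p^L(y) \Big) \tag{3.1}$$

for all $x, y \in \mathbb{R}^n$.

Proof. In either case, $p = 1$ or $p = 2$, let $T$ be the set of indices of the $L$ largest entries (in magnitude) of $|Ay|^p - b$. Then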

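$$\begin{aligned}
\big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1
&\le \big\| (|Ax|^p - b)_{T^c} \big\|_1 + \big\| (|Ay|^p - b)_{T^c} \big\|_1 \\
&= \big\| |Ax|^p - b \big\|_1 - \big\| (|Ax|^p - b)_T \big\|_1 + \sigma_p^L(y) \\
&= \big\| (|Ay|^p - b)_T \big\|_1 - \big\| (|Ax|^p - b)_T \big\|_1 + \big\| |Ax|^p - b \big\|_1 - \big\| |Ay|^p - b \big\|_1 + 2\sigma_p^L(y) \\
&\le \big\| (|Ax|^p - |Ay|^p)_T \big\|_1 + \big\| |Ax|^p - b \big\|_1 - \big\| |Ay|^p - b \big\|_1 + 2\sigma_p^L(y).
\end{aligned} \tag{3.2}$$

By p-ARP,

$$\big\| (|Ax|^p - |Ay|^p)_T \big\|_1 \le \psi \big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1. \tag{3.3}$$

Consequently, by (3.2) and (3.3),

$$\big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1 \le \frac{1}{1-\psi} \Big( \big\| |Ax|^p - b \big\|_1 - \big\| |Ay|^p - b \big\|_1 + 2\sigma_p^L(y) \Big). \tag{3.4}$$

By (3.3), we know

$$\big\| |Ax|^p - |Ay|^p \big\|_1 = \big\| (|Ax|^p - |Ay|^p)_T \big\|_1 + \big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1 \le (1 + \psi) \big\| (|Ax|^p - |Ay|^p)_{T^c} \big\|_1.$$

By combining this with (3.4), we obtain (3.1), which holds true for all $x, y \in \mathbb{R}^n$.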
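The chain (3.2) uses only the triangle inequality and the definition of $\sigma_p^L$, so it holds for every $A$, $b$, $x$, $y$, with or without p-ARP. The sketch below, with hypothetical helper names and arbitrary test sizes, computes $\sigma_p^L$ and compares the two ends of (3.2) on a random instance.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(A, b, x, p):
    """Residual vector |Ax|^p - b."""
    return [abs(dot(row, x)) ** p - bi for row, bi in zip(A, b)]

def top_indices(r, L):
    """Indices of the L largest entries of r in magnitude."""
    order = sorted(range(len(r)), key=lambda i: abs(r[i]), reverse=True)
    return set(order[:L])

def sigma(A, b, x, p, L):
    """sigma_p^L(x): l1 mass of the residual outside its L largest entries."""
    r = residual(A, b, x, p)
    T = top_indices(r, L)
    return sum(abs(r[i]) for i in range(len(r)) if i not in T)

def sides_of_3_2(A, b, x, y, p, L):
    """Return (first line, last line) of the chain (3.2)."""
    rx, ry = residual(A, b, x, p), residual(A, b, y, p)
    T = top_indices(ry, L)               # T is built from the residual at y
    d = [u - v for u, v in zip(rx, ry)]  # equals |Ax|^p - |Ay|^p
    on_T = sum(abs(d[i]) for i in T)
    off_T = sum(abs(d[i]) for i in range(len(d)) if i not in T)
    f_x = sum(abs(v) for v in rx)
    f_y = sum(abs(v) for v in ry)
    return off_T, on_T + f_x - f_y + 2.0 * sigma(A, b, y, p, L)

rng = random.Random(0)
m, n, L = 60, 3, 6  # illustrative sizes
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
b = [rng.gauss(0.0, 1.0) for _ in range(m)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
for p in (1, 2):
    print(p, sides_of_3_2(A, b, x, y, p, L))
```

The role of p-ARP is only to absorb the $T$-term on the right of (3.2) into the left-hand side, which is the step from (3.2) to (3.4).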

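The main result of this section now follows.

###### Theorem

Let $p \in \{1, 2\}$, $\psi \in (0,1)$, and suppose $x_* \in \mathbb{R}^n$ is such that $|Ax_*|^p - b$ is $L$-sparse. Let the assumptions of Lemma 3 hold. Then $x_*$ is a global minimizer of the robust phase retrieval problem (2.3). Moreover, for any $x \in \mathbb{R}^n$,

$$\big\| |Ax|^p - |Ax_*|^p \big\|_1 \le \frac{2(1+\psi)}{1-\psi} \sigma_p^L(x).$$

If $\bar{x}$ is another global minimizer, then $|A\bar{x}| = |Ax_*|$. If it is further assumed that the entries of $A$ are i.i.d. standard Gaussians and , then, with probability 1, $x_*$ is the unique solution of (2.3) up to multiplication by $-1$.

Proof. By Lemma 3, since $\sigma_p^L(x_*) = 0$,

$$\big\| |Ax|^p - |Ax_*|^p \big\|_1 \le \frac{1+\psi}{1-\psi} \Big( \big\| |Ax|^p - b \big\|_1 - \big\| |Ax_*|^p - b \big\|_1 \Big) \quad \forall\, x \in \mathbb{R}^n, \tag{3.5}$$

and so $f_p(x_*) \le f_p(x)$ for all $x \in \mathbb{R}^n$, i.e., $x_*$ is a global minimizer. Again by Lemma 3, now with the roles of $x$ and $x_*$ interchanged,

\begin{align}
\big\| |Ax|^p - |Ax_*|^p \big\|_1
&\le \frac{1+\psi}{1-\psi} \Big( \big\| |Ax_*|^p - b \big\|_1 - \big\| |Ax|^p - b \big\|_1 + 2\sigma_p^L(x) \Big) \tag{3.6} \\
&\le \frac{2(1+\psi)}{1-\psi} \sigma_p^L(x), \tag{3.7}
\end{align}

where the last inequality uses $f_p(x_*) \le f_p(x)$. Inequality (3.5) also implies that if there is another minimizer $\bar{x}$, then $|A\bar{x}| = |Ax_*|$. The final statement on the uniqueness of $x_*$ is established in [2, Corollary 2.6].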

In the next section we show that, under Assumption G, p-ARP of order $sm$ holds for a sufficiently small constant $s$, with high probability.

## 4 Assumption G ⟹ p-AGP ⟹ p-ARP

In this section we make use of the Gaussian Assumption G on the matrix $A$ to show that p-AGP holds for $A$ with high probability, and that p-AGP implies p-ARP of order $sm$ with high probability for a constant $s \in (0,1)$. The cases $p=2$ and $p=1$ are treated separately since different techniques are required.

### 4.1 p=2

We begin by re-stating [17, Lemma 1] in our notation, where the conclusion of [17, Lemma 1] is called RIP in .

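###### Lemma (Assumption G ⟹ 2-AGP (RIP)) [17, Lemma 1]

Under Assumption G, there exist universal constants such that for , if , then with probability at least $1 - C\exp(-c_1 \epsilon^2 m)$,

$$0.9(1 - \epsilon)\|M\|_F \le \frac{1}{m} \sum_{i=1}^m |A_i M A_i^T| \le \sqrt{2}(1 + \epsilon)\|M\|_F \tag{4.1}$$

for all symmetric rank-2 matrices $M \in \mathbb{R}^{n \times n}$. Taking $M = xx^T - yy^T$, for which $A_i M A_i^T = |A_i x|^2 - |A_i y|^2$, this implies 2-AGP with $\mu_1 = 0.9(1-\epsilon)$, $\mu_2 = \sqrt{2}(1+\epsilon)$ and $\phi_2(x, y) = \|xx^T - yy^T\|_F$.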

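###### Lemma (Assumption G ⟹ 2-AGP ⟹ 2-ARP)

Under Assumption G, there exist universal constants such that if and $A$ satisfies G, then

$$\big\| (|Ax|^2 - |Ay|^2)_T \big\|_1 \le \psi \big\| (|Ax|^2 - |Ay|^2)_{T^c} \big\|_1 \quad \forall\, x, y \in \mathbb{R}^n \text{ and } T \subseteq [m] \text{ with } |T| \le sm$$

with probability at least . Consequently, 2-ARP holds for $A$ with high probability for $m$ sufficiently large.

Proof. We first derive conditions on $s$ so that $\psi \in (0,1)$ exists. To this end let $s \in (0,1)$ be given. Let $T \subseteq [m]$ be any subset of $sm$ indices and denote by $A_{T^c}$ the sub-matrix of $A$ whose rows correspond to the indices in $T^c$. With this notation, we have $\big\| (|Ax|^2 - |Ay|^2)_{T^c} \big\|_1 = \big\| |A_{T^c}x|^2 - |A_{T^c}y|^2 \big\|_1$. Also note that the entries of the matrix $A_{T^c}$ satisfy G. We apply Lemma 4.1 to $A_{T^c}$, which has $(1-s)m$ rows, so the sample-size condition of that lemma is required with $m$ replaced by $(1-s)m$. Then, for $\epsilon$ as in that lemma and each subset $T \subseteq [m]$ with $|T| = sm$, the inequality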

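$$0.9(1-\epsilon) \big\| xx^T - yy^T \big\|_F \le \frac{1}{(1-s)m} \big\| (|Ax|^2 - |Ay|^2)_{T^c} \big\|_1 \le \sqrt{2}(1+\epsilon) \big\| xx^T - yy^T \big\|_F \tag{4.2}$$

fails to hold with probability no greater than $C\exp(-c_1 \epsilon^2 (1-s)m)$, that is, 2-AGP holds for $A_{T^c}$. Since there are

$$\binom{m}{(1-s)m} = \binom{m}{sm} \le \left( \frac{em}{sm} \right)^{sm} = \left( \frac{e}{s} \right)^{sm}$$

such $T$'s, the event

$$B := \{ (4.2) \text{ holds for every } T \subseteq [m] \text{ with } |T| = sm \} \cap \{ (4.1) \text{ holds} \}$$

satisfies

$$\begin{aligned}
P(B) &\ge 1 - C (e/s)^{sm} \exp(-c_1 \epsilon^2 (1-s) m) - C \exp(-c_1 \epsilon^2 m) \\
&= 1 - C \exp\Big( (1 + c_1 \epsilon^2) sm + sm \log\big(\tfrac{1}{s}\big) - c_1 \epsilon^2 m \Big) - C \exp(-c_1 \epsilon^2 m).
\end{aligned}$$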
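The counting bound and the rewriting of the exponent in the estimate of $P(B)$ can be checked numerically. In the sketch below the value used for the universal constant $c_1$ is a placeholder (the source does not give it), and the function names are hypothetical.

```python
import math

def log_support_count(m, k):
    """log of the number of index sets T with |T| = k, i.e. log C(m, k)."""
    return math.log(math.comb(m, k))

def log_support_bound(m, k):
    """log of the bound (e m / k)^k used in the union bound."""
    return k * (1.0 + math.log(m / k))

def union_bound_exponent(s, m, eps, c1):
    """Exponent of (e/s)^{sm} * exp(-c1 eps^2 (1 - s) m)."""
    k = s * m
    return k * (1.0 + math.log(1.0 / s)) - c1 * eps ** 2 * (1.0 - s) * m

def union_bound_exponent_expanded(s, m, eps, c1):
    """Same exponent, written as in the second line of the P(B) estimate."""
    return ((1.0 + c1 * eps ** 2) * s * m
            + s * m * math.log(1.0 / s)
            - c1 * eps ** 2 * m)

m = 1000
for k in (1, 10, 100, 500):
    print(k, log_support_count(m, k), log_support_bound(m, k))

# c1 = 1.0 is a placeholder; the true universal constant is not specified.
for s in (0.001, 0.01):
    print(s, union_bound_exponent(s, 1e5, 0.1, 1.0))
```

With these placeholder values the exponent is negative for $s = 0.001$ and positive for $s = 0.01$, illustrating why the corruption fraction $s$ has to be small relative to $c_1 \epsilon^2$ for the union bound to be useful.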

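Choose so that . Then, for all , . Thus, if event $B$ occurs, we have

$$\begin{aligned}
\big\| (|Ax|^2 - |Ay|^2)_T \big\|_1
&= \big\| |Ax|^2 - |Ay|^2 \big\|_1 - \big\| (|Ax|^2 - |Ay|^2)_{T^c} \big\|_1 \\
&\le \sqrt{2}(1+\epsilon)\, m \big\| xx^T - yy^T \big\|_F - 0.9(1-\epsilon)(1-s)\, m \big\| xx^T - yy^T \big\|_F \\
&\le \frac{\sqrt{2}(1+\epsilon) - 0.9(1-\epsilon)(1-s)}{0.9(1-\epsilon)(1-s)} \big\| (|Ax|^2 - |Ay|^2)_{T^c} \big\|_1,
\end{aligned} \tag{4.3}$$

where the first inequality follows from (4.1) applied to the first term and (4.2) applied to the second, and the second inequality follows by (4.2). Consequently, as long as $s$ is chosen so that $\psi := \frac{\sqrt{2}(1+\epsilon) - 0.9(1-\epsilon)(1-s)}{0.9(1-\epsilon)(1-s)} < 1$, the conclusion follows. This can be accomplished by choosing $\epsilon$ so that $\sqrt{2}(1+\epsilon) < 1.8(1-\epsilon)$ (or equivalently, $\epsilon < \frac{1.8 - \sqrt{2}}{1.8 + \sqrt{2}}$) and then choosing $s < 1 - \frac{\sqrt{2}(1+\epsilon)}{1.8(1-\epsilon)}$.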
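The requirement that the constant in the last line of (4.3) be less than one can be made explicit by simple algebra: it is equivalent to $\sqrt{2}(1+\epsilon) < 1.8(1-\epsilon)(1-s)$. The sketch below evaluates that constant and the resulting thresholds; the function names are hypothetical, and the thresholds address only the condition $\psi < 1$, not the probability estimate.

```python
import math

SQRT2 = math.sqrt(2.0)

def psi(eps, s):
    """Constant in the last line of (4.3)."""
    lower = 0.9 * (1.0 - eps) * (1.0 - s)
    return (SQRT2 * (1.0 + eps) - lower) / lower

def max_eps():
    """Largest eps for which some s > 0 can give psi < 1."""
    return (1.8 - SQRT2) / (1.8 + SQRT2)

def max_s(eps):
    """psi(eps, s) < 1 exactly when s is below this value."""
    return 1.0 - SQRT2 * (1.0 + eps) / (1.8 * (1.0 - eps))

print(max_eps())  # roughly 0.12
for eps in (0.01, 0.05, 0.1):
    print(eps, max_s(eps), psi(eps, 0.5 * max_s(eps)))
```

Even with $\epsilon$ close to zero, this argument limits $s$ to at most $1 - \sqrt{2}/1.8 \approx 0.21$, and the admissible $s$ shrinks to zero as $\epsilon$ approaches its threshold.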

### 4.2 p=1

This case requires a series of four technical lemmas in order to establish the main results. We list these lemmas below, and their proofs are in the appendix (Section 7).

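###### Lemma

Under Assumption G, there exist universal constants such that for $\tilde{\epsilon}$ sufficiently small, if , then with probability at least ,

$$(1 - \tilde{\epsilon}) \sqrt{\tfrac{2}{\pi}}\, \|h\| \le \frac{1}{m} \sum_{i=1}^m |A_i h| \le (1 + \tilde{\epsilon}) \sqrt{\tfrac{2}{\pi}}\, \|h\| \quad \forall\, h \in \mathbb{R}^n. \tag{4.4}$$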
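The constant $\sqrt{2/\pi}$ in (4.4) is the mean of $|g|$ for a standard Gaussian $g$, since $A_i h \sim N(0, \|h\|^2)$ under Assumption G. The following Monte Carlo sketch checks this for one fixed $h$; the sizes and the vector $h$ are arbitrary, and a single $h$ does not test the uniformity over all $h$ that the lemma asserts.

```python
import math
import random

def mean_abs_projection(A, h):
    """(1/m) * sum_i |A_i h| for the rows A_i of A."""
    return sum(abs(sum(a * hj for a, hj in zip(row, h))) for row in A) / len(A)

rng = random.Random(0)
m, n = 20000, 4  # illustrative sizes
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
h = [3.0, -1.0, 0.5, 2.0]
norm_h = math.sqrt(sum(v * v for v in h))

# Ratio of the empirical mean to sqrt(2/pi) * ||h||; close to 1 for large m.
ratio = mean_abs_projection(A, h) / (math.sqrt(2.0 / math.pi) * norm_h)
print(ratio)
```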

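###### Lemma

Under Assumption G, there exist universal constants such that for $\tilde{\epsilon}$ sufficiently small, if , then with probability at least ,

$$\frac{1}{m} \sum_{i=1}^m \Big| |A_i x|^2 - |A_i y|^2 \Big|^{\frac{1}{2}} \ge 0.77 (1 - \tilde{\epsilon}) \big\| xx^T - yy^T \big\|_F^{\frac{1}{2}} \quad \forall\, x, y \in \mathbb{R}^n. \tag{4.5}$$

###### Lemma

For $x, y \in \mathbb{R}^n$, if $\langle x, y \rangle \ge 0$ (i.e. $\|x - y\| \le \|x + y\|$), then

$$\|x + y\| + (\sqrt{2} - 1)\|x - y\| \ge \|x\| + \|y\|. \tag{4.6}$$

###### Lemma

For $x, y \in \mathbb{R}^n$,

$$\sqrt{2} \big\| xx^T - yy^T \big\|_F \ge \|x + y\| \|x - y\|. \tag{4.7}$$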
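The last two lemmas are deterministic, so they can be spot-checked on random vectors. The sketch below reports the smallest observed slack in (4.6), under the assumption $\langle x, y\rangle \ge 0$ (enforced by flipping the sign of $y$), and in (4.7); function names and trial counts are illustrative.

```python
import math
import random

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def frob_outer_diff(x, y):
    """||x x^T - y y^T||_F computed entry by entry."""
    return math.sqrt(sum((xi * xj - yi * yj) ** 2
                         for xi, yi in zip(x, y)
                         for xj, yj in zip(x, y)))

def gap_4_6(x, y):
    """Left side minus right side of (4.6); nonnegative when <x, y> >= 0."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm(plus) + (math.sqrt(2.0) - 1.0) * norm(minus) - norm(x) - norm(y)

def gap_4_7(x, y):
    """Left side minus right side of (4.7); nonnegative for all x, y."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return math.sqrt(2.0) * frob_outer_diff(x, y) - norm(plus) * norm(minus)

rng = random.Random(0)
worst_46, worst_47 = float("inf"), float("inf")
for _ in range(1000):
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    y = [rng.gauss(0.0, 1.0) for _ in range(4)]
    worst_47 = min(worst_47, gap_4_7(x, y))
    if sum(a * b for a, b in zip(x, y)) < 0.0:
        y = [-b for b in y]  # enforce <x, y> >= 0 for (4.6)
    worst_46 = min(worst_46, gap_4_6(x, y))
print(worst_46, worst_47)
```

Both inequalities are tight for $x = e_1$, $y = e_2$, where the two sides coincide.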

We first show that if the matrix $A$ satisfies Assumption G, then it satisfies 1-AGP with high probability.

###### Lemma (Assumption G ⟹ 1-AGP)

Under Assumption G, there exist universal constants such that for $\epsilon$ sufficiently small, if , then with probability at least ,

$$\mu_1 \phi_1(x, y) \le \frac{1}{m} \big\| |Ax| - |Ay| \big\|_1 \le \mu_2 \phi_1(x, y) \quad \forall\, x, y \in \mathbb{R}^n, \tag{4.8}$$

where $\phi_1$ is defined in (2.6), and . Consequently, 1-AGP holds with high probability for $m$ sufficiently large.

Proof. By Lemma 4.2 and Lemma 4.2, there exist universal constants such that for $\epsilon$ sufficiently small, if , then with probability at least , (4.4) and (4.5) hold. Since we can substitute $y$ by $-y$ if necessary, without loss of generality, we assume $\|x - y\| \le \|x + y\|$, so that $\phi_1(x, y) = \|x - y\|$.

The right hand inequality in (4.8) follows easily from (4.4) and the triangle inequality

$$\big\| |Ax| - |Ay| \big\|_1 \le \|A(x - y)\|_1.$$

For the left hand inequality of (4.8), we consider two cases: (1) $\|x + y\| \le 10\|x - y\|$, and (2) $\|x + y\| > 10\|x - y\|$.

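1. Assume $\|x + y\| \le 10\|x - y\|$. By (4.4), we know

$$\begin{aligned}
\frac{1}{m} \big\| |Ax| - |Ay| \big\|_1
&= \frac{1}{m} \sum_{i=1}^m \big| |A_i x| - |A_i y| \big| \\
&= \frac{1}{m} \sum_{i=1}^m |A_i(x + y)| + \frac{1}{m} \sum_{i=1}^m |A_i(x - y)| - \frac{1}{m} \sum_{i=1}^m |A_i x| - \frac{1}{m} \sum_{i=1}^m |A_i y| \\
&\ge \sqrt{\tfrac{2}{\pi}} \Big( (1-\epsilon)\|x + y\| + (1-\epsilon)\|x - y\| - (1+\epsilon)\|x\| - (1+\epsilon)\|y\| \Big) \\
&\ge \sqrt{\tfrac{2}{\pi}} \Big( (2 - \sqrt{2} - \sqrt{2}\epsilon)\|x - y\| - 2\epsilon\|x + y\| \Big) \\
&\ge \sqrt{\tfrac{2}{\pi}} \big( 2 - \sqrt{2} - (\sqrt{2} + 20)\epsilon \big) \|x - y\|,
\end{aligned} \tag{4.9}$$

where the second equality is from $\big| |a| - |b| \big| = |a + b| + |a - b| - |a| - |b|$ for $a, b \in \mathbb{R}$ (since if $ab \ge 0$, then $|a + b| = |a| + |b|$ and $|a - b| = \big| |a| - |b| \big|$, and if $ab < 0$, then $|a + b| = \big| |a| - |b| \big|$ and $|a - b| = |a| + |b|$), the first inequality is from Lemma 4.2 (with $h$ successively set to $x + y$, $x - y$, $x$ and $y$), the second inequality uses Lemma 4.2, i.e. (4.6), to replace $\|x\| + \|y\|$, and the last inequality follows from our assumption that $\|x + y\| \le 10\|x - y\|$.

2. Assume $\|x + y\| > 10\|x - y\|$. We have

$$\begin{aligned}
\frac{1}{m} \big\| |Ax| - |Ay| \big\|_1
&= \frac{1}{m} \sum_{i=1}^m \big| |A_i x| - |A_i y| \big| \\
&\ge \left( \frac{1}{m} \sum_{i=1}^m \Big| |A_i x|^2 - |A_i y|^2 \Big|^{\frac{1}{2}} \right)^2 \Big/ \left( \frac{1}{m} \sum_{i=1}^m \big( |A_i x| + |A_i y| \big) \right) \\
&\ge \sqrt{\tfrac{\pi}{2}}\, \frac{0.77^2 (1-\epsilon)^2 \big\| xx^T - yy^T \big\|_F}{(1+\epsilon)(\|x\| + \|y\|)} \\
&\ge \frac{0.77^2 \sqrt{\pi} (1-\epsilon)^2 \|x + y\| \|x - y\|}{2(1+\epsilon)(\|x\| + \|y\|)} \\
&\ge \frac{0.77^2 \sqrt{\pi} (1-\epsilon)^2 \|x + y\| \|x - y\|}{2(1+\epsilon)\big( \|x + y\| + (\sqrt{2} - 1)\|x - y\| \big)} \\
&\ge \frac{5 \cdot 0.77^2 \sqrt{\pi} (1-\epsilon)^2}{(\sqrt{2} + 9)(1+\epsilon)} \|x - y\|,
\end{aligned} \tag{4.10}$$

where the first inequality is by the Cauchy-Schwarz inequality applied to the vectors $u, v \in \mathbb{R}^m$ with $u_i = \big| |A_i x| - |A_i y| \big|^{1/2}$ and $v_i = \big( |A_i x| + |A_i y| \big)^{1/2}$, the second inequality is by Lemma 4.2 and Lemma 4.2 (inequalities (4.5) and (4.4)), the third inequality is by Lemma 4.2 (inequality (4.7)), the fourth inequality is by Lemma 4.2 (inequality (4.6)) and the last inequality is by $\|x + y\| > 10\|x - y\|$. When , one can show by direct computation that

$$\frac{5 \cdot 0.77^2 \sqrt{\pi} (1-\epsilon)^2}{(\sqrt{2} + 9)(1+\epsilon)} > 0.02 + \sqrt{\tfrac{2}{\pi}} (2 - \sqrt{2}),$$

and so

$$\frac{1}{m} \big\| |Ax| - |Ay| \big\|_1 \ge \sqrt{\tfrac{2}{\pi}} (2 - \sqrt{2}) \|x - y\|. \tag{4.11}$$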
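Two elementary facts used in the case analysis can be verified directly: the scalar identity behind the second equality in (4.9), and the numerical comparison of constants that leads to (4.11). In the sketch below the function names are hypothetical, and the values of $\epsilon$ tried are illustrative choices, since the admissible range of $\epsilon$ is not reproduced here.

```python
import math
import random

def abs_identity_gap(a, b):
    """||a| - |b|| minus (|a + b| + |a - b| - |a| - |b|); zero for real a, b."""
    return abs(abs(a) - abs(b)) - (abs(a + b) + abs(a - b) - abs(a) - abs(b))

def case2_constant(eps):
    """Constant multiplying ||x - y|| in the last line of (4.10)."""
    return (5.0 * 0.77 ** 2 * math.sqrt(math.pi) * (1.0 - eps) ** 2
            / ((math.sqrt(2.0) + 9.0) * (1.0 + eps)))

def case2_target():
    """Right-hand side of the direct-computation inequality."""
    return 0.02 + math.sqrt(2.0 / math.pi) * (2.0 - math.sqrt(2.0))

rng = random.Random(0)
worst = max(abs(abs_identity_gap(rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0)))
            for _ in range(1000))
print(worst)  # zero up to floating-point rounding

for eps in (0.0, 0.005, 0.01, 0.02):
    print(eps, case2_constant(eps), case2_target(),
          case2_constant(eps) > case2_target())
```

Among these illustrative values, the comparison holds for $\epsilon \le 0.01$ and fails at $\epsilon = 0.02$, which is consistent with the requirement that $\epsilon$ be sufficiently small.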

Consequently,

 1m∥|Ax|−|Ay|∥1≥√