1 Introduction
1.1 Phase retrieval problem
The phase retrieval problem is to recover an dimensional signal from a system of phaseless equations
(1) 
where is the unknown vector to be recovered, for are given sensing vectors, is the vector of observed modulus data, and is the number of measurements (or the sample size). This problem arises in many fields such as Xray crystallography [21], optics [38], microscopy [29], and others [14].
The classical approaches for phase retrieval were mostly based on alternating projections, e.g., the work of Gerchberg and Saxton [18] and Fienup [14], which usually work very well empirically but lack provable guarantees in the primary literatures. Recently, lots of attentions has been paid to constructing efficient algorithms with theoretical guarantees when given certain classes of sampling vectors. In these approaches, one of the main targets is to achieve optimal sampling complexity for, e.g, reducing the cost of sampling and computation. They are categorized into convex and nonconvex optimization based approaches. Typical convex approaches such as phaselift [7] transfer the phase retrieval problem into a semidefinite programming(SDP), which lifts the unknown dimensional signal to an matrix and thus computationally expensive. To overcome this, some other convex approaches such as Phasemax [19] and others [20, 1] solve convex optimizations with unknowns only. However, these convex formulations depend highly on the socalled anchor vectors that approximate the unknown signal, and the sampling complexity might be unnecessarily large if the anchor vector is not good enough. Meanwhile, nonconvex optimization based approaches were proposed and studied in the past years. Examples include alternating minimization [31] (or Fienup methods), Wirtinger flow [8], Kaczmarz [36, 42], Riemannian optimization [5], and GaussNewton [17, 28]. To prove the guarantee, these algorithms normally require a good initialization close enough to the ground truth, which is achieved by spectral initializations. Nevertheless, experimental results suggest that the designed initialization is not necessary — a random initialization usually lead to the correct phase retrieval. To explain this, either the algorithms with random initialization is studied [37, 12, 35], or the global geometric landscape of nonconvex objective functions are examined [26, 34], showing that there is actually no spurious local minima.
1.2 Sparse phase retrieval
All the aforementioned provable algorithms need a sampling complexity with . This sampling complexity is (nearly) optimal, since the phase retrieval problem is solvable only when and for real and complex signals respectively [2, 13]. Nevertheless, there is still a demand to further reduce the sampling complexity to save the cost of sampling. We have to exploit the structure of the underlying signal. In many applications, especially those related to signal processing and imaging, the true signal is known to be sparse or approximately sparse in a transform domain [33]. Have this priori knowledge in mind, it is possible to recover the signal using only a small number (possibly sublinear in ) of phaseless samples.
For simplicty, we assume that is sparse with sparsity at most , i.e., , where stands for the number of nonzero entries. With this sparsity constraint, the phase retrieval problem (1) can be reformulated as: find such that
(2) 
The problem (2) is referred to a sparse phase retrieval problem. It has been proved that the sample size is necessary and sufficient to determine a unique solution for the problem (2) with generic measurements in the real case [41]. Thus it opens the possibility for successful sparse phase retrieval using very few samples.
Though the sparse phase retrieval problem (2) is NPhard in general, there are many available algorithms that are guaranteed to find
with overwhelming probability under certain class of random measurements. Examples of such algorithms are
regularized PhaseLift method [25], sparse AltMin [31], thresholding/projected Wirtinger flow [6, 32], SPARTA [40], CoPRAM [23], and a twostage strategy introduced in [22]. All these approaches except for [22] ^{1}^{1}1A twostage sampling scheme is proposed in [22], where the first stage for sparse compressed sensing and the second stage for phase retrieval. It needs samples, but the sampling scheme is complicated. are analyzed under Gaussian random measurements, showing random Gaussian measurements are sufficient to achieve a successful sparse phase retrieval. Though not optimal, this sampling complexity is much smaller than that in the general phase retrieval. For convex approaches, it has been shown in [25] that random Gaussian samples is necessary. For nonconvex approaches, the algorithms are ususally divided into two stages, namely the initialization stage and the refinement stage. In the initialization stage, a spectral initialization is performed, and it requiresGaussian random samples to achieve an estimation sufficiently close to the ground truth. In the refinement stage, the initial estimation is refined by different algorithms, most of which are able to converge to the ground truth linearly using
Gaussian random samples. It is still unknown whether or not the sampling complexity can be reduced to under Gaussian random measurements.1.3 Our Contributions
In this paper, we propose a simple yet efficient nonconvex algorithm with provable recovery guarantees for sparse phase retrieval. Similar to most of the existing nonconvex algorithms, our proposed algorithm is divided into two stages, and the initialization stage rely on a spectral initialization also. So, we cannot reduce the sampling complexity to optimal as well. Instead, we focus on the improvement of the computational efficiency in the refinement stage, using random Gaussian samples. Different to existing algorithms that usually converges linearly, our proposed algorithm is proven to have the exact recovery of the sparse signal in at most steps, while it has almost the same computational cost per step as others. Therefore, our algorithm is much more efficient than existing algorithms. Experimental results confirm this, showing that our algorithm gives a very accurate recovery in very few iterations, and it gains several times acceleration over existing algorithms.
Our proposed algorithm is based on the hard thresholding pursuit (HTP) for compressed sensing introduced in [16]. Building on the projected gradient descent (or iterative hard thresholding (IHT) [3]), the idea of HTP is to project the current guess into the space that best match the measurements in each iteration. With the help of the restricted isometry property (RIP) [9], HTP for compressed sensing is proved to have a robust sparse signal recovery in finite steps starting from any initial guess. Our proposed algorithm is an adoption of HTP from linear measurements to phaseless measurements. Giving a current iteration, we first estimate the phase of the phaseless measurements, and then one step of HTP iteration is applied. Our algorithm has almost the same computational cost as compressed sensing HTP, while preserving the convergence in finite steps (in particular, in steps). Since the HTP algorithm is a second order Newton’s method (see e.g. [24, 44]), our algorithm can be viewed as a Newton’s method for sparse phase retrieval.
1.4 Notation and outline
For any , we denote the entrywise product of and , i.e., . For , is defined by if , and otherwise. is the number of nonzero entries of , and is the standard norm, i.e. . For a matrix , is its transpose, and denotes its spectral norm. For an index set , (or sometimes) stands for the submatrix of a matrix obtained by keeping only the columns indexed by , and (or sometimes) denotes the vector obtained by keeping only the components of a vector indexed by . The hard thresholding operator keeps the largest components in magnitude of a vector in and sets the other ones to zero. We use and for the dimensional norm unit ball and dimensional norm unit sphere in respectively. More precisely, means that and , and means that and . For , in this paper represents the logarithm of to the base . For , the distance between and is defined as
(3) 
The rest of papers are organized as follows. In Section 2 and 3, we introduce the details of the proposed algorithm and our main theoretical results, respectively. Numerical experiments to illustrate the performance of the algorithm are given in Section 4. The proofs are given in Section 5, and we conclude the paper in Section 6.
2 Algorithms
In this section, we describe our proposed algorithm in detail. Similar to most of the existing nonconvex sparse phase retrieval algorithms, our algorithms consists of two stages, namely, the initialization stage and the iterative refinement stage. Since the initialization stage can be done by an offtheshelf algorithm such as spectral initialization, we will focus on the iterative refinement stage. We first give some related algorithms, especially iterative hard thresholding algorithms, in Section 2.1, and then our proposed algorithm is presented in Section 2.2.
To simplify the notations, we denote the sampling matrix and the observations by
(4) 
respectively, where and for are from (1) and (2). Thus, the sparse phase retrieval problem (2) can be rewritten as to find satisfying
(5) 
There are several possible ways to solve (5) by reformulating it into constrained minimizations with different objective functions.
2.1 Iterative Hard Thresholding Algorithms
One natural way to solve (5) is to consider a straightforward least squares fitting to the amplitude equations in (5) subject to the sparsity constraint, and we solve
(6) 
Though the objective function is nonsmooth, it is convex so that a subgradient descent is available. Furthermore, the projection onto the feasible set can be done efficiently by , though the feasible set is nonconvex. Altogether, one may apply a projected subgradient descent to solve (6), yielding
(7) 
This algorithm is an iterative hard thresholding (IHT) algorithm, since the hard thresholding operator is applied in each iteration. This algorithm is analyzed in [32] under a general framework, which proved that (7) converges linearly to with high probability when it is initialized in a small neighboughhood of and
Gaussian random measurements vectors are used. By ruling out some outlier phaseless equations from the least squares fitting at each iteration according to some truncation rule, the algorithm (
7) becomes the SPARTA algorithm [40] with the sampling complexity for sparse phase retrieval provided a good initialization. Without , the algorithm (7) and its variants are algorithms for the standard phase retrieval (1), including the truncated amplitude flow (TAF) algorithm [39], the reshaped Wirtinger flow (RWF) [43], both of which are guaranteed to have an exact phase retrieval starting from a good initialization when Gaussian random measurements are used.Alternatively, one may square both sides of the equation in (5) to obtain , where the square of a vector is componentwise. The resulting equation is known as the intensity equation in phase retrieval. Then, we can solve (5) by minimizing the least square error of the intensity equation subject to the sparsity constraint, which leads to solving the following constrained minimization
(8) 
The advantage of fitting the intensity equation is that the objective function is both smooth and convex. Together with the fact that the projection onto the sparse set can be easily done by , one can apply the projected gradient descent to solve (8) and obtain
(9) 
This is again an iterative hard thresholding algorithm. Since the gradient should be taken in the Wirtinger derivative in the complex signal case, the algorithm (9) is more widely known as projected Wirtinger flow [6, 32]. It is proved that (9) gives an exact sparse phase retrieval in a neighbourhood of when sampled by random Gaussian measurements. When there is no , (9) and a truncation variant are consistent with Wirtinger flow [8] and truncated Wirtinger flow [11] algorithms respectively for the standard phase retrieval.
2.2 The Proposed Algorithm
Comparing the two formulations (6) and (8), experimental results [43, 39, 32] suggest that algorithms based on the amplitude equation fitting (6) are usually more efficient than those on the intensity equation (8). Following this, we solve (6) as well. Our algorithm is motivated by the IHT algorithm (7) and the hard thresholding pursuit (HTP) algorithm [16] for compressed sensing.
Let be the approximation of at step . We observe that one iteration of (7) is just one step of projected gradient descent algorithm (a.k.a. the IHT algorithm [3]) applied to the following constrained least squares problem
(10) 
This formulation is exactly used in compressed sensing to recover an sparse signal from its linear measurements . Thus, (7) can be interpreted as: given , we first guess the sign of the phaseless measurements by , and then we solve the resulting compressed sensing problem (10) by one step of IHT [3]. Therefore, to improve the efficiency of (7), we may replace IHT by more efficient algorithms in compressed sensing for solving (10).
To this end, we use one step of hard thresholding pursuit (HTP) [16] to solve (10). Given , there are two substeps in HTP. In the first substep, HTP estimates the support of the sparse signal by the support of the output of IHT, i.e.,
The main computation is on matrixvector product, and it costs operations. In the second substep, instead of applying a gradienttype refinement, HTP then solves the least squares in (10) exactly by restricting the support of the unknown on , i.e.,
(11) 
This is a standard least squares problem with the coefficient matrix of size , which can be done efficiently in operations by, e.g., solving the normal equation
(12) 
Altogether, we obtain the iteration in our proposed algorithm, called HTP for sparse phase retrieval, depicted in Algorithm 1. The total computational cost is per iteration. In case of , the cost is the same order as , and thus one iteration of Algorithm 1 has almost the same computational cost as that of (7) and many other popular sparse phase retrieval algorithms.
HTP has been demonstrated much more efficient than IHT for compressed sensing both theoretically and empirically. Since IHT is a firstorder gradienttype algorithm, it converges at most linearly. On the contrary, HTP can break through the barrier of linear convergence, because it is a secondorder Newton’s method (see e.g. [44]). Its acceleration over IHT has been confirmed in many works [4, 16]. More interestingly, HTP enjoys a finitestep termination property — it gives the exact recovery of the underlying sparse signal after at most steps starting from any initial guess provided satisfies the restricted isometry property (RIP), as proved in [16]. Furthermore, the computational cost of HTP per iteration is the same order as that of IHT, if is sufficiently small compared to . Therefore, HTP outperforms IHT significantly in compressed sensing.
Because the iteration in Algorithm 1 is HTP for sparse phase retrieval, it is a secondorder Newton’s algorithm. Other existing nonconvex sparse phase retrieval algorithms are mostly the IHT algorithm (7) (for solving (6)) and (9) (for solving (8)), and their truncation variants [32, 6, 40]. Those algorithms are firstorder gradienttype algorithms. Therefore, according to the results in compressed sensing, our proposed algorithm is expected to require much fewer iterations to achieve an accurate sparse phase retrieval than those existing algorithms. This is indeed true, as shown by one example in Figure 1. More experimental results are demonstrated in Section 4. We see from these experimental results that: as expected, while IHTtype algorithms converge only linearly, the iteration in our proposed Algorithm 1 converges superlinearly and it gives the exact recovery in just a few of iterations. Moreover, we will prove this theoretically in the next section, revealing that Algorithm 1 inherits the finitestep convergence (in at most steps) property of HTP. Since our proposed algorithm needs the same order of computational cost per iteration as IHT algorithms do when , Algorithm 1 is an extremely efficient tool for sparse phase retrieval.
3 Theoretical Results
When the sparse phase retrieval problem has only one solution up to a global sign, only are global minimizers of the nonconvex optimization (6). However, due to the nonconvexity, no algorithm for solving (6) is guaranteed automatically to converge to a global minimizer, unless further analysis is provided. This section is devoted to present some theoretical results on the convergence guarantee of Algorithm 1 to one of the global minimizers , and the convergence speed is also investigated. In Section 3.1, we present results on local convergence of Algorithm 1, stating that Algorithm 1 converges to a global minimizer or starting from a neighbourhood of them. Algorithm 1 is able to return an exact global minimizer after at most steps, while it converges at least linearly. Combined with existing results on spectral initialization, we obtain the recovery guarantee of by Algorithm 1 in Section 3.2.
3.1 Local Convergence
We first present our result on the local convergence of Algorithm 1. In particular, we show that, when Gaussian random measurements are used, Algorithm 1 convergences to the underlying signal (under the metric in (3)) if it is initialized in a neighbourhood of . More interestingly, the convergence is way faster than the linear convergence, the typical local convergence rate of existing provable nonconvex sparse phase retrieval algorithms. The algorithm finds exactly after at most steps with the smallest nonzero entry in magnitude of . The result is summarized in the following Theorem 1.
Theorem 1 (Local convergence).
Let be a signal satisfying . Let be i.i.d. Gaussian random vectors with mean and covariance matrix . Let be the sequence generated by Algorithm 1 with the input measured data , , the step size , and an initial guess . Then, there exists universal positive constants and a universal constant such that: If
then

With probability at least ,

With probability at least ,
where is arbitrary, and is the smallest nonzero entry in magnitude of .
The proof of Theorem 1 is deferred to Section 5. We see from Theorem 1 that our algorithm not only converges linearly but also enjoys a finitestep termination with exact recovery. To estimate the steps needed for early termination, it is reasonable to assume for some positive . Indeed, many other sparse phase retrieval algorithms [40, 31] require to be as small as . When , our algorithm finds an exact global minimizer within steps. To the best of our knowledge, all other existing sparse phase retrieval algorithms converges only linearly. Moreover, the computational cost per iteration of our algorithm is in the same as several matrixvector products with and if is small compared to . Therefore, our algorithm is much more efficient than existing algorithms.
3.2 Initialization and Recovery Guarantees
To have a recovery guarantee, it remains to design an initial guess to satisfy the condition in Theorem 1. The same as many existing phase retrieval algorithms [37, 12, 35, 7, 11, 31, 40, 6, 23], we use a spectral initialization to achieve this goal. The idea of spectral initializations is to construct a matrix whose expectation has
as the principal eigenvectors, and thus a principal eigenvector of that matrix is a good approximation to
.Consider the case where are independent Gaussian random vectors. In the standard phase retrieval setting (1) without the sparsity constraint, it can be easily shown that the expectation of the matrix is , whose principal eigenvectors are . Therefore, we use a principal eigenvector of as the initialization. This is the spectral initialization used in, e.g., [7]. Other spectral initializations may use principal eigenvectors of variants of . For example, in the truncated spectral initialization [11], a principal eigenvector of a truncated version of is computed to approximate
initially. The optimal construction and its asymptotically analysis can be found in
[27].For sparse phase retrieval, one naive way is to use spectral initializations for standard phase retrieval directly. However, this will need unnecessarily many measurements, and the best sampling complexity expected is . To overcome this, we have to utilize the sparsity of . The idea is to first estimate the support of , and then obtain the initial guess by a principal eigenvector of the matrix (or its variants) restricted to the estimated support. Though various spectral initialization techniques are valid for our algorithm, we follow a natural strategy introduced in [23], which is as in the following.

The support of is estimated by the set of indices of top values in , denoted by . Since the expectation of is , could be be a good approximation of the support of .

We let be a principal eigenvector of with length , and . The reason is that is the principal eigenvector of the expectation of , and is the expectation of .
This choice of indeed satisfies the requirement on for any in Theorem 1, as stated in [23, Theorem IV.1].
Lemma 1 ([23, Theorem IV.1]).
Let be a signal satisfying . Let be i.i.d. Gaussian random vectors with mean and covariance matrix . Let be generated by (Init1) and (Init2) with input for . Then for any , there exists a positive constant depending only on such that if provided , we have
with probability at least .
Combined with the local convergence theorem, we obtain the recovery guarantee of our proposed Algorithm 1.
Theorem 2 (Recovery Guarantee).
Let be a signal satisfying . Let be i.i.d. Gaussian random vectors with mean and covariance matrix . Let be the sequence generated by Algorithm 1 with the input measured data , , the step size , and an initial guess generated by (Init1) and (Init2). Then, there exists universal positive constants and a universal constant such that: If
then

With probability at least ,

With probability at least ,
where is the smallest nonzero entry in magnitude of .
We see that the sampling complexity is for a recovery guarantee, same as most of existing sparse phase retrieval algorithms and bottlenecked by the initialization. When is small compared to , this sampling complexity is better than that in general phase retrieval. Therefore, using the sparsity of the underlying signal improves the sampling complexity. When is relatively large, we may use a spectral initization for general dense phase retrieval whose sampling complexity can be as good as . Thus, combined with Theorem 1, the sampling complexity of our algorithm is at most . In the extremely case when or even , it is still beneficial to use Algorithm 1 with, e.g., truncated spectral initialization [11]. On the one hand, the sampling complexity is optimal (i.e., ) by Theorem 1 and results on the truncated spectral initialization [11]. On the other hand, due to the finitestep convergence property, Algorithm 1 gives an exact recovery in steps if ; then our algorithm will give an exact recovery in operations, and this computational complexity is almost optimal because solving exactly a linear least squares problem of size also needs . Therefore, we obtain an algorithm with both optimal sampling and computational complexities.
4 Numerical results and discussions
In this section, we present some numerical results of Algorithm 1 and compare it with other existing sparse phase retrieval algorithms.
Throughout the numerical simulation, the target true signal is set to be sparse whose support are uniformly drawn from all subsets of at random. The nonzero entries of are generated as randn in the Matlab. The sampling vectors are i.i.d. random Gaussian vectors with mean and covariance matrix . The clean measured data is with for . The observed data is a noisy version of the clean data defined by
where in the noise are i.i.d. standard Gaussian, and
is the standard deviation of the noise. Thus the noise level is determined by
.Our algorithm HTP will be compared with some of the most recent and popular algorithms, including CoPRAM[23], Thresholded Wirtinger Flow(ThWF)[6] and SPArse truncated Amplitude flow (SPARTA)[40], in terms of efficiency and sampling complexity. In all the experiments, the step size for HTP is fixed to be . For SPARTA, the parameters are set to be . The numerical simulation are run on a laptop with GHz quadcore iHQ processor and GB memory using Matlab a.The relative error between the true signal and its estimation is defined by
In the experiments, a successful recovery is defined to be in case where is the output of an algorithm. Also, in this section is the logarithm of to the base (which is named log relative error in the description of the figures).
Number of iterations and finitestep convergence.
We test the number of iterations required for the proposed HTP algorithm, by the following three experiments. In the first experiment, we fix the sparsity and the sample size , and let the signal dimension vary as . The relative error are obtained by averaging the result of independent trial runs, and the experimental results are plotted in Figure 2. We see that the number of iterations required by our HTP algorithm is very few. More interestingly, we see clearly that the relative error suddenly jumps to almost after very few iterations, suggesting that our HTP algorithm enjoys an exact recovery in finite steps as predicted by Part (b) of Theorem 1. Furthermore, it seems that the number of iterations for the exact recovery depends barely on the signal dimension. In the second experiment, we fix the signal dimension and the number of samples , and let the sparsity vary . Table 1 shows the average maximum number of iteration required for convergence in independent trial run. We see that the number of iterations needed grows very slowly. Recall Part (b) of Theorem 1 states that the number of iteration required is . Since the number of is fixed, this slow growth might due to the growth of according to our random model generating . In the third experiment, we demonstrate the effect of the sample size on the number of iterations required. We fix the underlying signal dimension and sparsity , and let vary from to at every . The results are given in Table 2. We see that the number of iterations even decays very slowly with respect to , suggesting the dependency on in Part (b) in Theorem 1 is somewhat conservative.
Sparsity  10  20  30  40  50  60  70  80  90  100 
Max iteration no.  5  6  6  6  7  7  7  7  8  8 
Sample size  2000  3000  4000  5000  6000  7000  8000  9000  10000 
Max iteration no.  8  7  6  6  6  6  6  6  6 
Running time comparison.
We compare our HTP algorithm with existing sparse phase retrieval algorithms, including CoPRAM, HTP, ThWF and SPARTA, in terms of running time. The signal dimension is fixed to be . The comparison are demonstrated by two experiments. In the first experiment, the sparsity is set to be , the sample size is fixed to be . The noise level is set to be respectively. We plot in Figure 3 the results of averages trial runs with those fail trials ignored. In the second experiment, the sparsity is set to be respectively. In Figure 4, we plot the running time required for a successful signal recovery (in the sense of ). All the mean value are obtained by averaging independent trial runs with those failing trials filtered out. From both figures, we see that our proposed HTP algorithm is the fastest among all tested algorithms, and it is at least 2 times faster than other algorithms to achieve a recovery of relative error . Furthermore, the running time of our HTP algorithm grows the least with respect to among all tested algorithms.
Robustness to noise.
We show the effect of noise on the recovery error. In this experiment, we choose , ,
, and we test our HTP algorithm under different noise level of the observed measured data. We plot the mean relative error by our algorithm against the signaltonoise ratios of the measured data in Figure
5. The mean relative error are obtained by averaging independent trial runs. We see that the log relative error decays almost linearly with respect to the SNR in dB, suggesting that the recovery error of the HTP algorithm is controlled by a constant times the noise level in . Therefore, our proposed algorithm is robust to noise contained in the observed phaseless measured data.Phase transition.
Now we show the phase transition of our HTP algorithm and compare it with other algorithms. In this experiment, we fix the signal dimension
. Figure 6 depicts the success rate of different algorithms with different sparsities and different sample sizes : the sparsity shown in the axis vary from to with grid size , and sample size shown in the axis vary from to with grid size . In the figure, the grey level of a block means the successful recovery rate: black means successful reconstruction, white means successful reconstruction, and grey means a successful reconstruction rate between and . The successful recovery rates are obtained by independent trial runs. From the figure, we see that our HTP algorithm, CoPRAM, and SPARTA have similar phase transitions while all algorithms are comparable. For smaller , HTP, CoPRAM, and SPARTA are slightly better than ThWF. For larger , ThWF is slightly better than the others.5 Proofs
In this section, we prove Theorem 1. We first present some key lemmas in Section 5.1. Then the proof for Parts (a) and (b) of Theorem 1 are given in Sections 5.2 and 5.3 respectively.
We prove only the case when , so that . We will then estimate , which is an upper bound of . It can be done similarly for the case when by estimating .
5.1 Key Lemmas
In this subsection, we give and prove some key lemmas that will be used in the proof of Theorem 1.
We first present two probabilistic lemmas. The first probabilistic lemma is well known in compressed sensing theory [15, 10], which states that the random Gaussian matrix satisfies the restricted isometry property (RIP) if is sufficiently large.
Proposition 1 ([15, Theorem 9.27]).
With the help of RIP, we can bound the spectral norm of submatrices of . The following result is from Proposition 3.1 in [30].
Proposition 2 ([30, Proposition 3.1]).
Under the event (13) with and , for any disjoint subsets and of satisfying and , we have the following inequalities:
(14a)  
(14b)  
(14c) 
The second probabilistic lemma is a corollary of [32, Lemma 7.17]. Its proof can be obtained straightforwardly by modifying the proof of [32, Lemma 7.17], and similar modifications can also be found and are used in [23, 40].
Lemma 2 (Corollary of [32, Lemma 7.17]).
Let be i.i.d. Gaussian random vectors with mean and variance matrix . For some universal positive constants , if
then it holds that
(15) 
with probability at least . Here, can be any constant.
Proof.
See the last inequality in Page in the proof of [32, Lemma 7.17]. ∎
With those probabilistic lemmas, we can show some deterministic lemmas that are crucial to the proof of our theorem. The following lemma bound an error on by the error of .
Lemma 3.
Proof.
The last key lemma estimate the error of the vector obtained by one iteration of IHT. Its proof uses a similar strategy to the proof of in [40, Lemma 3]. To make the paper selfcontained, we have included the details of the proof.
Lemma 4.
Proof.
Define , , and
Since is the best term approximation of , we have
which together with and implies
Then, by the triangle inequality and the inequality above, we obtain