1 Introduction
In this paper, we are interested in imaging problems formulated as
(1) 
so the data vector
is a linear transformation of the unknown vector
that represents the image. The model matrix, which is given to us, depends on the geometry of the imaging system and on the sought resolution. Typically, the linear system (1) is underdetermined because only a few linear measurements are gathered. Hence, there exist infinitely many solutions to (1) and, thus, it is a priori not possible to find the correct one without some additional information. We are interested, however, in imaging problems with sparse scenes: we seek to locate the positions and amplitudes of a small number of point sources that illuminate a linear array of detectors. This means that the unknown vector is M-sparse, with only a few nonzero entries. Under this assumption, (1) falls under the compressive sensing framework [21, 16, 22, 8]. It follows from [16] that the unique M-sparse solution of (1) can be obtained with ℓ1-norm minimization when the model matrix is incoherent, i.e., when its mutual coherence is smaller than 1/(2M-1). (The mutual coherence of a matrix is the largest inner product, in absolute value, between two distinct columns, with the columns normalized to one.) The same result can be obtained assuming the matrix obeys the M-restricted isometry property [8], which basically states that all sets of M columns behave approximately as an orthonormal system.
In our imaging problems these incoherence conditions can be satisfied only for coarse image discretizations that imply poor resolution. To retain resolution and recover the positions of the sources with higher precision, we propose to extend the theory so as to allow for some coherence in the model matrix. To this end, we show that uniqueness of the minimal ℓ1-norm solution of (1) can be obtained under less restrictive conditions on the model matrix. More specifically, given the columns of the model matrix that correspond to the support of the solution, we define their vicinities as the sets of columns that are almost parallel to them. With this definition, our first result, set out in Proposition 1, states that if the sources are located far enough from each other, so that their vicinities do not overlap, we can recover their positions exactly from noise-free data. Furthermore, in the presence of small noise, their positions are still approximately recoverable, in the sense that most of the solution vector is supported in the vicinities while some small noise (grass) is present away from them.
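To make the incoherence and vicinity notions above concrete, here is a minimal numpy sketch. It uses a generic random matrix and a hypothetical vicinity threshold (the paper's precise threshold is in its footnote, which is not reproduced in this excerpt), so treat it as an illustration of the definitions rather than the paper's exact construction.

```python
import numpy as np

def mutual_coherence(A):
    """Largest |<a_i, a_j>| over distinct pairs of column-normalized columns."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(A.conj().T @ A)        # Gram matrix of column inner products
    np.fill_diagonal(G, 0.0)          # ignore the trivial diagonal entries
    return G.max()

def vicinity(A, j, threshold):
    """Indices of columns almost parallel to column j: inner product above
    an assumed threshold (a stand-in for the paper's vicinity definition)."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    corr = np.abs(A.conj().T @ A[:, j])
    return [i for i in range(A.shape[1]) if i != j and corr[i] > threshold]

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 400))   # hypothetical 100 x 400 model matrix
mu = mutual_coherence(A)
print(f"mutual coherence: {mu:.3f}")
```

For a random Gaussian matrix the coherence is moderate; for finely discretized imaging matrices the vicinities of a column are nonempty because neighboring grid points give nearly parallel columns.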
This result finds interesting applications in imaging. As we explain in Section 2, in array imaging we seek to find the position of point sources that are represented as the nonzero entries of
the unknown vector. Our result states under what conditions the locations of these objects can be determined with high precision. It can also be used to explain superresolution, i.e., the significantly superior resolution that
ℓ1-norm minimization provides compared to the conventional resolution of the imaging system, i.e., the Rayleigh resolution. For instance, superresolution has been studied using sparsity promotion for the recovery of sparse spike trains from band-limited measurements. Donoho [15] showed that spike locations and their weights can be exactly recovered for a given cutoff frequency if the minimum spacing between spikes is large enough. Candès and Fernandez-Granda [11] showed that ℓ1-norm minimization guarantees exact recovery under a comparable minimum-separation condition. Superresolution has also been studied for highly coherent model matrices that arise in imaging under the assumption of well-separated objects when the resolution is below the Rayleigh threshold [19, 3, 4]. These works include results regarding the robustness of superresolution in the presence of noise. Our theory also addresses the robustness to noise of the minimal ℓ1-norm solution. Specifically, we show that for noisy data the solution can be separated into two parts: (1) the coherent part, which is supported inside the vicinities, and (2) the incoherent part, usually referred to as grass, which is small and is present everywhere. A key observation of our work is that the images get worse as the number of measurements grows when there is noise in the data and, thus, ℓ1-norm minimization fails when the number of measurements is large. This basically follows from (13) in Proposition 1, which relates the ℓ1 norm of the solution to the ℓ2 norm of the data.
The key quantity here is the constant in (13), which for usual imaging matrices is proportional to the square root of the number of measurements.
To overcome this problem we introduce in Proposition 2 the noise collector matrix and propose to solve an augmented linear system instead. The dimension of the unknown vector is thus augmented by components which do not have any physical meaning. They correspond to fictitious sources that allow us to better approximate the noisy data. The natural question is how to build the noise collector matrix. Theoretically, the answer is given in the proof of Proposition 2 in Section 3, which is constructive. The key is that the column vectors of the augmented matrix form a frame in which the noisy data vector can be well approximated. As a consequence, we obtain a bound on the constant in (13) which is now independent of the number of measurements. The drawback of this construction is that we need exponentially many vectors, which would suggest that the noise collector may not be practical. However, the numerical experiments show that with a large enough number of columns in
selected at random (as i.i.d. Gaussian random variables with mean zero and variance
inversely proportional to the number of measurements) the ℓ1-norm minimization problem is regularized and the minimal ℓ1-norm solution is found. The paper is organized as follows. In Section 2, we formulate the array imaging problem. In Section 3, we present in an abstract linear algebra framework the conditions under which ℓ1 minimization provides the exact solution to problem (1) with and without noise. This section contains our main results. In Section 4, we illustrate with numerical simulations how our abstract theoretical results are relevant for imaging sparse sources with noisy data. Section 5 contains our conclusions.
2 Passive array imaging
We consider point sources located inside a region of interest called the image window IW. The goal of array imaging is to determine their positions and amplitudes using measurements obtained on an array of receivers. The array has receivers separated by a fixed distance, located at known positions (see Figure 1). They can measure single- or multi-frequency signals. The point sources, whose positions and complex-valued amplitudes we seek to determine, are at some distance from the array. The ambient medium between the array and the sources can be homogeneous or inhomogeneous.
In order to form the images we discretize the IW using a uniform grid of points, and we introduce the true source vector
such that
We will not assume that the sources lie on the grid; typically, the true source locations do not coincide with any grid point. To write the data received on the array in a compact form, we define the Green’s function vector
(2) 
at a location in the IW, defined in terms of the free-space Green’s function of the homogeneous medium. This function characterizes the propagation of a signal of a given angular frequency from one point to another, so (2) represents the signal received at the array due to a point source of amplitude one, phase zero, and given frequency at that location. If the medium is homogeneous,
(3) 
The signal received at a receiver at a given frequency is given by
(4) 
If we normalize the columns of the model matrix to one and stack the data in a column vector
(5) 
then the source vector solves the system with the matrix
(6) 
The system relates the unknown vector to the data vector. This system of linear equations can be solved by appropriate ℓ1- and ℓ2-based methods.
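As an illustration of how such a model matrix could be assembled, the sketch below stacks single-frequency Green's function vectors and normalizes columns, as in (5)-(6). It assumes the 3D free-space Green's function exp(ikr)/(4πr) of (3) and a purely hypothetical array/grid geometry; the paper's actual geometry parameters are not reproduced here.

```python
import numpy as np

def greens_vector(receivers, y, k):
    """Free-space Green's function vector g(y): the signal
    exp(i*k*r)/(4*pi*r) recorded at each receiver for a unit source at y."""
    r = np.linalg.norm(receivers - y, axis=1)
    return np.exp(1j * k * r) / (4 * np.pi * r)

def model_matrix(receivers, grid, wavenumbers):
    """Stack Green's vectors over frequencies for every grid point and
    normalize each column to one, mirroring (5)-(6)."""
    cols = []
    for y in grid:
        g = np.concatenate([greens_vector(receivers, y, k) for k in wavenumbers])
        cols.append(g / np.linalg.norm(g))
    return np.array(cols).T

# hypothetical geometry: a small linear array and a coarse image grid
receivers = np.array([[0.0, 0.0, float(i)] for i in range(10)])
grid = np.array([[50.0, 0.0, float(j)] for j in range(5)])
A = model_matrix(receivers, grid, wavenumbers=[1.0, 1.1, 1.2])
print(A.shape)   # rows = receivers x frequencies, columns = grid points
```

The number of rows equals the number of receivers times the number of frequencies, and each column corresponds to one pixel of the IW.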
Remark 1.
For simplicity of presentation, we restricted ourselves to the passive array imaging problem, where we seek to determine a distribution of sources. The active array imaging problem can be cast in the same linear algebra framework assuming the linearized Born approximation for scattering [12]. In that case, we still obtain a linear system of the same form, where the unknown is the reflectivity of the scatterers and the model matrix for the scattering problem is defined in a manner similar to (6). Moreover, when multiple scattering is not negligible the problem can also be cast as in (1); see [13] for details. Therefore, the theory presented in the next sections can be applied to scattering problems provided that the model matrix satisfies the assumptions of Propositions 1 and 2.
3 ℓ1 minimization-based methods
In the imaging problems considered here we assume that the sources occupy only a small fraction of the image window IW. This means that the true source vector is sparse: the number of its entries that are different from zero is much smaller than its length. This prior knowledge changes the imaging problem substantially because we can exploit the sparsity by formulating an optimization problem which seeks the sparsest vector that matches model and data. Thus, for a single measurement vector we solve
(7) 
Above, and in the sequel, we denote by ‖·‖0, ‖·‖1, and ‖·‖2 the ℓ0, ℓ1, and ℓ2 norms of a vector, respectively.
The minimization problem (7) can be solved efficiently in practice, and several methods have been proposed in the literature: orthogonal matching pursuit [7], homotopy [29, 28, 18], interior-point methods [1, 33], gradient projection [20], subgradient descent methods in primal and dual spaces [25, 6], and proximal gradient methods in combination with iterative shrinkage-thresholding [26, 27, 2]. In this work we use GeLMA [24], a semi-implicit version of the primal-dual method [14] that converges to the solution of the following problem independently of the regularization parameter: define the function
(8) 
for and , and determine the solution as
In the compressive sensing literature we find the following theoretical justification of the ℓ1-norm minimization approach. If we assume decoherence of the columns of the model matrix, so that
(9) 
then the sparse solution of the system is unique, and it can be found as the solution of (7) [21, 16, 22]. Numerically, the ℓ1-norm minimization approach works under less restrictive conditions than the decoherence condition (9) suggests. In fact, our imaging matrices almost never satisfy (9).
Consider a typical imaging regime with a given central wavelength. Assume we use equally spaced frequencies covering a bandwidth that is a fraction of the central frequency, with a fixed array size, distance to the IW, and IW size, and with the IW discretized using a uniform grid. For such parameters, every column vector has at least sixty-two other column vectors that are strongly correlated with it. Thus, our matrices are fairly far from satisfying the decoherence condition (9) if we want to recover, say, 8 sources. Numerically, however, ℓ1 minimization works flawlessly.
Physically, a pair of columns are coherent if the corresponding grid points in the image are close to each other; in other words, when one lies in a vicinity of the other. We assume, though, that the sources are far apart and, thus, the set of columns indexed by the support of the true source vector does satisfy the decoherence condition (9). This observation motivates the following natural conjecture: perhaps ℓ1 minimization works well because it suffices to satisfy (9) only on the support of the true solution. Our main result supports this conjecture.
3.1 Main results
When the data are perturbed by small noise, the image can be described qualitatively as follows. Firstly, some pixels close to the points where the sources are located become visible. Secondly, a few pixels away from the sources are also visible; the latter are usually referred to as grass. In order to quantify these observations we need to modify the decoherence condition (9) and introduce the vicinities.
Definition 1.
Let the true source vector be an M-sparse solution of the system, with a given support. (Below, and in the rest of the paper, a subscript denotes an entry of a vector, whereas a superscript indexes a vector within a set of vectors.) For each index in the support, define the corresponding vicinity as
(10) 
For any vector, its coherent misfit to the true solution is
(11) 
whereas its incoherent remainder with respect to the true solution is
(12) 
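Since the precise formulas (11)-(12) are not reproduced in this excerpt, the following sketch only illustrates their spirit: splitting the ℓ1 mass of a candidate solution into the part supported inside the union of the vicinities and the remainder (the grass) outside. The example vector and vicinities are hypothetical.

```python
import numpy as np

def split_by_vicinities(x, vicinities):
    """Split the l1 mass of x into a part supported inside the union of the
    vicinities (coherent) and the remainder outside (incoherent, 'grass').
    This mirrors the spirit of (11)-(12); the exact definitions differ."""
    inside = np.zeros(len(x), dtype=bool)
    for v in vicinities:
        inside[list(v)] = True
    coherent = np.abs(x[inside]).sum()
    incoherent = np.abs(x[~inside]).sum()
    return coherent, incoherent

x = np.array([0.0, 2.0, 0.1, 0.0, 1.5, 0.02])
vicinities = [{1, 2}, {4}]            # hypothetical vicinities of two sources
c, r = split_by_vicinities(x, vicinities)
print(c, r)                           # most of the mass lies in the vicinities
```

A good reconstruction is one where almost all of the ℓ1 mass is coherent and the grass term is small.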
Proposition 1.
Let the true source vector be an M-sparse solution of the system, and consider its support. Suppose the vicinities from Definition 1 do not overlap, and define the constant
(13) 
Consider the minimal ℓ1-norm solution of the noisy problem
(14) 
with small noise. Then,
(15) 
and
(16) 
If the data are noiseless and the vicinities do not contain collinear vectors, we have exact recovery:
Proposition 1 is proved in Appendix A. As follows from this proof, our pessimistic bound could be sharpened to the usual bound (9) found in the literature. We did not strive to obtain sharper results because doing so would make the proofs more technical and, more importantly, because the concept of vicinities describes well the observed phenomena in imaging with this bound.
When there is no noise, Proposition 1 tells us that the M-sparse solution can be recovered exactly by solving the ℓ1 minimization problem under a less stringent condition than (9). Note that we allow for the columns of the model matrix to be close to collinear. When there is noise, this proposition shows that if the data are not exact but known up to some bounded vector, the solution of the minimization problem (14) is close to the solution of the original (noiseless) problem in the following sense. The solution can be separated into two parts: the coherent part, supported in the vicinities of the true solution, and the incoherent part, which is small for low noise and is supported away from these vicinities. Other stability results can be found in [8, 9, 17, 30, 19, 3].
Let us now make some comments regarding the relevance of this result in imaging. Vicinities, as defined in (10), are related to the classical ℓ2-norm resolution theory. Indeed, recall Kirchhoff migration (KM) imaging, given by the ℓ2-norm solution
(17) 
where we apply the conjugate transpose of the model matrix to the data. The resolution analysis of KM relies on studying the behaviour of the inner products between the columns. We know from classical resolution analysis [5] that these inner products are large for points that fall inside the support of the KM point spread function, whose size is given by the classical resolution limits in cross-range (parallel to the array) and in range (perpendicular to the array). Hence, we expect the size of the vicinities to be proportional to these classical resolution limits, with an appropriate scaling factor that is inversely proportional to the sparsity. This is illustrated with numerical simulations in Section 4 (right column of Figure 2).
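A minimal sketch of the migration step just described, assuming a random complex matrix with normalized columns in place of the actual Green's function matrix:

```python
import numpy as np

def kirchhoff_migration(A, b):
    """l2-based Kirchhoff migration image: apply the conjugate transpose
    of the column-normalized model matrix to the data, as in (17)."""
    return A.conj().T @ b

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 80)) + 1j * rng.standard_normal((30, 80))
A /= np.linalg.norm(A, axis=0)        # normalize the columns to one
x_true = np.zeros(80)
x_true[17] = 1.0                      # a single point source at pixel 17
img = np.abs(kirchhoff_migration(A, A @ x_true))
print(int(np.argmax(img)))            # the peak sits at the source pixel
```

The image peaks at the true source pixel, and the spread of large values around the peak is exactly the point spread function whose support motivates the vicinities.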
From this perspective, one could argue that Proposition 1 merely restates the well-known result that a good reconstruction can be obtained for well-separated sources. Proposition 1, however, gives us more information: it provides an ℓ1-norm resolution theory for imaging. When vicinities do not overlap, there is a single nonzero element of the source associated with each vicinity. Permitting the columns of the model matrix to be almost collinear inside the vicinities allows for a fine discretization there, and therefore the source can be recovered with very high precision. Furthermore, recovery is exact for noiseless data.
The assumptions in Proposition 1 are sufficient but not necessary. Our numerical simulations illustrate exact recovery in more challenging situations, where the vicinities are not well separated (central images in the second row of Figure 2).
For noisy data, Proposition 1 says that it is the concept of vicinities that provides the adequate framework in which to examine the error between the true solution and the one provided by the ℓ1-norm minimization approach. Specifically, the error is controlled by the coherent misfit (11) and the incoherent remainder (12), which are shown to be small when the noise is small. This means that the reconstructed source is supported mainly in the vicinities of the true solution and the grass in the image is low, i.e., the part of the solution supported away from the vicinities is small.
Proposition 1 implies that a key to controlling the noise is the constant defined in (13). In general, this constant grows with the number of measurements. Indeed, let the minimum ℓ1-norm solution of the problem have support of size at most M, and consider the submatrix of the model matrix that contains the columns corresponding to its nonzero entries. Then, by the Cauchy-Schwarz inequality, the ℓ1 norm of this solution is bounded by a constant proportional to the square root of the number of measurements times the ℓ2 norm of the data.
This means that the quality of the image deteriorates as the number of measurements grows. The remedy that we propose is to augment the imaging matrix with a “noise collector,” as described in the following proposition.
Proposition 2.
There exists a noise collector matrix such that the columns of the augmented matrix satisfy
(18) 
(19) 
and there is a positive constant
(20) 
such that
(21) 
Proof.
We will construct iteratively a sequence of vectors such that, for each,
The iteration will terminate at a finite step. At the termination step we will have that, for any unit vector, there exists an element of the constructed sequence such that
(22) 
The finite-time termination is a consequence of a volume growth estimate. Namely, if (19) holds for all constructed vectors, then these vectors are the centers of non-overlapping balls whose radius is bounded from below. Thus the iteration will terminate at a finite step. Furthermore, the number of vectors needed grows exponentially with the dimension.
Let us finally estimate the constant in (20). Without loss of generality, we may normalize the data vector. By our construction, there exists an element of the sequence such that (22) holds. Using (22) inductively, we can find a sequence of coefficients and a sequence of residuals with geometrically decreasing norms. Therefore,
(23) 
and
(24) 
by the triangle inequality. ∎
Proposition 2 is an important result, as it shows that the constant in (13) can be made independent of the number of measurements by augmenting the linear system with the columns of a noise collector matrix. The columns of the noise collector are required to be decoherent with respect to the columns of the model matrix (see (18)), and decoherent among themselves (see (19)). Recalling that the columns of the model matrix for the imaging problem are Green’s function vectors corresponding to points in the imaging window, we stress that the columns of the noise collector do not admit a physical interpretation. They do not correspond to any points in the imaging window or elsewhere. Similarly, the last components of the augmented unknown vector in (21) do not have a physical meaning. They correspond to fictitious auxiliary unknowns that are introduced to regularize the ℓ1-norm minimization problem.
The drawback of this theory is that the size of the noise collector is exponential in the number of measurements, which makes it impractical. Our numerical experiments, however, indicate a great improvement in the performance of ℓ1-norm minimization when the columns of the noise collector are selected at random (its entries are i.i.d. Gaussian random variables with mean zero and variance inversely proportional to the number of measurements). This works well for additive mean-zero uncorrelated noise. For other types of noise, the idea is to construct a library that represents the values that the noise vector takes; it is the elements of this library that should be used as columns of the noise collector matrix. A different approach can be followed when the noise is sparse, so its ℓ1 norm is small. Then, the noise collector can simply be taken to be the identity matrix. This approach has been proposed and analyzed in [23] and provides exact recovery for sparse noise vectors.
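The random construction just described can be sketched as follows. The dimensions are hypothetical, and the final check only verifies what the theory requires of the columns: near-unit norms and small pairwise coherence, as in (19).

```python
import numpy as np

def noise_collector(N, S, rng):
    """Random noise collector: S columns with i.i.d. N(0, 1/N) entries,
    so each column has norm close to one when N is large."""
    return rng.standard_normal((N, S)) / np.sqrt(N)

rng = np.random.default_rng(3)
N, S = 500, 2000                       # hypothetical sizes
C = noise_collector(N, S, rng)
G = np.abs(C.T @ C)                    # pairwise column inner products
np.fill_diagonal(G, 0.0)
print(f"max pairwise coherence: {G.max():.3f}")   # small, ~sqrt(log S / N)
```

In practice one would also check the decoherence of these columns with respect to the model matrix, as required by (18).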
In the next section we present numerical results to illustrate the relevance of our theory in imaging sparse sources. We focus our attention on the case of additive mean-zero uncorrelated noise, which is not sparse. The results show a dramatic improvement when using the noise collector.
4 Imaging results in the framework of Propositions 1 and 2
We illustrate here the relevance of Propositions 1 and 2 in imaging. We compare the solutions obtained with the ℓ1-norm minimization algorithm GeLMA [24] and with the ℓ2-norm Kirchhoff migration solution (17). Our results illustrate:

The well-known superresolution of ℓ1-norm minimization, meaning that it determines the support of the unknown with higher accuracy than the conventional resolution limits, provided the assumptions of Proposition 1 are satisfied.

The equally well-known sensitivity of ℓ1-norm minimization to additive noise. This is made more precise in the imaging context, where the constant in (13) grows as the square root of the total number of measurements acquired by the receivers over all frequencies. We observe that, for a given level of noise, the ℓ1-norm reconstruction deteriorates as the number of measurements increases.

The noise collector matrix stabilizes ℓ1-norm minimization in the presence of noise.
We also show how the bandwidth, the array size, and the number of sources affect the vicinities defined in (10). The numerical results are not specialized to a particular physical regime; they illustrate only the role of Propositions 1 and 2 in solving the associated linear systems.
Imaging setup
The images are obtained in a homogeneous medium with an active array of transducers. We collect measurements corresponding to a set of frequencies equispaced in the bandwidth, so the length of the data vector is the number of transducers times the number of frequencies. The ratios between the array size and the distance to the IW, and between the bandwidth and the central frequency, vary in the numerical experiments, so the classical Rayleigh resolution limits change. The size of the IW is fixed. It is discretized using a uniform grid of points in the range and cross-range directions.
The images have been formed by solving the ℓ1-norm minimization problem (7) using the algorithm GeLMA [24]. GeLMA is an iterative shrinkage-thresholding algorithm that provides the exact solution for noiseless data, independently of the value of the regularization parameter (see (8)) used to promote the sparsity of the images.
Results for noiseless data. Superresolution and reconstructions
Figure 2 shows the results obtained for a relatively large array and a relatively large bandwidth when the data are noiseless. From left to right we show the ℓ2 solution (17), the ℓ1 solution obtained from (7), the comparison between the ℓ1 reconstruction (red stars) and the true solution (green circles), and the vicinities defined in (10) plotted with different colors. The top and bottom rows show images with two different numbers of sources. The exact locations of the sources are indicated with white crosses in the two leftmost columns. The sources in the top row are very far apart: their vicinities do not overlap, as can be seen in the top right image. In this case, all the conditions of Proposition 1 are satisfied and we find the exact source distribution by ℓ1-norm minimization. The sources in the bottom row are closer, and their vicinities are larger; according to (10), the size of the vicinities increases with the number of sources. In fact, their vicinities overlap, as can be seen in the bottom right image. Still, the ℓ1-norm minimization algorithm finds the exact solution.
The classical resolution limits of this setup determine the accuracy of the ℓ2-norm solutions; see the left column of Figure 2. Recall that our discretization is four times finer than the classical resolution limit. Thus, each source roughly corresponds to a four-by-four pixel square, which is what the ℓ2 solutions show. Note that, because two sources are quite close, the ℓ2 solution displays fewer sources than are actually present. The ability of ℓ1-norm minimization to determine the location of the sources with better accuracy than the classical resolution limits is referred to as superresolution.
We stress that if the IW is discretized using a very fine grid, with grid size smaller than the classical resolution limit, then the columns of the model matrix are almost parallel and the decoherence condition (9) is violated. The columns that are almost parallel to those indexed by the support of the true solution are contained in the vicinities (10). The number of columns that belong to the vicinities depends on the imaging system. To illustrate the effect of the array and bandwidth sizes on the size of the vicinities, we plot in Figure 3 the vicinity of one source for several configurations of increasing array size and bandwidth, from left to right. As expected, the size of the vicinity is proportional to the classical resolution estimates in cross-range and in range.
Results for noisy data. Stabilization of ℓ1-norm minimization using the noise collector matrix
We now add mean-zero uncorrelated noise to the data and examine the results for different values of the signal-to-noise ratio (SNR). We consider first the same imaging configuration as in Figure 2. In the top row of Figure 4 we plot the minimal ℓ1-norm image obtained by solving problem (7) for a moderate SNR. The true solution is shown with white crosses. It is apparent that, even for this moderate level of noise, ℓ1-norm minimization fails to give a good image.
The problem can be alleviated using the noise collector matrix, as can be seen in the results shown in the bottom row of Figure 4. To construct a noise collector matrix that verifies the assumptions of Proposition 2, we take its columns to be random vectors whose entries have mean zero and variance inversely proportional to the length of the data vector. Their norms tend to one as this length grows, and we check that conditions (18) and (19) are satisfied. In theory, the number of columns of the noise collector should be very large, exponential in the number of measurements, but in practice we obtain stable results with many fewer columns.
The solution obtained with the noise collector can be decomposed into two vectors: the vector corresponding to the sought solution in the IW, and the vector that absorbs the noise. We display these two vectors in the bottom right plot of Figure 4. The first components correspond to the image and the remaining components to the noise. It is remarkable that the image part is very close to the true solution and contains only some small grass. This means that both the coherent misfit and the incoherent remainder are now small, in accordance with the theoretical error estimates (15) and (16), where the constant is now independent of the dimension of the data vector; see (20).
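Putting the pieces together, the following toy sketch solves the augmented system with a random Gaussian model matrix and a random noise collector, using plain ISTA as a stand-in for GeLMA. Every size, noise level, and parameter below is an assumption chosen for illustration only, not the paper's configuration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: shrink entries of z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(B, b, tau, n_iter):
    """Basic ISTA for min tau*||u||_1 + 0.5*||Bu - b||_2^2
    (a hedged stand-in for the l1 solver used in the paper)."""
    L = np.linalg.norm(B, 2) ** 2
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(u - B.T @ (B @ u - b) / L, tau / L)
    return u

rng = np.random.default_rng(4)
N, K, S = 50, 100, 400                       # data, image, collector sizes
A = rng.standard_normal((N, K))
A /= np.linalg.norm(A, axis=0)               # stand-in model matrix
C = rng.standard_normal((N, S)) / np.sqrt(N) # random noise collector
x_true = np.zeros(K)
x_true[[10, 60]] = [2.0, -1.5]               # two "sources"
e = rng.standard_normal(N)
e *= 0.1 * np.linalg.norm(A @ x_true) / np.linalg.norm(e)  # 10% noise
b = A @ x_true + e

u = lasso_ista(np.hstack([A, C]), b, tau=0.02, n_iter=3000)
x_hat, eta = u[:K], u[K:]        # image part and fictitious noise part
```

The first K components recover the sources, while the trailing components absorb a fictitious representation of the noise, mirroring the decomposition described above.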
In the next figures, we consider an imaging setup with a large aperture and a large bandwidth. Moreover, we increase the pixel size in both the range and cross-range directions, so the Rayleigh resolution is of the order of a pixel. With this imaging configuration, the columns of the model matrix are less coherent than in the previous numerical experiments. We plot in Figure 5 the ℓ1-norm image for a moderate SNR. With a less coherent matrix the results are very good. This highlights the inherent difficulty of imaging when high resolution is required, as in Figure 4.
For the particular low-resolution imaging configuration considered in Figure 5 we obtain good results even for a large noise level; see the top row of Figure 6, where the number of measurements is the same as before. However, when we increase the number of measurements, the image obtained with ℓ1-norm minimization turns out to be useless; see the bottom row of Figure 6. This illustrates the counterintuitive fact that ℓ1-norm minimization does not always benefit from more data, at least if the data are highly contaminated with noise. This is so because the constant in (13) grows with the square root of the length of the data vector.
As before, this problem can be fixed with the noise collector, as can be seen in Figure 7. Again, the noise is effectively absorbed for both the smaller (top row) and the larger (bottom row) number of measurements, using a noise collector with a relatively small number of columns, many fewer than Proposition 2 suggests.
We finish with one last example showing that the use of the noise collector makes ℓ1-norm minimization competitive for imaging sparse scenes, because it provides stable results with superresolution even for highly corrupted data. We consider an example with nearby sources and a low SNR. The array and the bandwidth are relatively large, so the classical ℓ2-norm resolution is of the same order as in Figure 4. In Figure 8 we show, from left to right, (i) the minimal ℓ1-norm solution without the noise collector, which fails to give a good image, (ii) the ℓ2-norm solution (17), which is stable to additive noise but does not resolve nearby sources, and (iii) the minimal ℓ1-norm solution with the noise collector, which provides a very precise and stable image.
5 Discussion
In this paper, we consider imaging problems that can be formulated as underdetermined linear systems, where the model matrix has many fewer rows than columns and the data vector is contaminated with noise. We assume that the solution is an M-sparse vector whose entries correspond to the pixels of the IW. We consider additive noise in the data, so the data vector can be decomposed into a noise-free part and a noise vector. We provide a theoretical framework that allows us to examine under what conditions the ℓ1 minimization problem admits a solution that is close to the exact one. We have also shown that, for our imaging problems, ℓ1 minimization fails when the noise level is high and the dimension of the data vector increases. The reason is that the error is proportional to the square root of the number of measurements.
To alleviate this problem and increase the robustness of ℓ1 minimization, we propose a regularization strategy. In particular, we seek the solution of an augmented system in which the model matrix is complemented by a noise collector matrix. The first components of the unknown correspond to the distribution of sources in the IW, while the remaining components do not correspond to any physical quantity. They are introduced to provide a fictitious source distribution, given by an appropriate linear combination of the columns of the noise collector, that produces a good approximation of the noise. The main idea is to create a library of noises. The columns of the noise collector matrix are elements of this library, constructed so as to be incoherent with respect to the columns of the model matrix. Theoretically, the dimension of the noise collector increases exponentially with the number of measurements, which suggests that it may not be useful in practice. Our numerical results show, however, that ℓ1 minimization is robust in the presence of noise when a large enough number of columns is used to build the noise collector matrix.
Our first findings on the noise collector are very encouraging. We have shown that its use dramatically improves the robustness of ℓ1-norm reconstructions when the data are corrupted by additive uncorrelated noise. Many other questions remain to be addressed. Some directions for our future research concern the following aspects: What happens with other types of noise? Can we design noise collectors adaptively, depending on the noise in the data? What if the noise comes from wave propagation in a random medium, and can we design a noise collector for this case? How much do we need to know about the noise in order to design a good noise collector, and can we retrieve this information from the data? Some of these questions will be addressed elsewhere.
Acknowledgments
Part of this material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while the authors were in residence at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, RI, during the Fall 2017 semester. The work of M. Moscoso was partially supported by Spanish MICINN grant FIS2016-77892-R. The work of A. Novikov was partially supported by NSF grants DMS-1515187 and DMS-1813943. The work of C. Tsogka was partially supported by AFOSR FA9550-17-1-0238.
References

[1] F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM J. Optim. 5 (1995), pp. 13–51.
[2] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2 (2009), pp. 183–202.
[3] L. Borcea and I. Kocyigit, Resolution analysis of imaging with $\ell_1$ optimization, SIAM J. Imaging Sci. 8 (2015), pp. 3015–3050.
[4] L. Borcea and I. Kocyigit, A multiple measurement vector approach to synthetic aperture radar imaging, SIAM J. Imaging Sci. 11 (2018), pp. 770–801.
[5] L. Borcea, G. Papanicolaou and C. Tsogka, A resolution study for imaging and time reversal in random media, Contemporary Math. 333 (2003), pp. 63–77.
[6] J. M. Borwein and D. R. Luke, Entropic regularization of the $\ell_0$ function, Springer Optim. Appl. 49 (2011), Springer, pp. 65–92.
[7] A. M. Bruckstein, D. L. Donoho and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. 51 (2009), pp. 34–81.
[8] E. J. Candès, J. K. Romberg and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (2006), pp. 1207–1223.
[9] E. J. Candès and T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies?, IEEE Trans. Inf. Theory 52 (2006), pp. 5406–5425.
[10] E. J. Candès, Y. C. Eldar, T. Strohmer and V. Voroninski, Phase retrieval via matrix completion, SIAM J. Imaging Sci. 6 (2013), pp. 199–225.
[11] E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Communications on Pure and Applied Mathematics 67 (2014), pp. 906–956.
[12] A. Chai, M. Moscoso and G. Papanicolaou, Robust imaging of localized scatterers using the singular value decomposition and $\ell_1$ optimization, Inverse Problems 29 (2013), 025016.
[13] A. Chai, M. Moscoso and G. Papanicolaou, Imaging strong localized scatterers with sparsity promoting optimization, SIAM J. Imaging Sci. 10 (2014), pp. 1358–1387.
[14] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40 (2011), pp. 120–145.
[15] D. Donoho, Superresolution via sparsity constraints, SIAM Journal on Mathematical Analysis 23 (1992), pp. 1303–1331.
[16] D. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell^1$ minimization, Proceedings of the National Academy of Sciences 100 (2003), pp. 2197–2202.
[17] D. Donoho, M. Elad and V. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Information Theory 52 (2006), pp. 6–18.
[18] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani, Least angle regression, Annals of Statistics 32 (2004), pp. 407–499.
[19] A. Fannjiang and W. Liao, Coherence pattern-guided compressive sensing with unresolved grids, SIAM J. Imaging Sci. 5 (2012), pp. 179–202.
[20] M. A. T. Figueiredo, R. D. Nowak and S. J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing 1 (2007), pp. 586–597.
[21] I. F. Gorodnitsky and B. D. Rao, Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm, IEEE Trans. Signal Process. 45 (1997), pp. 600–616.
[22] R. Gribonval and M. Nielsen, Sparse representations in unions of bases, IEEE Transactions on Information Theory 49 (2003), pp. 3320–3325.
[23] J. N. Laska, M. A. Davenport and R. G. Baraniuk, Exact signal recovery from sparsely corrupted measurements through the pursuit of justice, 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2009, pp. 1556–1560.
[24] M. Moscoso, A. Novikov, G. Papanicolaou and L. Ryzhik, A differential equations approach to $l_1$-minimization with applications to array imaging, Inverse Problems 28 (2012).
[25] A. Nedić and D. P. Bertsekas, Incremental subgradient methods for nondifferentiable optimization, SIAM J. Optim. 12 (2001), pp. 109–138.
[26] Y. Nesterov, A method of solving a convex programming problem with convergence rate $O(1/k^2)$, Soviet Mathematics Doklady 27 (1983), pp. 372–376.
[27] Y. Nesterov, Gradient methods for minimizing composite objective function, Math. Program., Ser. B 140 (2013), pp. 125–161.
[28] M. R. Osborne, B. Presnell and B. A. Turlach, A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis 20 (2000), pp. 389–403.
[29] R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B 58 (1996), pp. 267–288.
[30] J. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Trans. Information Theory 52 (2006), pp. 1030–1051.
[31] J. Tropp, A. Gilbert and M. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit, Signal Processing 86 (2006), pp. 572–588.
[32] J. Tropp, Algorithms for simultaneous sparse approximation. Part II: convex relaxation, Signal Processing 86 (2006), pp. 589–602.
[33] M. H. Wright, The interior-point revolution in optimization: history, recent developments, and lasting consequences, Bull. Amer. Math. Soc. (N.S.) 42 (2005), pp. 39–56.
Appendix A Proof of Proposition 1
We now prove auxiliary lemmas that will be used in the proof of Proposition 1.
Lemma 1.
Lemma 2.
Suppose is defined by (13). Let and be minimizers of , subject to and , respectively. Then, there exists such that ,
(26) 
and
(27) 
Proof.
Let us first show that
(28) 
Assume, for definiteness, that . Then,
Suppose is a minimizer of , subject to . Since , and is a minimizer of , subject to , it follows . By (13) and the triangle inequality