The Noise Collector
The construction of the Noise Collector matrix starts with the following three key properties. First, its columns should be sufficiently orthogonal to the columns of , so that it does not absorb signals with "meaningful" information. Second, the columns of should be uniformly distributed on the unit sphere, so that a typical noise vector can be approximated well. Third, the number of columns in should grow more slowly than exponentially with , otherwise the method is impractical. One way to guarantee all three properties is to impose
(5) 
with , and fill out by drawing columns at random, with rejections, until the rejection rate becomes too high. Then, by construction, the columns of are almost orthogonal to the columns of , and when the rejection rate becomes too high this implies that we cannot pack more N-dimensional unit vectors into and, thus, that a typical noise vector can be approximated well. Finally, the Kabatjanskii-Levenstein inequality (see the discussion in [25]) implies that the number of columns in grows at most polynomially: .
It is, however, more convenient for the proofs to use a probabilistic version of [5]. Suppose that the columns of are drawn at random and independently. Then, the dot product of any two random unit vectors is still typically of order ; see e.g. [30]. If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event on which our Noise Collector does not satisfy the three key properties, and the decoherence constraints in [5] are weakened by a logarithmic factor. The next lemma is proved in Proofs.
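As an illustration of this construction, here is a minimal rejection-sampling sketch in Python. The function and matrix names, the threshold constant `alpha`, and the rejection budget are our own illustrative choices, not the paper's exact constants; the log factor in the threshold reflects the probabilistic version described above.

```python
import numpy as np

def build_noise_collector(A, n_cols, alpha=3.0, max_rejections=1000, seed=0):
    # Draw random unit vectors and accept only those whose inner products
    # with the columns of A (and with previously accepted columns) stay
    # below a threshold of order sqrt(log N / N).  All names and constants
    # here are illustrative choices, not the paper's.
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    thresh = alpha * np.sqrt(np.log(N) / N)
    accepted, rejections = [], 0
    while len(accepted) < n_cols and rejections < max_rejections:
        c = rng.standard_normal(N)
        c /= np.linalg.norm(c)  # uniform direction on the unit sphere
        others = A if not accepted else np.hstack([A, np.column_stack(accepted)])
        if np.max(np.abs(others.T @ c)) < thresh:
            accepted.append(c)
        else:
            rejections += 1
    return np.column_stack(accepted) if accepted else np.zeros((N, 0))
```

By construction, the returned columns are unit vectors that are nearly orthogonal both to the columns of `A` and to each other.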
Lemma 1: Suppose , , vectors are drawn at random and independently. Then, (i) for any there are constants and , such that
(6) 
and (ii) for any there exists at least one , so that
(7) 
with probability .
The estimate in [6] implies that any solution satisfies, for any ,
(8) 
with probability . This estimate measures how expensive it is to approximate the columns of with the Noise Collector. In turn, the weight should be chosen so that it is expensive to approximate noise using the columns of . It cannot be taken too large, though, because we may lose the signal. In fact, one can prove that if , then for any and any level of noise. Intuitively, the weight characterizes the rate at which the signal is lost as the noise increases.
To explain the theoretical lower bound
we turn to the geometric interpretation of duality in linear programming. Suppose
and there is no signal, . Then, the solution of [4] satisfies , and there is a dual certificate of optimality of for that satisfies . Define the nonlinear map
(9) 
where is the noise vector in [4], and is the dual certificate of optimality of for . For example, if
is the identity matrix, then
; see Figure 1, left. If remains a dual certificate of optimality of for , then it implies that support = support for such . Thus, Theorem 1 follows once we check that
(10)
holds with large probability. Thus, we need to understand the statistics of , given that is uniformly distributed on . The columns of the Noise Collector are also uniformly distributed on ; thus, the vector has to be uniformly distributed on as well. The probability that (10) does not hold can be estimated by the area of the intersection of the unit sphere and the ball of radius (see Figure 1, right), which can be shown to be small by standard estimates from high-dimensional probability.
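The concentration behavior used above, that the dot product of two independent random unit vectors in dimension N is typically of size O(1/sqrt(N)), is easy to check empirically. This short experiment is ours, not part of the paper; for standard Gaussian directions the mean of |dot| times sqrt(N) should hover near sqrt(2/pi).

```python
import numpy as np

# Empirical check: the inner product of two independent random unit
# vectors in R^N is typically of order 1/sqrt(N).
rng = np.random.default_rng(1)
for N in (64, 256, 1024):
    x = rng.standard_normal((N, 2000))
    x /= np.linalg.norm(x, axis=0)
    y = rng.standard_normal((N, 2000))
    y /= np.linalg.norm(y, axis=0)
    dots = np.abs(np.sum(x * y, axis=0))
    # the rescaled mean should be roughly constant in N
    print(N, dots.mean() * np.sqrt(N))
```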
By construction, the columns of the combined matrix are incoherent. This is the key observation that allows us to prove Theorems 2 and 3 using standard techniques; see e.g. [20]. In particular, we automatically have exact recovery, by the standard arguments of [11] applied to , if the data are noiseless.
Lemma 2 (Exact Recovery): Suppose is an sparse solution of , and there is no noise, . In addition, assume that the columns of are incoherent: . Then, the solution to [4] satisfies for all
(11) 
with probability .
Fast Noise Collector Algorithm
To find the minimizer of [4], we consider a variational approach. We define the function
(12)  
for a weight , and determine the solution as
(13) 
The key observation is that this variational principle finds the minimum in [4] exactly for all values of the regularization parameter . Hence, the proposed method is fully automated, meaning that it has no tuning parameters. To determine the exact extremum in [13], we use the iterative soft-thresholding algorithm GeLMA [21], which works as follows.
For we use in our numerical experiments. For optimal results, one can calibrate to be the smallest constant such that Theorem 1 holds, that is, such that we see no phantom signals when the algorithm is fed with pure noise.
Pick a value for the regularization parameter , e.g. . Choose step sizes and (choosing two step sizes, instead of just the smaller one, improves the convergence speed). Set , , , and iterate for :
(14) 
where .
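The exact iteration (14) is the GeLMA scheme of [21]. As a simplified stand-in, the same kind of functional can be minimized with plain one-step-size iterative soft thresholding on the combined matrix; the function names, the single step size, and the objective scaling below are our own simplifications, not the paper's algorithm.

```python
import numpy as np

def soft(u, t):
    # Soft-thresholding operator S_t(u) = sign(u) * max(|u| - t, 0).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def ista_noise_collector(A, C, b, tau=2.0, lam=0.1, n_iter=500):
    # Minimize lam*(tau*||rho||_1 + ||eta||_1) + 0.5*||A rho + C eta - b||^2
    # by plain ISTA on the combined matrix M = [A C].  A one-step-size
    # stand-in for the two-step-size GeLMA iteration of [21].
    M = np.hstack([A, C])
    w = np.concatenate([tau * np.ones(A.shape[1]), np.ones(C.shape[1])])
    dt = 0.9 / np.linalg.norm(M, 2) ** 2  # step size below 1/||M||_2^2
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        x = soft(x - dt * (M.T @ (M @ x - b)), dt * lam * w)
    return x[: A.shape[1]], x[A.shape[1]:]
```

The weight `tau` penalizes the signal block more than the noise block, mirroring the role of the weight in (12).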
The Noise Collector matrix is computed by drawing normally distributed -dimensional vectors, normalized to unit length. These are the generating vectors of the Noise Collector. From each of them a circulant matrix , , is constructed. The Noise Collector matrix is obtained by concatenation, so . Exploiting the circulant structure of the matrices , we perform the matrix-vector multiplications and in (14) using the FFT [15]. This makes the complexity associated with the Noise Collector . Note that only the generating vectors are stored, and not the entire Noise Collector matrix. In practice, we use , which makes the cost of using the Noise Collector negligible, as typically .
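The FFT-based matvec described above can be sketched as follows; function names are ours. A circulant matvec is a circular convolution with the generating (first-column) vector, so only the generators need to be stored.

```python
import numpy as np

def circulant_matvec(c, x):
    # Multiply the circulant matrix with first column c by x via the FFT:
    # a circulant matvec is a circular convolution, O(N log N) not O(N^2).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def noise_collector_matvec(gens, eta):
    # Apply C = [C_1 ... C_m] (each C_i circulant, generated by gens[i])
    # to eta, stored as m blocks of length N.  Only the m generating
    # vectors are kept in memory, never the full N x (mN) matrix.
    N = gens[0].size
    blocks = eta.reshape(len(gens), N)
    return sum(circulant_matvec(c, v) for c, v in zip(gens, blocks))
```

The transposed matvec needed in the iteration can be applied the same way, using the complex conjugate of the generator's spectrum.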
Application to imaging
We consider passive array imaging of point sources. The problem consists in determining the positions and the complex amplitudes , , of a few point sources from measurements of polychromatic signals on an array of receivers; see Figure 2. (We chose to work with real numbers in the previous sections for ease of presentation, but the results also hold with complex numbers.) The imaging system is characterized by the array aperture , the distance to the sources, the bandwidth and the central wavelength .
The sources are located inside an image window IW, which is discretized with a uniform grid of points , . The unknown is the source vector , whose components correspond to the complex amplitudes of the sources at the grid points , , with . For the true source vector we have if for some , while otherwise.
Denoting by the Green's function for the propagation of a signal of angular frequency from point to point , we define the single-frequency Green's function vector that connects a point in the IW with all points , , on the array as
In a homogeneous medium in three dimensions, .
The data for the imaging problem are the signals
(15) 
recorded at receiver locations , , at frequencies , . These data are stacked in a column vector
(16) 
with . Then, , with the measurement matrix whose columns are the multiple-frequency Green's function vectors
(17) 
normalized to have length one. The system relates the unknown vector to the data vector .
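The assembly of the measurement matrix from normalized multi-frequency Green's function vectors can be sketched as follows. The function names are ours, and we assume the standard homogeneous-medium Green's function exp(i*omega*|x-y|/c0)/(4*pi*|x-y|) in three dimensions.

```python
import numpy as np

def greens_vector(y, receivers, omega, c0=299792458.0):
    # Homogeneous 3D Green's function exp(i*omega*|x_r - y|/c0)/(4*pi*|x_r - y|),
    # evaluated at all receiver locations x_r (rows of `receivers`).
    d = np.linalg.norm(receivers - y, axis=1)
    return np.exp(1j * omega * d / c0) / (4 * np.pi * d)

def measurement_matrix(grid, receivers, omegas, c0=299792458.0):
    # Stack single-frequency Green's function vectors over all frequencies
    # and normalize each column to unit length, as described in the text.
    cols = []
    for y in grid:
        g = np.concatenate([greens_vector(y, receivers, w, c0) for w in omegas])
        cols.append(g / np.linalg.norm(g))
    return np.column_stack(cols)
```

Each column corresponds to one grid point of the image window; the number of rows is the number of receivers times the number of frequencies.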
Next, we illustrate the performance of the Noise Collector in this imaging setup. The most important features are that (i) no calibration is necessary with respect to the level of noise, (ii) exact support recovery for relatively large levels of noise (i.e., ), and (iii) zero false discovery rate for all levels of noise, with high probability.
We consider a high frequency microwave imaging regime with central frequency GHz, corresponding to mm. We make measurements for equally spaced frequencies spanning a bandwidth GHz. The array has receivers and an aperture cm. The distance from the array to the center of the imaging window is cm. Then, the resolution is mm in cross-range (the direction parallel to the array) and mm in range (the direction of propagation). These parameters are typical in microwave scanning technology [18].
We seek to image a source vector with sparsity ; see the left plot in Fig. 3. The size of the imaging window is 20 cm × 60 cm and the pixel spacing is 5 mm × 15 mm. The number of unknowns is, therefore, , and the number of data is . The size of the Noise Collector is taken to be , so . When the data is noiseless, we obtain exact recovery, as expected; see the right plot in Fig. 3.
In Fig. 4, we display the imaging results, with and without a Noise Collector, when the data is corrupted by additive noise. The SNR , so the norms of the signal and the noise are equal. In the left plot, we show the image recovered using norm minimization without a Noise Collector. There is a lot of grass in this image, with many nonzero values outside the true support. When a Noise Collector is used, the level of the grass is reduced and the image improves; see the second plot from the left. Still, there are several false discoveries, because we use in [14].
In the third column from the left of Fig. 4 we show the image obtained with a weight in [14]. With this weight, there are no false discoveries and the recovered support is exact. This simplifies the imaging problem dramatically, as we can now restrict the inverse problem to the true support just obtained, and then solve an overdetermined linear system using a classical approach. The results are shown in the right column of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise.
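The second, least-squares step described above is straightforward: once the support is known, the inverse problem restricts to an overdetermined linear system on those columns. A short sketch (the function name is ours):

```python
import numpy as np

def solve_on_support(A, b, support):
    # Restrict the (complex) linear system to the recovered support and
    # solve the overdetermined system in the least-squares sense, as in
    # the classical second step described in the text.
    amp, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
    rho = np.zeros(A.shape[1], dtype=complex)
    rho[support] = amp
    return rho
```

Because the restricted system is overdetermined and well conditioned for well-separated sources, this step recovers the amplitudes that were attenuated in the first step.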
In Figure 5 we illustrate the performance of the Noise Collector for different sparsity levels and SNR values. Success in recovering the true support of the unknown corresponds to the value (yellow), and failure to (blue). The small phase-transition zone (green) contains intermediate values. These results are obtained by averaging over 5 realizations of the noise.
(Fig. 4 panel labels, left to right: no NC; with NC and ; with NC and ; on the support.)
Remark 1: We considered passive array imaging for ease of presentation. The same results hold for active array imaging, with or without multiple scattering; see [7] for the detailed analytical setup.
Remark 2: We have considered a microwave imaging regime. Similar results can be obtained in other regimes.
Proofs
Proof of Lemma 1: Denote the event
By independence, for any and . Thus, . Choosing for sufficiently large , we get
where and . Hence, [6] holds with large probability .
Next, we estimate the probability that [7] does not hold. Suppose there is a direction such that
(18) 
holds for all . Let be the -dimensional volume of the parallelepiped spanned by , , . Note that is equal to times its height. Then, if (18) holds,
(19) 
for any choice of columns from the Noise Collector . If we fix the indices , , then, due to rotational invariance, the probability of the event (19) equals the probability of the event .
and that we can find sets of distinct indices , , , we conclude that
Choosing sufficiently small, i.e. , and sufficiently large, we obtain the result.
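The base-times-height volume identity invoked in the proof above can be checked numerically via Gram determinants. This illustration is ours, not part of the paper's proof.

```python
import numpy as np

def parallelepiped_volume(V):
    # k-dimensional volume of the parallelepiped spanned by the columns
    # of V, via the Gram determinant: vol = sqrt(det(V^T V)).
    return np.sqrt(np.linalg.det(V.T @ V))
```

Appending one more vector multiplies the volume by that vector's distance to the span of the others (the "height"), which is the identity used in the proof.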
Proof of Theorem 1: In order to check (10), we assume that both and are in , because it is more geometrically intuitive to work with the convex hull
(20) 
This implies that we may also assume that in (4) has nonnegative coefficients, and . Thus, is a norm of with respect to , and we can set . This norm is called atomic in [8]. Suppose is the support of . Its typical size is .
has the unique normal vector , which is collinear to our dual certificate , because
(21) 
The estimate (7) implies that the convex hull contains an ball of radius . Therefore, with large probability.
By construction, the distribution of is rotationally invariant with respect to the probability measure induced by all and . Thus is also uniformly distributed on , and
(22) 
for all , see e.g. [30]. Therefore, we can bound the probability that (10) does not hold:
for large and appropriately chosen . Hence, (10) holds with large probability .
Proof of Theorem 2: If the columns of were orthogonal, our previous arguments could be modified to verify Theorem 2. Indeed, suppose is the span of the column vectors , with in the support of . Say, is spanned by , , . Let be the orthogonal complement of . Then, the orthogonal projection of the signal . By concentration of measure (see e.g. [30]), the projection of the noise is uniformly distributed on the unit sphere with large probability. Applying the previous arguments to , the projection of on , we conclude that the projection . Therefore, with large probability.
For general consider the orthogonal decomposition for all . As before, we can choose so that with large probability. It remains to demonstrate that . Fix any . Suppose , and . Thus,
Then, , so . Hence,
Proof of Theorem 3: It suffices to prove the result for 1-sparse , say, . We will demonstrate that the solution to the minimization problem
(23)  
with , satisfies if , . This implies .
Suppose , and are the optimal solutions of
(24) 
with right-hand sides , , and , respectively. Since , we have
and, therefore,
(25) 
respectively. Suppose , . Then, for any , and for large enough,
Using (25) with , we conclude that
for all . It implies (23).
Acknowledgements The work of M. Moscoso was partially supported by Spanish grant MICINN FIS2016-77892-R. The work of A. Novikov was partially supported by NSF grants DMS-1515187 and DMS-1813943. The work of G. Papanicolaou was partially supported by AFOSR FA9550-18-1-0519. The work of C. Tsogka was partially supported by AFOSR FA9550-17-1-0238 and FA9550-18-1-0519. We thank Marguerite Novikov for drawing Figure 1.
References
 [1] M. AlQuraishi and H. H. McAdams, Direct inference of protein-DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci. U.S.A. 108, 14819–14824 (2011).
 [2] R. Baraniuk and Philippe Steeghs, Compressive Radar Imaging, in 2007 IEEE Radar Conference, Apr. 2007, 128–133.
 [3] L. Borcea and I. Kocyigit, Resolution analysis of imaging with optimization, SIAM J. Imaging Sci. 8, 3015–3050 (2015).
 [4] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
 [5] E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67, 906–956 (2014).

 [6] A. Chai, M. Moscoso and G. Papanicolaou, Robust imaging of localized scatterers using the singular value decomposition and optimization, Inverse Problems 29, 025016 (2013).
 [7] A. Chai, M. Moscoso and G. Papanicolaou, Imaging Strong Localized Scatterers with Sparsity Promoting Optimization, SIAM J. Imaging Sci. 7, 1358–1387 (2014).
 [8] V. Chandrasekaran, B.Recht, P. A. Parrilo, A. S. Willsky, The convex geometry of linear inverse problems, Found. Comput. Math. 12, 805–849 (2012).
 [9] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43, 129–159 (2001).
 [10] D. L. Donoho, Superresolution via sparsity constraint, SIAM J Math Anal 23, 1303–1331 (1992).
 [11] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via minimization, Proc. Natl. Acad. Sci. U.S.A 100, 2197–2202 (2003).
 [12] A. C. Fannjiang, T. Strohmer, and P. Yan, Compressed remote sensing of sparse objects, SIAM J. Imag. Sci. 3, 595–618 (2010).
 [13] A. C. Fannjiang and W. Liao, Coherence pattern-guided compressive sensing with unresolved grids, SIAM J. Imag. Sci. 5, 179–202 (2012).
 [14] J. J. Fuchs, Recovery of exact sparse representations in the presence of bounded noise, IEEE Trans. Inf. Theory 51, 3601–3608 (2005).
 [15] R. M. Gray, Toeplitz and Circulant Matrices: A Review, Foundations and Trends in Communications and Information Theory 2, 155–239 (2006).
 [16] M. A. Herman and T. Strohmer, High-Resolution Radar via Compressed Sensing, IEEE Trans. Signal Process. 57, 2275–2284 (2009).
 [17] J. N. Laska, M. A. Davenport and R. G. Baraniuk, Exact signal recovery from sparsely corrupted measurements through the Pursuit of Justice, 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2009, 1556–1560.
 [18] J. Laviada, A. Arboleya-Arboleya, Y. Alvarez-Lopez, C. Garcia-Gonzalez and F. Las-Heras, Phaseless synthetic aperture radar with efficient sampling for broadband near-field imaging: Theory and validation, IEEE Trans. Antennas Propag. 63, 573–584 (2015).
 [19] D. Malioutov, M. Cetin and A. S. Willsky, A sparse signal reconstruction perspective for source localization with sensor arrays, IEEE Trans. Signal Process. 53, 3010–3022 (2005).
 [20] M. Moscoso, A. Novikov, G. Papanicolaou and C. Tsogka, Imaging with highly incomplete and corrupted data, submitted.
 [21] M. Moscoso, A. Novikov, G. Papanicolaou and L. Ryzhik, A differential equations approach to l1-minimization with applications to array imaging, Inverse Problems 28 (2012).
 [22] J. Romberg, Imaging via Compressive Sampling, IEEE Signal Process. Mag. 25, 14–20 (2008).
 [23] J. N. Sampson, N. Chatterjee, R. J. Carroll, and S. Müller, Controlling the local false discovery rate in the adaptive Lasso, Biostatistics 14, 653–666 (2013).
 [24] H. L. Taylor, S. C. Banks, and J. F. McCoy, Deconvolution with the l1 norm, Geophysics 44, 39–52 (1979).
 [25] T. Tao, A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors, https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of-the-kabatjanskii-levenstein-bound-for-almost-orthogonal-vectors/
 [26] R. Tibshirani, Regression Shrinkage and Selection via the lasso, Journal of the Royal Statistical Society. Series B (methodological) 58, 267–288 (1996).
 [27] J. A. Tropp, Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise, IEEE Trans. Inf. Theory 52, 1030–1051 (2006).
 [28] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals, IEEE Trans. Inf. Theory 56, 520–544 (2010).
 [29] J. Trzasko and A. Manduca, Highly undersampled magnetic resonance image reconstruction via homotopic minimization, IEEE Trans. Med. Imag. 28, 106–121 (2009).

 [30] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Cambridge University Press, 2018.
 [31] M. J. Wainwright, Sharp Thresholds for HighDimensional and Noisy Sparsity Recovery Using Constrained Quadratic Programming (Lasso), IEEE Trans. Inf. Theory 55, 2183–2202 (2009).
 [32] H. Zou, The Adaptive Lasso and Its Oracle Properties, J. Amer. Statist. Assoc. 101, 1418–1429 (2006).