The Noise Collector for sparse recovery in high dimensions

08/05/2019
by Miguel Moscoso, et al.

The ability to detect sparse signals from noisy, high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system $A\rho = b_0$ can be found efficiently with an $\ell_1$-norm minimization approach if the data is noiseless. Detection of the signal's support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix $C$ and solve an augmented system $A\rho + C\eta = b_0 + e$, where $e$ is the noise. We show that the $\ell_1$-norm minimal solution of the augmented system has zero false discovery rate for any level of noise, with probability that tends to one as the dimension of $b_0$ increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm, which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging.



The Noise Collector

The construction of the Noise Collector matrix $C$ starts with the following three key properties. First, its columns should be sufficiently orthogonal to the columns of $A$, so that $C$ does not absorb signals carrying "meaningful" information. Second, the columns of $C$ should be uniformly distributed on the unit sphere, so that a typical noise vector can be approximated well. Third, the number of columns of $C$ should grow more slowly than exponentially with $N$; otherwise the method is impractical. One way to guarantee all three properties is to impose

$$|\langle a_i, c_j\rangle| \le \frac{\alpha}{\sqrt{N}} \ \text{ for all } i, j, \qquad |\langle c_i, c_j\rangle| \le \frac{\alpha}{\sqrt{N}} \ \text{ for all } i \ne j, \qquad (5)$$

with a moderate constant $\alpha$, and fill out $C$ by drawing columns at random, with rejections, until the rejection rate becomes too high. Then, by construction, the columns of $C$ are almost orthogonal to the columns of $A$, and when the rejection rate becomes too high this implies that we cannot pack more $N$-dimensional unit vectors into $C$ and, thus, we can approximate well a typical noise vector. Finally, the Kabatjanskii-Levenstein inequality (see the discussion in [25]) implies that the number of columns of $C$ grows at most polynomially with $N$.
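To make this construction concrete, the following Python/NumPy sketch implements the rejection procedure just described. It is an illustration only: the coherence threshold mu, the matrix sizes, and the function name build_noise_collector are our choices, not values from the paper.

```python
import numpy as np

def build_noise_collector(A, n_cols, mu, max_rejects=10_000, rng=None):
    """Draw unit-norm columns at random, rejecting any candidate that is
    too coherent with the columns of A or with previously accepted columns."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    cols, rejects = [], 0
    while len(cols) < n_cols and rejects < max_rejects:
        c = rng.standard_normal(N)
        c /= np.linalg.norm(c)
        # Reject if c violates the decoherence constraints (cf. (5)).
        if np.max(np.abs(A.T @ c)) > mu or (
            cols and np.max(np.abs(np.stack(cols) @ c)) > mu
        ):
            rejects += 1
            continue
        cols.append(c)
    return np.column_stack(cols)

# Illustrative use with a random A and a threshold a few times 1/sqrt(N).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
A /= np.linalg.norm(A, axis=0)
C = build_noise_collector(A, n_cols=200, mu=4 / np.sqrt(100), rng=1)
print(C.shape)  # (100, 200)
```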

It is, however, more convenient for the proofs to use a probabilistic version of [5]. Suppose that the columns of $C$ are drawn at random independently. Then, the dot product of any two random unit vectors is still typically of order $1/\sqrt{N}$; see, e.g., [30]. If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event where our Noise Collector does not satisfy the three key properties, and the decoherence constraints in [5] are weakened by a logarithmic factor. The next lemma is proved in the section Proofs.

Lemma 1: Suppose the columns $c_i$, $i = 1, \dots, \Sigma$, of $C$ are drawn at random and independently from the unit sphere, with $\Sigma$ growing polynomially in $N$. Then, (i) there are constants $\alpha$ and $c_1$ such that

$$\max_{i \ne j} |\langle c_i, c_j\rangle| \le \alpha\sqrt{\frac{\ln N}{N}}, \qquad (6)$$

and (ii) for any unit vector $x$ there exists at least one $c_i$, so that

$$\langle x, c_i\rangle \ge c_1\sqrt{\frac{\ln N}{N}}, \qquad (7)$$

with probability close to one.
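A quick numerical check of the coherence bound (6), under the assumption of independent, uniformly distributed unit columns; the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (100, 400, 1600):
    Sigma = N  # polynomially many columns; Sigma = N is an illustrative choice
    C = rng.standard_normal((N, Sigma))
    C /= np.linalg.norm(C, axis=0)         # columns uniform on the sphere
    G = np.abs(C.T @ C)                    # pairwise inner products
    np.fill_diagonal(G, 0.0)
    # The maximum coherence should be of order sqrt(ln N / N).
    print(N, round(G.max(), 3), round(np.sqrt(np.log(N) / N), 3))
```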

The estimate in [6] implies that any solution of $C\eta = a_i$ satisfies, for any $i$,

$$\|\eta\|_{\ell_1} \ge \frac{1}{\alpha}\sqrt{\frac{N}{\ln N}}, \qquad (8)$$

with probability close to one. This estimate measures how expensive it is to approximate columns of $A$ with the Noise Collector. In turn, the weight $\tau$ should be chosen so that it is expensive to approximate noise using columns of $A$. It cannot be taken too large, though, because we may lose the signal. In fact, one can prove that if $\tau$ is of order $\sqrt{N/\ln N}$ or larger, then $\rho_\tau = 0$ for any data and any level of noise. Intuitively, the weight $\tau$ characterizes the rate at which the signal is lost as the noise increases.
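This cost is easy to probe numerically. The sketch below computes the minimal $\ell_1$-norm solution of $C\eta = a$ for a random unit vector $a$ standing in for a column of $A$, via the standard linear-programming splitting $\eta = u - v$ with $u, v \ge 0$; the sizes and the use of SciPy's linprog are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, Sigma = 50, 500                       # small illustrative sizes
C = rng.standard_normal((N, Sigma))
C /= np.linalg.norm(C, axis=0)
a = rng.standard_normal(N)
a /= np.linalg.norm(a)                   # plays the role of a column of A

# min ||eta||_1  subject to  C eta = a,  with  eta = u - v,  u, v >= 0
cost = np.ones(2 * Sigma)
res = linprog(cost, A_eq=np.hstack([C, -C]), b_eq=a,
              bounds=(0, None), method="highs")
eta = res.x[:Sigma] - res.x[Sigma:]
print("l1 cost:", np.abs(eta).sum(), "  sqrt(N/ln N):", np.sqrt(N / np.log(N)))
```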

Figure 1: Left: Illustration of the unit cube and the map $\mathcal{M}$ defined in (9). Right: V. Milman's hyperbolic view of a high-dimensional ball.

To explain the theoretical lower bound on the weight $\tau$, we turn to the geometric interpretation of duality in linear programming. Suppose the data contain no signal, so that $b = e$. Then, the solution of [4] satisfies $\rho_\tau = 0$, and there is a dual certificate of optimality $h$ of $\eta_\tau$ for $C\eta = e$ that satisfies $|\langle h, c_i\rangle| \le 1$ for all $i$, with equality on the support of $\eta_\tau$.

Define a nonlinear map

$$\mathcal{M}: e \mapsto h, \qquad (9)$$

where $e$ is the noise vector in [4], and $h$ is the dual certificate of optimality of $\eta_\tau$ for $C\eta = e$. For example, if $C$ is the identity matrix, then $\mathcal{M}(e) = \mathrm{sign}(e)$; see Figure 1-left. If $h$ remains a dual certificate of optimality for the augmented problem [4], then support$(\rho_\tau) \subseteq$ support$(\rho^*)$ for such $\tau$. Thus, Theorem 1 follows once we check that

$$\max_i |\langle \mathcal{M}(e), a_i\rangle| < \tau \qquad (10)$$

holds with large probability. Thus, we need to understand the statistics of $\mathcal{M}(e)$, given that $e/\|e\|_{\ell_2}$ is uniformly distributed on the unit sphere. The columns of the Noise Collector are also uniformly distributed on the unit sphere; thus the direction of $\mathcal{M}(e)$ has to be uniformly distributed on the unit sphere as well. The chance that (10) does not hold can then be estimated by the area of the intersection of the unit sphere with a ball of small radius (see Figure 1-right), which can be shown to be small by standard estimates from high-dimensional probability.

By construction, the columns of the combined matrix $[A\,|\,C]$ are incoherent. This is the key observation that allows us to prove Theorems 2 and 3 using standard techniques; see, e.g., [20]. In particular, we automatically have exact recovery by the standard arguments of [11] applied to $[A\,|\,C]$ if the data is noiseless.

Lemma 2 (Exact Recovery): Suppose $\rho^*$ is an $M$-sparse solution of $A\rho = b_0$, and there is no noise, $e = 0$. In addition, assume that the columns of $A$ are incoherent. Then, the solution $(\rho_\tau, \eta_\tau)$ to [4] satisfies $\rho_\tau = \rho^*$ and $\eta_\tau = 0$ for all $\tau$ in the range

(11)

with probability close to one.

Fast Noise Collector Algorithm

To find the minimizer of [4], we consider a variational approach. We define the function

$$F(\rho, \eta, z) = \lambda\big(\tau\|\rho\|_{\ell_1} + \|\eta\|_{\ell_1}\big) + \langle z,\, b - A\rho - C\eta\rangle + \tfrac{1}{2}\|b - A\rho - C\eta\|_{\ell_2}^2, \qquad (12)$$

for a weight $\lambda > 0$, and determine the solution as

$$(\rho_\tau, \eta_\tau) = \arg\min_{\rho,\, \eta}\, \max_{z}\, F(\rho, \eta, z). \qquad (13)$$

The key observation is that this variational principle finds the minimum in [4] exactly for all values of the regularization parameter $\lambda$. Hence, the proposed method is fully automated, meaning that it has no tuning parameters. To determine the exact extremum in [13], we use the iterative soft-thresholding algorithm GeLMA [21], which works as follows.

For the weight $\tau$ we use a fixed moderate value in our numerical experiments. For optimal results, one can calibrate $\tau$ to be the smallest constant such that Theorem 1 holds, that is, such that we see no phantom signals when the algorithm is fed with pure noise.
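In code, this calibration could look like the sketch below, where solve_l1 stands for an $\ell_1$ solver such as the one sketched after the iteration (14) below; the grid of weights and the detection threshold are illustrative assumptions.

```python
import numpy as np

def calibrate_tau(A, C, solve_l1, taus, n_trials=10, thresh=1e-6, rng=None):
    """Return the smallest weight tau for which feeding the algorithm pure
    noise produces no phantom signals (all entries of rho below thresh)."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    for tau in sorted(taus):
        clean = True
        for _ in range(n_trials):
            e = rng.standard_normal(N)        # pure-noise data, no signal
            rho, _ = solve_l1(A, C, e, tau)
            if np.abs(rho).max() > thresh:    # a phantom signal appeared
                clean = False
                break
        if clean:
            return tau
    raise RuntimeError("no weight in the grid passed the pure-noise test")
```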

Pick a value for the regularization parameter $\lambda$. Choose step sizes $\Delta t_1$ and $\Delta t_2$ (choosing two step sizes, instead of the smaller of the two for both updates, improves the convergence speed). Set $\rho_0 = 0$, $\eta_0 = 0$, $z_0 = 0$, and iterate for $k \ge 0$:

(14)

where $\mathcal{S}_r$ denotes the componentwise soft-thresholding operator, $\mathcal{S}_r(x)_i = \mathrm{sign}(x_i)\max(|x_i| - r,\, 0)$.
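As an illustration, here is a Python sketch of a primal-dual iteration of this type, patterned on GeLMA [21]; the specific thresholds, step sizes, and dual update written here are our assumptions for the sketch, not necessarily the exact form of (14).

```python
import numpy as np

def soft(x, r):
    """Componentwise soft thresholding S_r(x)_i = sign(x_i) max(|x_i| - r, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def solve_l1(A, C, b, tau, lam=1.0, n_iter=5000):
    """Sketch of an iterative soft-thresholding solver for
    min tau*||rho||_1 + ||eta||_1  subject to  A rho + C eta = b."""
    dt1 = 1.0 / np.linalg.norm(A, 2) ** 2    # step sizes chosen for stability
    dt2 = 1.0 / np.linalg.norm(C, 2) ** 2
    rho = np.zeros(A.shape[1])
    eta = np.zeros(C.shape[1])
    z = np.zeros_like(b)                     # multiplier enforcing the constraint
    for _ in range(n_iter):
        r = b - A @ rho - C @ eta            # residual
        rho = soft(rho + dt1 * (A.T @ (z + r)), tau * lam * dt1)
        eta = soft(eta + dt2 * (C.T @ (z + r)), lam * dt2)
        z = z + dt2 * r                      # dual ascent step
    return rho, eta
```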

The Noise Collector matrix is computed by drawing normally distributed $N$-dimensional vectors, normalized to unit length. These are the generating vectors of the Noise Collector. From each of them a circulant $N \times N$ matrix $C_i$ is constructed. The Noise Collector matrix is obtained by concatenation, so $C = [C_1 | C_2 | \cdots]$. Exploiting the circulant structure of the matrices $C_i$, we perform the matrix-vector multiplications $C\eta$ and $C^* z$ in (14) using the FFT [15]. This makes the complexity associated with a Noise Collector of $\Sigma$ columns $O(\Sigma \ln N)$. Note that only the generating vectors are stored, and not the entire Noise Collector matrix. In practice, a few generating vectors suffice, which makes the cost of using the Noise Collector negligible compared to that of the original system.
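The FFT-based products can be carried out from the generating vectors alone. A minimal NumPy sketch for one real circulant block, checked against an explicitly assembled matrix:

```python
import numpy as np

def circulant_matvec(gen, x):
    """C_i @ x, where C_i is the circulant matrix whose k-th column is the
    generating vector gen cyclically shifted by k (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(gen) * np.fft.fft(x)))

def circulant_rmatvec(gen, y):
    """C_i^T @ y, the adjoint product (circular cross-correlation)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(gen)) * np.fft.fft(y)))

rng = np.random.default_rng(0)
N = 8
gen = rng.standard_normal(N)
gen /= np.linalg.norm(gen)                     # a generating vector
Ci = np.column_stack([np.roll(gen, k) for k in range(N)])
x = rng.standard_normal(N)
print(np.allclose(Ci @ x, circulant_matvec(gen, x)))     # True
print(np.allclose(Ci.T @ x, circulant_rmatvec(gen, x)))  # True
```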

Application to imaging

We consider passive array imaging of point sources. The problem consists in determining the positions $z_j$ and the complex amplitudes $\alpha_j$, $j = 1, \dots, M$, of a few point sources from measurements of polychromatic signals on an array of receivers; see Figure 2. (We chose to work with real numbers in the previous sections for ease of presentation, but the results also hold with complex numbers.) The imaging system is characterized by the array aperture $a$, the distance $L$ to the sources, the bandwidth $B$, and the central wavelength $\lambda_0$.

Figure 2: General setup for passive array imaging. A source inside the image window emits a signal that is recorded at all the elements of the array.

The sources are located inside an image window IW, which is discretized with a uniform grid of points $y_k$, $k = 1, \dots, K$. The unknown is the source vector $\rho = (\rho_1, \dots, \rho_K)$, whose components correspond to the complex amplitudes of the sources at the grid points $y_k$, $k = 1, \dots, K$, with $K \gg M$. For the true source vector $\rho^*$ we have $\rho^*_k = \alpha_j$ if $y_k = z_j$ for some $j$, while $\rho^*_k = 0$ otherwise.

Denoting by $\hat G(\omega; x, y)$ the Green's function for the propagation of a signal of angular frequency $\omega$ from point $y$ to point $x$, we define the single-frequency Green's function vector that connects a point $y$ in the IW with all points $x_r$, $r = 1, \dots, N_r$, on the array as

$$g(\omega; y) = \big(\hat G(\omega; x_1, y), \dots, \hat G(\omega; x_{N_r}, y)\big)^{T}.$$

In a homogeneous medium in three dimensions, $\hat G(\omega; x, y) = \dfrac{\exp(i\omega|x - y|/c_0)}{4\pi|x - y|}$, where $c_0$ is the wave speed.

The data for the imaging problem are the signals

$$b(\omega_l, x_r) = \sum_{j=1}^{M} \alpha_j\, \hat G(\omega_l; x_r, z_j), \qquad (15)$$

recorded at receiver locations $x_r$, $r = 1, \dots, N_r$, at frequencies $\omega_l$, $l = 1, \dots, S$. These data are stacked in a column vector

$$b = \big(b(\omega_l, x_r)\big)_{l = 1, \dots, S;\; r = 1, \dots, N_r}, \qquad (16)$$

with $N = S \cdot N_r$. Then, $b = A\rho^*$, with the $N \times K$ measurement matrix $A$ whose columns are the multiple-frequency Green's function vectors

$$a_k = \big(g(\omega_1; y_k); \dots; g(\omega_S; y_k)\big), \quad k = 1, \dots, K, \qquad (17)$$

normalized to have length one. The system $A\rho = b$ relates the unknown vector $\rho$ to the data vector $b$.
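As an illustration of how such a measurement matrix can be assembled, the sketch below uses the homogeneous three-dimensional Green's function quoted above; the array geometry, frequency band, and grid are hypothetical placeholders, not the configuration used in the experiments.

```python
import numpy as np

def greens(omega, x, y, c0=3e8):
    """Homogeneous 3D Green's function exp(i omega |x-y|/c0) / (4 pi |x-y|)."""
    d = np.linalg.norm(x - y, axis=-1)
    return np.exp(1j * omega * d / c0) / (4 * np.pi * d)

def measurement_matrix(receivers, grid, omegas):
    """Stack the single-frequency Green's function vectors for every grid
    point and normalize each column to unit length, as in (17)."""
    A = np.vstack([greens(w, receivers[:, None, :], grid[None, :, :])
                   for w in omegas])          # one row block per frequency
    return A / np.linalg.norm(A, axis=0)

# Hypothetical setup: a small linear array and a coarse image grid (meters).
receivers = np.stack([np.linspace(-0.25, 0.25, 25),
                      np.zeros(25), np.zeros(25)], axis=1)
xs, zs = np.meshgrid(np.linspace(-0.1, 0.1, 21), np.linspace(0.4, 0.6, 21))
grid = np.stack([xs.ravel(), np.zeros(xs.size), zs.ravel()], axis=1)
omegas = 2 * np.pi * np.linspace(55e9, 65e9, 10)   # made-up frequency band
A = measurement_matrix(receivers, grid, omegas)
print(A.shape)   # (N_r * S, K) = (250, 441)
```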

Next, we illustrate the performance of the Noise Collector in this imaging setup. Its most important features are that (i) no calibration is necessary with respect to the level of noise, (ii) we obtain exact support recovery for relatively large levels of noise (e.g., an SNR of order one, as in Fig. 4), and (iii) we obtain a zero false discovery rate for all levels of noise, with high probability.

We consider a high-frequency microwave imaging regime with central wavelength $\lambda_0$ in the millimeter range. We make measurements for $S$ equally spaced frequencies spanning a bandwidth $B$. The array has $N_r$ receivers and an aperture $a$. The distance from the array to the center of the imaging window is $L$. Then, the resolution is $\lambda_0 L/a$ in cross-range (the direction parallel to the array) and $c_0/B$ in range (the direction of propagation). These parameters are typical in microwave scanning technology [18].

Figure 3: Noiseless data. The exact solution is recovered for any value of $\tau$ in [14]. Left: the true image. Right: the recovered solution vector $\rho_\tau$ is plotted with red stars and the true solution vector $\rho^*$ with green circles.

We seek to image a sparse source vector; see the left plot in Fig. 3. The size of the imaging window is 20 cm × 60 cm and the pixel spacing is 5 mm × 15 mm. The number of unknowns $K$ is, therefore, much larger than the number of data $N = S \cdot N_r$. The size of the noise collector is taken to be a small multiple of the number of data. When the data is noiseless, we obtain exact recovery, as expected; see the right plot in Fig. 3.

In Fig. 4, we display the imaging results, with and without a Noise Collector, when the data is corrupted by additive noise. The SNR is one, so the $\ell_2$-norms of the signal and the noise are equal. In the left plot, we show the recovered image using $\ell_1$-norm minimization without a Noise Collector. There is a lot of grass in this image, with many non-zero values outside the true support. When a Noise Collector is used, the level of the grass is reduced and the image improves; see the second plot from the left. Still, there are several false discoveries because the weight $\tau$ used in [14] is too small.

In the third column from the left of Fig. 4 we show the image obtained with a larger weight $\tau$ in [14]. With this weight, there are no false discoveries and the recovered support is exact. This simplifies the imaging problem dramatically, as we can now restrict the inverse problem to the true support just obtained, and then solve an overdetermined linear system using a classical approach. The results are shown in the right column of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise.
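This second, debiasing step is a small least-squares solve. A minimal sketch, assuming rho_tau is the $\ell_1$-minimization output and a simple threshold picks the recovered support:

```python
import numpy as np

def debias_on_support(A, b, rho_tau, thresh=1e-6):
    """Restrict the inverse problem to the recovered support and solve the
    resulting overdetermined system A[:, S] x = b by least squares."""
    support = np.flatnonzero(np.abs(rho_tau) > thresh)
    x, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
    rho = np.zeros_like(rho_tau)
    rho[support] = x
    return rho
```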

In Figure 5 we illustrate the performance of the Noise Collector for different sparsity levels and SNR values. Success in recovering the true support of the unknown corresponds to the value one (yellow) and failure to the value zero (blue). The small phase-transition zone (green) contains intermediate values. These results are obtained by averaging over 5 realizations of the noise.

Figure 4: High level of noise; the SNR is one. From left to right: $\ell_1$-norm minimization without the Noise Collector; $\ell_1$-norm minimization with a Noise Collector and a small weight $\tau$ in (14); $\ell_1$-norm minimization with a Noise Collector and the correct weight $\tau$ in (14); least-squares solution restricted to the support. In the top row we show the images, and in the bottom row the solution vector with red stars and the true solution vector with green circles.

Remark 1: We considered passive array imaging for ease of presentation. The same results hold for active array imaging, with or without multiple scattering; see [7] for the detailed analytical setup.

Remark 2: We have considered a microwave imaging regime. Similar results can be obtained in other regimes.

Figure 5: Algorithm performance for exact support recovery. Success corresponds to the value one (yellow) and failure to the value zero (blue). The small phase-transition zone (green) contains intermediate values. The ordinate is the sparsity and the abscissa the SNR.

Proofs

Proof of Lemma 1: Denote by $E_{ij}$ the event that $|\langle c_i, c_j\rangle| > t$. By independence and rotational invariance, the probability of each $E_{ij}$ is bounded by the measure of a spherical cap of height $t$, which is at most $2e^{-Nt^2/2}$. Thus, by the union bound, the probability that $\max_{i \ne j}|\langle c_i, c_j\rangle| > t$ is at most $\Sigma^2 e^{-Nt^2/2}$. Choosing $t = \alpha\sqrt{\ln N/N}$ with $\alpha$ sufficiently large, this probability tends to zero as $N$ grows. Hence, [6] holds with large probability.
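For concreteness, the standard computation behind this union bound reads as follows (generic constants; see, e.g., [30]):

```latex
% Spherical-cap estimate for one pair, then a union bound over all pairs:
\begin{align*}
\mathbb{P}\Big(\max_{i \neq j} |\langle c_i, c_j\rangle| > t\Big)
  \;\le\; \binom{\Sigma}{2}\,\mathbb{P}\big(|\langle c_1, c_2\rangle| > t\big)
  \;\le\; \Sigma^2 e^{-N t^2/2}.
\end{align*}
% With polynomially many columns, \Sigma = N^{\beta}, and the choice
% t = \alpha \sqrt{\ln N / N}, the bound is N^{2\beta - \alpha^2/2},
% which tends to zero as N grows once \alpha^2 > 4\beta.
```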

Next, we consider the chances that [7] does not hold. Suppose there is a direction $x$ on the unit sphere such that

$$\langle x, c_i\rangle < c_1\sqrt{\frac{\ln N}{N}} \qquad (18)$$

holds for all $i$. Let $V_k$ be the $k$-dimensional volume of the parallelepiped spanned by $c_{i_1}, \dots, c_{i_k}$. Note that $V_k$ is equal to $V_{k-1}$ times the corresponding height. Then, if (18) holds,

(19)

for any choice of $N$ columns from the noise collector $C$. If we fix the indices $i_1, \dots, i_N$, then, due to rotational invariance, the probability of the event (19) equals the probability of the same event for an independent uniformly distributed direction. Using this bound, and the fact that we can find many sets of $N$ distinct indices, we conclude that the probability that (18) holds for all $i$ is small. Choosing $c_1$ sufficiently small and $N$ sufficiently large, we obtain the result.

Proof of Theorem 1: In order to check (10), we assume that both $c_i$ and $-c_i$ are columns of the Noise Collector, because it is more geometrically intuitive to work with the convex hull

(20)

of the columns. This implies that we may also assume that $\eta$ in (4) has non-negative coefficients. Thus, $\|\eta\|_{\ell_1}$ defines a norm with respect to this convex hull; this norm is called atomic in [8]. Suppose $S$ is the support of $\eta_\tau$. Then, the simplex spanned by the columns $c_i$ with $i \in S$

has a unique normal vector, which is collinear to our dual certificate $h$, because

(21)

The estimate (7) implies that the convex hull contains a ball of radius $r = c_1\sqrt{\ln N/N}$. Since $\langle h, c_i\rangle \le 1$ for all $i$, it follows that $\|h\|_{\ell_2} \le 1/r$ with large probability.

By construction, the distribution of $\mathcal{M}(e)$ is rotationally invariant with respect to the probability measure induced by the noise and the random columns of $C$. Thus the direction $h/\|h\|_{\ell_2}$ is also uniformly distributed on the unit sphere, and

$$\mathbb{P}\big(|\langle h/\|h\|_{\ell_2},\, a_i\rangle| > t\big) \le 2e^{-Nt^2/2} \qquad (22)$$

for all $t > 0$; see, e.g., [30]. Therefore, we can bound the probability that (10) does not hold by a union bound over the columns of $A$, which is small for large $N$ and an appropriately chosen constant in the lower bound for $\tau$. Hence, (10) holds with large probability.

Proof of Theorem 2: If the columns of $A$ are orthogonal, our previous arguments can be modified to verify Theorem 2. Indeed, suppose $L$ is the span of the column vectors $a_i$ with $i$ in the support of $\rho^*$; say, $L$ is spanned by $a_1, \dots, a_M$. Let $L^{\perp}$ be the orthogonal complement of $L$. Then, the orthogonal projection of the signal onto $L^{\perp}$ vanishes. By the concentration of measure (see, e.g., [30]), the projection of the noise onto $L^{\perp}$ is, after normalization, uniformly distributed on the unit sphere of $L^{\perp}$ with large probability. Applying the previous arguments to the projection of the data on $L^{\perp}$, we conclude that the projected solution is absorbed by the Noise Collector. Therefore, the support of $\rho_\tau$ is contained in the support of $\rho^*$ with large probability.

For general $A$, consider the orthogonal decomposition of each column with respect to $L$ and $L^{\perp}$. As before, we can choose the weight so that the conclusion holds for the components in $L^{\perp}$ with large probability. It remains to control the components in $L$. Fix any index outside the support of $\rho^*$ and decompose the corresponding column accordingly; the incoherence of the columns of $A$ implies that its component in $L$ is small, and the previous argument applies to its component in $L^{\perp}$. Hence, the conclusion of Theorem 2 holds for general $A$.

Proof of Theorem 3: It suffices to prove the result for a 1-sparse $\rho^*$, say, $b_0 = a_1$. We will demonstrate that the solution to the minimization problem

(23)

with noisy data $b = b_0 + e$ satisfies the desired support condition if the noise is not too large. This implies the theorem.

Suppose $(\rho', \eta')$, $(\rho'', \eta'')$ and $(\rho''', \eta''')$ are the optimal solutions of

(24)

with right-hand sides $b_0$, $e$, and $b_0 + e$, respectively. Since the problem is convex, we have, by optimality,

(25)

From (7) and (8), we have an upper bound on the cost of representing the noise with the Noise Collector and a lower bound on the cost of representing $a_1$ with it, respectively. Suppose the noise is not too large. Then, for $N$ large enough, discarding the signal is more expensive than keeping it, and using (25) we conclude that the solution retains the component along $a_1$. This implies (23).

Acknowledgements. The work of M. Moscoso was partially supported by Spanish grant MICINN FIS2016-77892-R. The work of A. Novikov was partially supported by NSF grants DMS-1515187 and DMS-1813943. The work of G. Papanicolaou was partially supported by AFOSR FA9550-18-1-0519. The work of C. Tsogka was partially supported by AFOSR FA9550-17-1-0238 and FA9550-18-1-0519. We thank Marguerite Novikov for drawing Figure 1.

References

  • [1] M. AlQuraishi and H. H. McAdams, Direct inference of protein-DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci. U.S.A. 108, 14819–14824 (2011).
  • [2] R. Baraniuk and P. Steeghs, Compressive Radar Imaging, in 2007 IEEE Radar Conference, Apr. 2007, 128–133.
  • [3] L. Borcea and I. Kocyigit, Resolution analysis of imaging with $\ell_1$ optimization, SIAM J. Imaging Sci. 8, 3015–3050 (2015).
  • [4] E. J Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
  • [5] E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67, 906-956 (2014).
  • [6] A. Chai, M. Moscoso and G. Papanicolaou, Robust imaging of localized scatterers using the singular value decomposition and $\ell_1$ optimization, Inverse Problems 29, 025016 (2013).
  • [7] A. Chai, M. Moscoso and G. Papanicolaou, Imaging Strong Localized Scatterers with Sparsity Promoting Optimization, SIAM J. Imaging Sci. 7, 1358–1387 (2014).
  • [8] V. Chandrasekaran, B.Recht, P. A. Parrilo, A. S. Willsky, The convex geometry of linear inverse problems, Found. Comput. Math. 12, 805–849 (2012).
  • [9] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43, 129–159 (2001).
  • [10] D. L. Donoho, Superresolution via sparsity constraints, SIAM J. Math. Anal. 23, 1303–1331 (1992).
  • [11] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization, Proc. Natl. Acad. Sci. U.S.A. 100, 2197–2202 (2003).
  • [12] A. C. Fannjiang, T. Strohmer, and P. Yan, Compressed remote sensing of sparse objects, SIAM J. Imag. Sci. 3, 595-618 (2010).
  • [13] A. C. Fannjiang and W. Liao, Coherence pattern-guided compressive sensing with unresolved grids, SIAM J. Imag. Sci. 5, 179–202 (2012).
  • [14] J. J. Fuchs, Recovery of exact sparse representations in the presence of bounded noise, IEEE Trans. Inf. Theory 51, 3601–3608 (2005).
  • [15] R. M. Gray, Toeplitz and Circulant Matrices: A Review, Foundations and Trends in Communications and Information Theory 2, 155–239 (2006).
  • [16] M. A. Herman and T. Strohmer, High-Resolution Radar via Compressed Sensing, IEEE Trans. Signal Process. 57, 2275-2284 (2009).
  • [17] J. N. Laska, M. A. Davenport and R. G. Baraniuk, Exact signal recovery from sparsely corrupted measurements through the Pursuit of Justice, 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2009,  1556–1560.
  • [18] J. Laviada, A. Arboleya-Arboleya, Y. Alvarez-Lopez, C. Garcia-Gonzalez and F. Las-Heras, Phaseless synthetic aperture radar with efficient sampling for broadband near-field imaging: Theory and validation, IEEE Trans. Antennas Propag., 63:2 (2015), pp. 573–584.
  • [19] D. Malioutov, M. Cetin, A.S. Willsky, A sparse signal reconstruction perspective for source localization with sensor arrays, IEEE Trans. Signal Process. 53, 3010–3022 (2005).
  • [20] M. Moscoso, A. Novikov, G. Papanicolaou and C. Tsogka, Imaging with highly incomplete and corrupted data, submitted.
  • [21] M. Moscoso, A. Novikov, G. Papanicolaou and L. Ryzhik, A differential equations approach to $\ell_1$-minimization with applications to array imaging, Inverse Problems 28 (2012).
  • [22] J. Romberg, Imaging via Compressive Sampling, IEEE Signal Process. Mag. 25, 14–20 (2008).
  • [23] J. N. Sampson, N. Chatterjee, R. J. Carroll, and S. Müller, Controlling the local false discovery rate in the adaptive Lasso, Biostatistics 14, 653–666 (2013).
  • [24] H. L. Taylor, S. C. Banks, and J. F. McCoy, Deconvolution with the $\ell_1$ norm, Geophysics 44, 39–52 (1979).
  • [25] T. Tao, A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors, https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of-the-kabatjanskii-levenstein-bound-for-almost-orthogonal-vectors/
  • [26] R. Tibshirani, Regression Shrinkage and Selection via the lasso, Journal of the Royal Statistical Society. Series B (methodological) 58, 267–288 (1996).
  • [27] J. A. Tropp, Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise, IEEE Trans. Inf. Theory 52, 1030–1051 (2006).
  • [28] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals, IEEE Transactions on Information Theory 56, 520–544 (2010).
  • [29] J. Trzasko and A. Manduca, Highly undersampled magnetic resonance image reconstruction via homotopic $\ell_0$-minimization, IEEE Trans. Med. Imag. 28, 106–121 (2009).
  • [30] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Cambridge University Press, 2018.

  • [31] M. J. Wainwright, Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso), IEEE Trans. Inf. Theory 55, 2183–2202 (2009).
  • [32] H. Zou, The Adaptive Lasso and Its Oracle Properties, J. Amer. Statist. Assoc. 101, 1418–1429 (2006).