# Robust Bayesian Compressed Sensing

We consider the problem of robust compressed sensing whose objective is to recover a high-dimensional sparse signal from compressed measurements corrupted by outliers. A new sparse Bayesian learning method is developed for robust compressed sensing. The basic idea of the proposed method is to identify and remove the outliers from sparse signal recovery. To automatically identify the outliers, we employ a set of binary indicator hyperparameters to indicate which observations are outliers. These indicator hyperparameters are treated as random variables and assigned a beta process prior such that their values are confined to be binary. In addition, a Gaussian-inverse Gamma prior is imposed on the sparse signal to promote sparsity. Based on this hierarchical prior model, we develop a variational Bayesian method to estimate the indicator hyperparameters as well as the sparse signal. Simulation results show that the proposed method achieves a substantial performance improvement over existing robust compressed sensing techniques.


## I Introduction

Compressed sensing, a new paradigm for data acquisition and reconstruction, has drawn much attention over the past few years [1, 2, 3]. The main purpose of compressed sensing is to recover a high-dimensional sparse signal from a low-dimensional linear measurement vector. In practice, measurements are inevitably contaminated by noise due to hardware imperfections, quantization errors, or transmission errors. Most existing studies (e.g. [4, 5, 6]) assume that measurements are corrupted with noise that is evenly distributed across the observations, such as independent and identically distributed (i.i.d.) Gaussian, thermal, or quantization noise. This assumption is valid in many cases. Nevertheless, in some scenarios, measurements may be corrupted by outliers that deviate significantly from their nominal values. For example, during the data acquisition process, outliers can be caused by sensor failures or calibration errors [7, 8], and it is usually unknown which measurements have been corrupted. Outliers can also arise as a result of signal clipping/saturation or impulse noise [9, 10]. Conventional compressed sensing techniques may incur severe performance degradation in the presence of outliers. To address this issue, previous works (e.g. [7, 8, 9, 10]) model the outliers as a sparse error vector and express the observed data as

 y=Ax+e+w (1)

where A ∈ R^{M×N} (M < N) is the sampling matrix, x denotes an N-dimensional sparse vector with only K nonzero coefficients, e denotes the outlier vector consisting of L nonzero entries with arbitrary amplitudes, and w denotes the additive multivariate Gaussian noise with zero mean and covariance matrix (1/γ)I. The above model can be formulated as a conventional compressed sensing problem as

 y = [A I][x; e] + w ≜ Bu + w (2)

Efficient compressed sensing algorithms can then be employed to estimate the sparse signal as well as the outliers. Recovery guarantees for x and e were also analyzed in [7, 8, 9, 10].
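For concreteness, the augmented formulation in (2) amounts to stacking the sampling matrix with an identity block. The sketch below is ours (function name and dimensions are illustrative, not from the paper):

```python
import numpy as np

def augmented_model(A):
    """Form B = [A  I] from model (2), so that y = B u + w with u = [x; e].

    Treating the outlier vector e as extra unknowns turns robust
    recovery into a conventional compressed sensing problem in u,
    whose solution is sparse whenever x is sparse and outliers are few.
    """
    M = A.shape[0]
    return np.hstack([A, np.eye(M)])

# Example: an M x N sampling matrix yields an M x (N + M) augmented matrix.
A = np.random.randn(10, 20)
B = augmented_model(A)
```

Any standard sparse recovery algorithm (e.g. sparse Bayesian learning, as used later for the C-RBCS baseline) can then be run on B to estimate x and e jointly.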

The rationale behind the above approach is to detect and compensate for these outliers simultaneously. Besides the above method, another more direct approach is to identify and exclude the outliers from sparse signal recovery. Although it may seem preferable to compensate rather than simply reject outliers, inaccurate estimation of the compensation (i.e. outlier vector) could result in a destructive effect on sparse signal recovery, particularly when the number of measurements is limited. In this case, identifying and rejecting outliers could be a more sensible strategy. Motivated by this insight, we develop a Bayesian framework for robust compressed sensing, in which a set of binary indicator variables are employed to indicate which observations are outliers. These variables are assigned a beta-Bernoulli hierarchical prior such that their values are confined to be binary. Also, a Gaussian inverse-Gamma prior is placed on the sparse signal to promote sparsity. A variational Bayesian method is developed to find the approximate posterior distributions of the indicators, the sparse signal and other latent variables. Simulation results show that the proposed method achieves a substantial performance improvement over the compensation-based robust compressed sensing method.

## II Hierarchical Prior Model

We develop a Bayesian framework which employs a set of indicator variables z ≜ {z_m} to indicate which observations are outliers, i.e. z_m = 1 indicates that y_m is a normal observation, otherwise y_m is an outlier. More precisely, we can write

 y_m = a_r^m x + w_m,        z_m = 1
 y_m = a_r^m x + w_m + e_m,  z_m = 0 (3)

where a_r^m denotes the mth row of A, and w_m and e_m are the mth entries of w and e, respectively. The probability of the observed data conditional on these indicator variables can be expressed as

 p(y|x,z,γ) = ∏_{m=1}^{M} N(y_m|a_r^m x, 1/γ)^{z_m} (4)

in which those “presumed outliers” are automatically disabled when calculating the probability. To infer the indicator variables, a beta-Bernoulli hierarchical prior [11, 12] is placed on z, i.e. each component of z is assumed to be drawn from a Bernoulli distribution parameterized by π_m:

 p(z_m|π_m) = Bernoulli(z_m|π_m) = π_m^{z_m}(1−π_m)^{1−z_m}  ∀m (5)

and each π_m follows a beta distribution

 p(π_m) = Beta(e, f)  ∀m (6)

where e and f are parameters characterizing the beta distribution. Note that the beta-Bernoulli prior assumes the random variables {z_m} are mutually independent, and so are the random variables {π_m}.

To encourage a sparse solution, a Gaussian-inverse Gamma hierarchical prior, which has been widely used in sparse Bayesian learning (e.g. [13, 14, 15, 16]), is employed. Specifically, in the first layer, x is assigned a Gaussian prior distribution

 p(x|α) = ∏_{n=1}^{N} p(x_n|α_n) (7)

where p(x_n|α_n) = N(x_n|0, 1/α_n), and α ≜ {α_n} are non-negative hyperparameters controlling the sparsity of the signal x. The second layer specifies Gamma distributions as hyperpriors over the precision parameters {α_n}, i.e.

 p(α) = ∏_{n=1}^{N} Gamma(α_n|a,b) = ∏_{n=1}^{N} Γ(a)^{−1} b^a α_n^{a−1} e^{−bα_n} (8)

where the parameters a and b are set to small values in order to provide non-informative (over a logarithmic scale) hyperpriors over {α_n}. Also, to estimate the noise variance, we place a Gamma hyperprior over γ, i.e.

 p(γ) = Gamma(γ|c,d) = Γ(c)^{−1} d^c γ^{c−1} e^{−dγ} (9)

where the parameters c and d are set to small values. The graphical model of the proposed hierarchical prior is shown in Fig. 1.
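To make the generative structure concrete, one can draw a single sample from this hierarchical prior. The hyperparameter values below are illustrative assumptions (the paper only specifies that a, b, c, d should be small and that e, f should favor few outliers); moderate values are used here so the draw stays numerically tame:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 50, 20                          # illustrative signal/measurement sizes
a0, b0, c0, d0 = 1.0, 1.0, 1.0, 1.0    # assumed values; the paper uses small ones
e0, f0 = 0.9, 0.1                      # Beta(e, f) skewed toward pi_m near 1

alpha = rng.gamma(a0, 1.0 / b0, size=N)     # eq. (8): precisions of x
x = rng.normal(0.0, 1.0 / np.sqrt(alpha))   # eq. (7): x_n ~ N(0, 1/alpha_n)
gamma = rng.gamma(c0, 1.0 / d0)             # eq. (9): noise precision
pi = rng.beta(e0, f0, size=M)               # eq. (6): inlier probabilities
z = rng.binomial(1, pi)                     # eq. (5): z_m = 1 marks an inlier
```

With e0 much larger than f0, most π_m land near one, so most z_m equal one: the prior encodes the expectation that outliers are rare.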

## III Variational Bayesian Inference

We now proceed to perform Bayesian inference for the proposed hierarchical model. Let θ ≜ {x, α, γ, z, π} denote the hidden variables in our hierarchical model. Our objective is to find the posterior distribution p(θ|y), which is usually computationally intractable. To circumvent this difficulty, observe that the marginal probability of the observed data can be decomposed into two terms

 ln p(y) = L(q) + KL(q||p) (10)

where

 L(q) = ∫ q(θ) ln( p(y,θ) / q(θ) ) dθ (11)

and

 KL(q||p) = −∫ q(θ) ln( p(θ|y) / q(θ) ) dθ (12)

where q(θ) is any probability density function and KL(q||p) is the Kullback-Leibler divergence between q(θ) and p(θ|y). Since KL(q||p) ≥ 0, it follows that L(q) is a rigorous lower bound on ln p(y). Moreover, notice that the left hand side of (10) is independent of q(θ). Therefore maximizing L(q) is equivalent to minimizing KL(q||p), and thus the posterior distribution p(θ|y) can be approximated by q(θ) through maximizing L(q). Specifically, we could assume some specific parameterized functional form for q(θ) and then maximize L(q) with respect to the parameters of the distribution. A particular form of q(θ) that has been widely used with great success is the factorized form over the component variables in θ [17]. For our case, the factorized form of q(θ) can be written as

 q(θ)=qz(z)qx(x)qα(α)qπ(π)qγ(γ) (13)

We can compute the posterior distribution approximation by finding a q(θ) of the factorized form that maximizes the lower bound L(q). The maximization can be conducted in an alternating fashion for each latent variable, which leads to [17]

 ln q_x(x) = ⟨ln p(y,θ)⟩_{q_α(α) q_γ(γ) q_z(z) q_π(π)} + constant
 ln q_α(α) = ⟨ln p(y,θ)⟩_{q_x(x) q_γ(γ) q_z(z) q_π(π)} + constant
 ln q_γ(γ) = ⟨ln p(y,θ)⟩_{q_x(x) q_α(α) q_z(z) q_π(π)} + constant
 ln q_z(z) = ⟨ln p(y,θ)⟩_{q_x(x) q_α(α) q_γ(γ) q_π(π)} + constant
 ln q_π(π) = ⟨ln p(y,θ)⟩_{q_x(x) q_α(α) q_γ(γ) q_z(z)} + constant (14)

where ⟨·⟩_{q(·)} denotes an expectation with respect to the distributions specified in the subscript. More details of the Bayesian inference are provided below.

1) Update of q_x(x): We first consider the calculation of q_x(x). Keeping those terms that are dependent on x, we have

 ln q_x(x) ∝ ⟨ln p(y|x,z,γ) + ln p(x|α)⟩_{q_α(α) q_γ(γ) q_z(z)}
 ∝ −(1/2) ∑_{m=1}^{M} ⟨γ z_m (y_m − a_r^m x)^2⟩ − (1/2) ∑_n ⟨α_n⟩ x_n^2
 = −(⟨γ⟩/2) (y − Ax)^T D_z (y − Ax) − (1/2) x^T D_α x (15)

where

 Dz≜diag(⟨z⟩),  Dα≜diag(⟨α⟩) (16)

and ⟨z⟩ and ⟨α⟩ denote the expectations of z and α, respectively. It is easy to show that q_x(x) follows a Gaussian distribution with its mean and covariance matrix given respectively by

 μ_x = ⟨γ⟩ Φ_x A^T D_z y (17)
 Φ_x = (⟨γ⟩ A^T D_z A + D_α)^{−1} (18)
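In code, the q_x(x) update is a single regularized weighted least-squares solve. A minimal NumPy sketch (function and argument names are ours):

```python
import numpy as np

def update_qx(A, y, Ez, Ealpha, Egamma):
    """Mean and covariance of q_x(x), eqs. (17)-(18).

    Ez, Ealpha, Egamma are the current moments <z>, <alpha>, <gamma>.
    Down-weighting a measurement (Ez[m] -> 0) removes it from the fit.
    """
    Dz = np.diag(Ez)                                              # eq. (16)
    Phi = np.linalg.inv(Egamma * A.T @ Dz @ A + np.diag(Ealpha))  # eq. (18)
    mu = Egamma * Phi @ A.T @ Dz @ y                              # eq. (17)
    return mu, Phi
```

With all entries of Ez equal to one this reduces to the standard sparse Bayesian learning posterior; measurements flagged as outliers simply drop out of the normal equations.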

2) Update of q_α(α): Keeping only the terms that depend on α, the variational optimization of q_α(α) yields

 ln q_α(α) ∝ ⟨ln p(x|α) + ln p(α|a,b)⟩_{q_x(x)}
 = ∑_{n=1}^{N} [ (a − 0.5) ln α_n − (0.5⟨x_n^2⟩ + b) α_n ] (19)

The posterior q_α(α) therefore follows a Gamma distribution

 q_α(α) = ∏_{n=1}^{N} Gamma(α_n|~a, ~b_n) (20)

in which ~a and ~b_n are given respectively as

 ~a = a + 0.5
 ~b_n = b + 0.5⟨x_n^2⟩

3) Update of q_γ(γ): The variational approximation of q_γ(γ) can be obtained as

 ln q_γ(γ) ∝ ⟨ln p(y|x,z,γ) + ln p(γ|c,d)⟩_{q_x(x) q_z(z)}
 ∝ ∑_{m=1}^{M} ( 0.5⟨z_m⟩ ln γ − 0.5 γ ⟨z_m⟩⟨(y_m − a_r^m x)^2⟩ ) + (c − 1) ln γ − dγ
 = ( c + 0.5 ∑_{m=1}^{M} ⟨z_m⟩ − 1 ) ln γ − ( d + 0.5⟨(y − Ax)^T D_z (y − Ax)⟩ ) γ (21)

Clearly, the posterior q_γ(γ) obeys a Gamma distribution

 qγ(γ)=Gamma(γ|~c,~d) (22)

where ~c and ~d are given respectively as

 ~c = c + 0.5 ∑_{m=1}^{M} ⟨z_m⟩ (23)
 ~d = d + 0.5⟨(y − Ax)^T D_z (y − Ax)⟩_{q_x(x)} (24)

in which

 ⟨(y − Ax)^T D_z (y − Ax)⟩_{q_x(x)} = (y − Aμ_x)^T D_z (y − Aμ_x) + trace(A^T D_z A Φ_x)

4) Update of q_z(z): The posterior approximation of q_z(z) yields

 ln q_z(z) ∝ ⟨ln p(y|x,z,γ) + ln p(z|π)⟩_{q_x(x) q_γ(γ) q_π(π)}
 ∝ ∑_{m=1}^{M} ⟨ z_m ( −0.5γ(y_m − a_r^m x)^2 + ln π_m ) + (1 − z_m) ln(1 − π_m) ⟩ (25)

Clearly, each z_m still follows a Bernoulli distribution with its probability given by

 P(z_m = 1) = C e^{⟨ln π_m⟩} e^{−⟨γ⟩⟨(y_m − a_r^m x)^2⟩/2} (26)
 P(z_m = 0) = C e^{⟨ln(1−π_m)⟩} (27)

where C is a normalizing constant such that P(z_m = 1) + P(z_m = 0) = 1, and

 ⟨(y_m − a_r^m x)^2⟩ = (y_m − a_r^m μ_x)^2 + a_r^m Φ_x (a_r^m)^T
 ⟨ln π_m⟩ = Ψ(e + ⟨z_m⟩) − Ψ(e + f + 1)
 ⟨ln(1−π_m)⟩ = Ψ(1 + f − ⟨z_m⟩) − Ψ(e + f + 1) (28)

The last two equalities can also be found in [12], in which Ψ(·) represents the digamma function.
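The q_z(z) update can be implemented directly from (26)-(28); normalizing in the log domain avoids underflow when a residual is large. A sketch (function and argument names are ours):

```python
import numpy as np
from scipy.special import digamma

def update_qz(A, y, mu, Phi, Egamma, Ez, e0, f0):
    """Posterior inlier probabilities <z_m>, eqs. (26)-(28)."""
    # eq. (28): <(y_m - a_m x)^2> = (y_m - a_m mu)^2 + a_m Phi a_m^T
    quad = (y - A @ mu) ** 2 + np.einsum('mj,jk,mk->m', A, Phi, A)
    ln_pi = digamma(e0 + Ez) - digamma(e0 + f0 + 1.0)
    ln_1mpi = digamma(1.0 + f0 - Ez) - digamma(e0 + f0 + 1.0)
    log_p1 = ln_pi - 0.5 * Egamma * quad       # eq. (26), up to ln C
    log_p0 = ln_1mpi                           # eq. (27)
    shift = np.maximum(log_p1, log_p0)         # stabilize before exponentiating
    p1 = np.exp(log_p1 - shift)
    p0 = np.exp(log_p0 - shift)
    return p1 / (p1 + p0)
```

A measurement whose expected squared residual is large relative to the noise level gets ⟨z_m⟩ driven toward zero, which in turn removes it from the next q_x(x) update.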

5) Update of q_π(π): The posterior approximation of q_π(π) can be calculated as

 ln q_π(π) ∝ ⟨ln p(z|π) + ln p(π|e,f)⟩_{q_z(z)}
 ∝ ∑_{m=1}^{M} ⟨ z_m ln π_m + (1 − z_m) ln(1 − π_m) + (e − 1) ln π_m + (f − 1) ln(1 − π_m) ⟩
 = ∑_{m=1}^{M} ⟨ (z_m + e − 1) ln π_m + (f − z_m) ln(1 − π_m) ⟩ (29)

It can be easily verified that q_π(π) follows a Beta distribution, i.e.

 q_π(π) = ∏_m q(π_m) = ∏_m Beta(⟨z_m⟩ + e, 1 + f − ⟨z_m⟩) (30)

In summary, the variational Bayesian inference involves updates of the approximate posterior distributions for the hidden variables x, α, γ, z, and π in an alternating fashion. Some of the expectations and moments used during the updates are summarized as

 ⟨α_n⟩ = ~a / ~b_n
 ⟨γ⟩ = ~c / ~d
 ⟨x_n^2⟩ = ⟨x_n⟩^2 + Φ_x(n,n)
 ⟨z_m⟩ = P(z_m = 1) / ( P(z_m = 1) + P(z_m = 0) )

where Φ_x(n,n) denotes the nth diagonal element of Φ_x.

## IV Simulation Results

We now carry out experiments to illustrate the performance of our proposed method, which is referred to as the beta-Bernoulli prior model-based robust Bayesian compressed sensing method (BP-RBCS). Codes are available at http://www.junfang-uestc.net/codes/RBCS.rar. As discussed earlier, another robust compressed sensing approach is compensation-based and can be formulated as a conventional compressed sensing problem (2). For comparison, the sparse Bayesian learning method [18, 13] is employed to solve (2), and this method is referred to as the compensation-based robust Bayesian compressed sensing method (C-RBCS). Also, we consider an “ideal” method which assumes knowledge of the locations of the outliers. The outliers are then removed and the sparse Bayesian learning method is employed to recover the sparse signal. This ideal method is referred to as RBCS-ideal, and serves as a benchmark for the performance of BP-RBCS and C-RBCS. Note that both C-RBCS and RBCS-ideal use the sparse Bayesian learning method for sparse signal recovery. The parameters of the sparse Bayesian learning method are set to small values. Our proposed method involves the parameters {a, b, c, d, e, f}; the first four are likewise set to small values. The beta-Bernoulli parameters e and f are chosen such that the prior favors few outliers, since we expect that the number of outliers is usually small relative to the total number of measurements. Our simulation results suggest that stable recovery is ensured over a fairly wide range of these parameter choices.

We consider the problem of direction-of-arrival (DOA) estimation where narrowband far-field sources impinge on a uniform linear array of sensors from different directions. The received signal can be expressed as

 y=Ax+w

where w denotes i.i.d. Gaussian observation noise with zero mean and variance σ², and A is an overcomplete dictionary whose columns are the array steering vectors evaluated at a set of evenly-spaced angular grid points {θ_n}; each steering vector depends on the distance d between two adjacent sensors and the wavelength λ of the source signal. The signal x contains K nonzero entries that are independently drawn from the unit circle. Suppose that L out of the M measurements are corrupted by outliers; the values of the corrupted measurements are chosen uniformly at random.
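A dictionary of this form can be generated as follows. The steering-vector convention below is the standard narrowband far-field model for a half-wavelength uniform linear array; it is our assumption, since the paper's exact entry formula is not reproduced here.

```python
import numpy as np

def ula_dictionary(M, grid_deg, d_over_lambda=0.5):
    """Overcomplete ULA steering dictionary on an angular grid.

    Entry (m, n) is exp(-j*2*pi*m*(d/lambda)*sin(theta_n)), the standard
    narrowband far-field model (assumed convention, not from the paper).
    """
    theta = np.deg2rad(np.asarray(grid_deg, dtype=float))
    m = np.arange(M)[:, None]
    return np.exp(-2j * np.pi * m * d_over_lambda * np.sin(theta)[None, :])

grid = np.linspace(-90.0, 90.0, 181)   # illustrative 1-degree grid
A = ula_dictionary(8, grid)            # 8 sensors
```

Unit-modulus signal entries ("drawn from the unit circle") can be generated as `np.exp(1j * rng.uniform(0, 2 * np.pi, K))`. Since this model is complex-valued, the real-valued sketches earlier in the section would need their transposes replaced by conjugate transposes.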

We first consider a noiseless case, i.e. σ² = 0. Fig. 2 depicts the success rates of different methods vs. the number of measurements and the number of outliers, respectively, with the remaining problem dimensions held fixed in Fig. 2(a) and Fig. 2(b). The success rate is computed as the ratio of the number of successful trials to the total number of independent runs, where a trial is considered successful if the normalized reconstruction error of the sparse signal is no greater than a prescribed threshold. From Fig. 2, we see that our proposed BP-RBCS achieves a substantial performance improvement over C-RBCS. This result corroborates our claim that rejecting outliers is a better strategy than compensating for them, particularly when the number of measurements is small, because inaccurate estimation of the compensation vector could have a destructive, instead of a constructive, effect on sparse signal recovery. Next, we consider a noisy case. Fig. 3 plots the normalized mean square errors (NMSEs) of the sparse signal recovered by different methods vs. the number of measurements and the number of outliers, respectively, again with the remaining parameters held fixed in Fig. 3(a) and Fig. 3(b). This result, again, demonstrates the superiority of our proposed method over C-RBCS.

## V Conclusions

We proposed a new Bayesian method for robust compressed sensing. The rationale behind the proposed method is to identify the outliers and exclude them from sparse signal recovery. To this end, a set of indicator variables was employed to indicate which observations are outliers, and a beta-Bernoulli prior was assigned to these indicator variables. A variational Bayesian inference method was developed to find the approximate posterior distributions of the latent variables. Simulation results show that our proposed method achieves a substantial performance improvement over the compensation-based robust compressed sensing method.

## References

• [1] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.
• [2] E. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
• [3] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
• [4] E. Candes, “The restricted isometry property and its implications for compressive sensing,” Compte Rendus de l’Academie des Sciences, Paris, Serie I, vol. 346, pp. 589–592, 2008.
• [5] M. J. Wainwright, “Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting,” IEEE Trans. Information Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.
• [6] T. Wimalajeewa and P. K. Varshney, “Performance bounds for sparsity pattern recovery with quantized noisy random projections,” IEEE Journal on Selected Topics in Signal Processing, vol. 6, no. 1, pp. 43–57, Feb. 2012.
• [7] J. N. Laska, M. A. Davenport, and R. G. Baraniuk, “Exact signal recovery from sparsely corrupted measurements through the pursuit of justice,” in Proc. 43rd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA, November 1-4, 2009.
• [8] K. Mitra, A. Veeraraghavan, and R. Chellappa, “Analysis of sparse regularization based robust regression approaches,” IEEE Trans. Signal Processing, vol. 61, no. 5, pp. 1249–1257, Mar. 2013.
• [9] R. E. Carrillo, K. E. Barner, and T. C. Aysal, “Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise,” IEEE Journal of Selected Topics in Signal Processing, no. 2, pp. 392–408, Apr. 2010.
• [10] C. Studer, P. Kuppinger, G. Pope, and H. Bolcskei, “Recovery of sparsely corrupted signals,” IEEE Trans. Information Theory, no. 5, pp. 3115–3130, May 2012.
• [11] L. He and L. Carin, “Exploiting structure in wavelet-based Bayesian compressive sensing,” IEEE Trans. Signal Processing, vol. 57, no. 9, pp. 3488–3497, Sept. 2009.
• [12] J. Paisley and L. Carin, “Nonparametric factor analysis with beta process priors,” in Proc. 26th Annual International Conference on Machine Learning, Montreal, Canada, June 14-18, 2009.
• [13] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Processing, vol. 56, no. 6, pp. 2346–2356, June 2008.
• [14] Z. Zhang and B. D. Rao, “Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation,” IEEE Trans. Signal Processing, vol. 61, no. 8, pp. 2009–2015, Apr. 2013.
• [15] Z. Yang, L. Xie, and C. Zhang, “Off-grid direction of arrival estimation using sparse Bayesian inference,” IEEE Trans. Signal Processing, vol. 61, no. 1, pp. 38–42, Jan. 2013.
• [16] J. Fang, Y. Shen, H. Li, and P. Wang, “Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals,” IEEE Trans. Signal Processing, no. 2, pp. 360–372, Jan. 2015.
• [17] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Signal Processing Magazine, pp. 131–146, Nov. 2008.
• [18] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.