Bayesian Compressive Sensing Using Normal Product Priors

08/24/2017
by Zhou Zhou, et al.

In this paper, we introduce a new sparsity-promoting prior, namely, the "normal product" prior, and develop an efficient algorithm for sparse signal recovery under the Bayesian framework. The normal product distribution is the distribution of a product of two normally distributed variables with zero means and possibly different variances. Like other sparsity-encouraging distributions such as the Student's t-distribution, the normal product distribution has a sharp peak at the origin, which makes it a suitable prior for encouraging sparse solutions. A two-stage normal product based hierarchical model is proposed, and we resort to the variational Bayesian (VB) method to perform the inference. Simulations illustrate the effectiveness of the proposed algorithm compared with other state-of-the-art compressed sensing algorithms.



I Introduction

Compressed sensing [1, 2] is a data acquisition technique that has attracted much attention over the past decade. Existing methods for compressed sensing can generally be classified into the following categories: the greedy pursuit approach [3], the convex relaxation approach [4] and the nonconvex optimization method [5]. Another class of compressed sensing techniques that has received increasing attention is the Bayesian methods. In the Bayesian framework, the signal is usually assigned a sparsity-encouraging prior, such as the Laplace prior or the Gaussian-inverse Gamma prior [6], to encourage sparse solutions. It has been shown in a series of experiments that Bayesian compressed sensing techniques [7] are superior for sparse signal recovery compared with the greedy methods and the basis pursuit method. One of the most popular prior models for Bayesian compressed sensing is the Gaussian-inverse Gamma prior proposed in [6]. It is a two-layer hierarchical model in which the first layer specifies a Gaussian prior on the sparse signal, and an inverse Gamma prior is assigned to the parameters characterizing the Gaussian prior. As discussed in [8], this two-stage hierarchical model is equivalent to imposing a Student's t-distribution on the sparse signal. Besides the Gaussian-inverse Gamma prior, the authors of [9] employ Laplace priors on the sparse signal to promote sparse solutions.

In this paper, we introduce a new sparsity-encouraging prior, namely, the normal product (NP) prior, for sparse signal recovery. A two-stage normal product based hierarchical model is established, and we resort to the variational Bayesian (VB) method to perform the inference for the hierarchical model. Our experiments show that the proposed algorithm achieves performance similar to the sparse Bayesian learning method, with less computational complexity.

II The Bayesian Network

Fig. 1: Three kinds of PDF with the same standard deviation

In the context of compressed sensing, we are given noise-corrupted linear measurements $\mathbf{y} \in \mathbb{R}^M$ of a vector $\mathbf{x} \in \mathbb{R}^N$:

$$\mathbf{y} = \mathbf{\Phi}\mathbf{x} + \mathbf{n} \qquad (1)$$

Here $\mathbf{\Phi} \in \mathbb{R}^{M \times N}$ ($M < N$) is an underdetermined measurement matrix with low coherence of columns, $\mathbf{n}$ represents the acquisition noise vector, and $\mathbf{x}$ is a sparse signal. In the inverse problem, both $\mathbf{x}$ and $\mathbf{n}$ are unknown, and our goal is to recover $\mathbf{x}$ from $\mathbf{y}$. We model the observation noise as zero-mean independent Gaussian with variance $\sigma^2$, i.e. $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, and seek a sparse solution $\mathbf{x}$.
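As an illustration, the measurement model above can be simulated as follows; the sizes and noise level are illustrative choices (the paper uses a signal dimension of 100):

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes: N-dimensional K-sparse signal, M < N measurements.
N, M, K = 100, 30, 3

# Underdetermined measurement matrix with i.i.d. standard normal entries.
Phi = rng.standard_normal((M, N))

# K-sparse signal: K nonzero entries at random positions.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Zero-mean Gaussian acquisition noise with variance sigma^2.
sigma = 0.01
n = sigma * rng.standard_normal(M)

# Noisy linear measurements y = Phi x + n.
y = Phi @ x + n
```
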

In this section, we utilize a two-stage hierarchical Bayesian prior for the signal model. In the first layer, we use the normal product (NP) distribution as the sparseness prior, with the zero vector as its mean and a diagonal matrix $\mathbf{\Sigma} = \operatorname{diag}(\sigma_1^2, \dots, \sigma_N^2)$ representing its variance. Each element $x_i$ of $\mathbf{x}$ has the probability density $p(x_i) = \frac{1}{\pi\sigma_i} K_0(|x_i|/\sigma_i)$, which exhibits a sharp peak at the origin and heavy tails; here $K_0$ is the zero-order modified Bessel function of the second kind [10]. Thus the probability density function (PDF) of $\mathbf{x}$ is

$$p(\mathbf{x}) = \prod_{i=1}^{N} \frac{1}{\pi\sigma_i} K_0\!\left(\frac{|x_i|}{\sigma_i}\right). \qquad (2)$$
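One convenient single-scale parametrization of the elementwise NP density, $p(x) = K_0(|x|/\sigma)/(\pi\sigma)$ with $\sigma$ the product of the two underlying standard deviations, can be evaluated with SciPy and checked to integrate to one:

```python
import numpy as np
from scipy.special import k0
from scipy.integrate import quad

def np_pdf(x, sigma=1.0):
    """Normal product density: distribution of a product of two independent
    zero-mean normals whose standard deviations multiply to `sigma`.
    K_0 is the zero-order modified Bessel function of the second kind;
    it diverges logarithmically as x -> 0, giving the sharp peak at the
    origin that promotes sparsity."""
    return k0(np.abs(x) / sigma) / (np.pi * sigma)

# The density integrates to 1; split at 0 because of the singularity and
# use symmetry about the origin.
mass, _ = quad(np_pdf, 0, np.inf)
total = 2 * mass
```

The logarithmic singularity at the origin is integrable, so `quad` handles it without special treatment.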

An NP-distributed scalar variate can be decomposed as the product of two independent normally distributed variables: if $x$ is NP-distributed, then there exist $a \sim \mathcal{N}(0, \sigma_1^2)$ and $b \sim \mathcal{N}(0, \sigma_2^2)$ satisfying $x = ab$, with the moment relationship $\mathbb{E}[x] = 0$ and $\operatorname{Var}(x) = \sigma_1^2\sigma_2^2$ [11]. We call this property the generating rule of the normal product distribution. Similarly, for the vector $\mathbf{x}$, we can decompose it into the Hadamard product of two independent virtual normally distributed vector variables $\mathbf{u}$ and $\mathbf{v}$, whose variances are the diagonal matrices $\mathbf{\Sigma}_1$ and $\mathbf{\Sigma}_2$ respectively, i.e. $\mathbf{x} = \mathbf{u} \circ \mathbf{v}$ with $\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_1)$ and $\mathbf{v} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_2)$, where $\circ$ denotes the Hadamard product. Finally, for the second layer of the signal model, we set the precisions of $\mathbf{u}$ and $\mathbf{v}$ as realizations of Gamma hyperpriors and choose small hyperparameters to construct a sharper distribution of $\mathbf{x}$ during the variational procedure [12].
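The generating rule and the moment relationship can be checked empirically; a minimal sketch (the scale values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Generating rule: an NP variate with scales (sigma1, sigma2) is the
# product of independent u ~ N(0, sigma1^2) and v ~ N(0, sigma2^2).
sigma1, sigma2 = 1.5, 0.8
u = sigma1 * rng.standard_normal(1_000_000)
v = sigma2 * rng.standard_normal(1_000_000)
x = u * v   # elementwise product; the Hadamard product in the vector case

# Moment relationship: E[x] = 0 and Var(x) = sigma1^2 * sigma2^2.
print(x.mean())   # close to 0
print(x.var())    # close to (1.5 * 0.8)^2 = 1.44
```
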

According to the generating rule of the normal product mentioned above, we can add two parallel nodes, $\mathbf{u}$ and $\mathbf{v}$, to construct one latent layer. The posterior distribution then contains a Dirac delta factor $\delta(\mathbf{x} - \mathbf{u} \circ \mathbf{v})$, which forces $\mathbf{x}$ to equal $\mathbf{u} \circ \mathbf{v}$ while keeping the density nonzero only on that set. Considering the Bayesian risk function, with the loss evaluated under this posterior, we can replace $\mathbf{x}$ with $\mathbf{u} \circ \mathbf{v}$ in the inference procedure while maintaining the value of the Bayesian risk function at a consistent level. So the modified Bayesian network is:

$$p(\mathbf{y} \mid \mathbf{u}, \mathbf{v}) = \mathcal{N}(\mathbf{y};\, \mathbf{\Phi}(\mathbf{u} \circ \mathbf{v}),\, \sigma^2 \mathbf{I}) \qquad (3)$$
$$p(\mathbf{u} \mid \boldsymbol{\alpha}) = \mathcal{N}(\mathbf{u};\, \mathbf{0},\, \operatorname{diag}(\boldsymbol{\alpha})^{-1}) \qquad (4)$$
$$p(\mathbf{v} \mid \boldsymbol{\beta}) = \mathcal{N}(\mathbf{v};\, \mathbf{0},\, \operatorname{diag}(\boldsymbol{\beta})^{-1}) \qquad (5)$$
$$p(\alpha_i) = \operatorname{Gamma}(\alpha_i;\, a, b), \quad p(\beta_i) = \operatorname{Gamma}(\beta_i;\, a, b), \qquad (6)$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are the precision (inverse-variance) vectors of $\mathbf{u}$ and $\mathbf{v}$, as depicted in Fig. 2.

Fig. 2: The Bayesian Network

III The Variational Bayesian Inference

In order to infer the Bayesian network, we use the mean-field variational Bayesian method to analytically approximate the posterior distribution of the latent variables. Under this approach, the approximate variational distribution $q(\mathbf{\Theta})$, where $\mathbf{\Theta}$ stands for the set of latent variables in the model, is assumed to factorize into the product $q(\mathbf{\Theta}) = \prod_j q_j(\mathbf{\Theta}_j)$ over a partition of $\mathbf{\Theta}$. It can be shown that the solution of the variational method for each factor can be written as

$$\ln q_j(\mathbf{\Theta}_j) = \big\langle \ln p(\mathbf{y}, \mathbf{\Theta}) \big\rangle_{i \neq j} + \text{const},$$

where $\langle \cdot \rangle_{i \neq j}$ means taking the expectation over the variational distributions of all factors except $q_j$ [13]. Applying the variational Bayesian method to the Bayesian model of Section II, the posterior approximations of the latent variables are respectively:

(7)
(8)
(9)

III-A Posterior approximation of $\mathbf{u}$ and $\mathbf{v}$

Substituting (3), (4), (5) and (6) into (8), we arrive at a Gaussian approximation $q(\mathbf{v})$ with a mean and a covariance in the standard Gaussian form. From the principle of variational inference, we know that $q(\mathbf{v})$ is an approximation of the true posterior marginal of $\mathbf{v}$; in order to obtain a more concise iteration formula, we relax the second moment to the outer product of the posterior mean with itself. This relaxation can be interpreted as ignoring the posterior variance during the updating, which is motivated by the fact that after the learning procedure the posterior variance always approaches zero to ensure the posterior mean's concentration on the estimated value.

With this corrected posterior approximation, in the noise-free case the Woodbury identity gives:

(10)

and

(11)
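The Woodbury identity invoked here can be verified numerically; the following sketch uses a generic diagonal prior covariance and shows why the rearranged form remains well defined as the noise variance tends to zero:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 20, 50
sigma2 = 0.25                            # noise variance (illustrative)

Phi = rng.standard_normal((M, N))
D = np.diag(rng.uniform(0.5, 2.0, N))    # diagonal prior covariance

# Woodbury identity:
# (D^{-1} + sigma^{-2} Phi^T Phi)^{-1}
#     = D - D Phi^T (sigma^2 I + Phi D Phi^T)^{-1} Phi D
lhs = np.linalg.inv(np.linalg.inv(D) + Phi.T @ Phi / sigma2)
rhs = D - D @ Phi.T @ np.linalg.solve(
    sigma2 * np.eye(M) + Phi @ D @ Phi.T, Phi @ D)

# The N x N inverse on the left becomes an M x M solve on the right,
# which stays well defined as sigma^2 -> 0 when M < N (noise-free case).
```
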

Similarly, for $\mathbf{u}$, substituting (3), (4), (5) and (6) into (7) and applying the same second-moment relaxation yields the corrected approximation. Then again, in the noiseless case, we have

(12)

and

(13)

III-B Posterior approximation of the precisions

As discussed in Section II, the precisions of $\mathbf{u}$ and $\mathbf{v}$ carry Gamma hyperpriors. Substituting (4), (5), (6) into (9) and using the separability of the resulting distribution, it can be shown that each precision factor is again Gamma distributed, which yields the update

(14)

Similarly, for the other precision vector we have

(15)

III-C The Proposed Algorithm

As a result, we summarize the procedure above as two algorithms, named "NP-0" and "NP-1", which represent the inference results for the one-layer and the two-layer signal model respectively. The difference between NP-0 and NP-1 is whether the learning process updates the precisions of the normal product.

Algorithm 1: NP-0 (One-Layer Normal Product)
Input: the measurement matrix and the measurements
Output: the recovered sparse signal
1: Initialize the factor means and the stop-criterion parameters
2: while the stop criteria are not satisfied do
3:     Update the posterior mean of $\mathbf{u}$ using (12)
4:     Update the posterior mean of $\mathbf{v}$ using (10)
5:     Form the signal estimate as the Hadamard product of the two factor means
6:     Update the stop-criterion quantities
7: end while

The algorithm NP-0 is similar to the FOCUSS algorithm [14]. The difference between them is that NP-0 uses the two decomposed factors of the signal in an interleaved way to update the estimate, while FOCUSS uses the whole signal vector for the update directly. Furthermore, we set the initial value of the algorithm to a constant vector to avoid returning local-minimum results.
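To make the interleaved idea concrete, the following is a hedged Python sketch of a FOCUSS-style scheme that alternates minimum-norm solves over the two factors of $\mathbf{x} = \mathbf{u} \circ \mathbf{v}$; the function name, the ridge term and the update details are illustrative, not the paper's exact equations (12) and (10):

```python
import numpy as np

def np0_interleaved(Phi, y, iters=15, ridge=1e-10):
    """Illustrative sketch of the interleaved idea behind NP-0 (not the
    paper's exact updates): with x = u * v, alternately solve a
    ridge-regularized minimum-norm problem for one factor while the
    other is held fixed, so each factor reweights the dictionary for
    the next solve, as FOCUSS does with the whole x."""
    M, N = Phi.shape
    u = np.ones(N)          # constant initialization, as in the paper
    v = np.ones(N)
    for _ in range(iters):
        A = Phi * v         # Phi @ diag(v) via broadcasting
        u = A.T @ np.linalg.solve(A @ A.T + ridge * np.eye(M), y)
        A = Phi * u         # Phi @ diag(u)
        v = A.T @ np.linalg.solve(A @ A.T + ridge * np.eye(M), y)
    return u * v

# Small demo: a 3-sparse signal observed through 25 random measurements.
rng = np.random.default_rng(0)
M, N, K = 25, 50, 3
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true
x_hat = np0_interleaved(Phi, y)
```

Each minimum-norm solve keeps the estimate consistent with the measurements while the alternating reweighting concentrates the factors on a sparse support.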

Algorithm 2: NP-1 (Two-Layer Normal Product)
Input: the measurement matrix and the measurements
Output: the recovered sparse signal
1: Initialize the factor means, the precisions and the stop-criterion parameters
2: while the stop criteria are not satisfied do
3:     Update the posterior mean of $\mathbf{u}$ using (12)
4:     Update the second moment of $\mathbf{u}$ using (13)
5:     Update the posterior mean of $\mathbf{v}$ using (10)
6:     Update the second moment of $\mathbf{v}$ using (11)
7:     Form the signal estimate as the Hadamard product of the two factor means
8:     Update the precision of $\mathbf{u}$ using (14)
9:     Update the precision of $\mathbf{v}$ using (15)
10:    Update the stop-criterion quantities
11: end while

The algorithm NP-1 resembles a coupling of two sparse Bayesian learning (SBL) [6, 8] procedures, and we will empirically show in Section IV that the MSE of NP-1 descends faster than that of SBL.

IV Numerical Results

In this section, we compare our proposed algorithms with SBL, iterative reweighted least squares (IRLS) [15] and basis pursuit (BP) [4]. In the following results, we set the dimension of the original signal to 100, and in each experiment every entry of the sensing matrix is drawn independently from the standard normal distribution. The stop-criterion parameters of both NP-0 and NP-1 are fixed across all experiments.

The success rate in Fig. 3 is defined as the average probability of successful restoration, i.e. the ratio between the number of successful trials and the total number of independent experiments. Each point collects the results of 300 Monte Carlo trials, and a trial is counted as successful when the error between the estimate $\hat{\mathbf{x}}$ and the original signal $\mathbf{x}$ falls below a fixed threshold. The mean squared error (MSE) in Fig. 4 measures the average of the squared differences between the estimate and the true signal.
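Under these definitions, the reported metrics can be computed as follows; this is a minimal sketch in which the success threshold `tol` is an illustrative choice, since the paper's exact threshold is not reproduced here:

```python
import numpy as np

def success(x_hat, x, tol=1e-3):
    """One trial counts as successful when the relative reconstruction
    error falls below a threshold (`tol` is an illustrative value)."""
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x) < tol

def mse(x_hat, x):
    """Mean squared error between the estimate and the true signal."""
    return np.mean((x_hat - x) ** 2)

# Success rate over independent Monte Carlo trials:
# number of successful trials / total number of trials.
trials = [(np.array([1.0, 0.0]), np.array([1.0, 0.0])),   # exact recovery
          (np.array([0.5, 0.0]), np.array([1.0, 0.0]))]   # failed recovery
rate = np.mean([success(xh, x) for xh, x in trials])
```
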

Fig. 3 demonstrates the superior performance of the NP-1 algorithm when only a few measurements are available. It shows the success rate of the respective algorithms versus the number of measurements M and the sparsity level K of the signal. We can see that the reconstruction performance of the one-layer normal product algorithm (NP-0) is almost the same as that of BP, which is the MAP estimate of the unknowns under the one-layer Laplace prior framework. The comparisons also show that the two-layer normal product algorithm (NP-1) requires as few CS measurements as SBL while inheriting similar reconstruction precision.

In Fig. 4, the computational costs of the three algorithms are compared on a personal PC with a dual-core 3.1 GHz CPU and 4 GB RAM. It is interesting to note that the MSE of NP-1 descends faster than that of SBL in Fig. 4(a), while NP-1 has the same sparsity-undersampling tradeoff as SBL. It should also be noted that we have not used any basis-pruning technique as in [16] to reduce the computational complexity. To understand the low computational time in Fig. 4(b), observe that NP-1 requires only a few iterations to reach the stop condition. In summary, these experimental results confirm that NP-1 is a fast sparse Bayesian algorithm.

Fig. 3: Simulation results of success rate. (a) K=3; (b) M=30.

Fig. 4: Simulation results of computational cost. (a) K=3, M=30; (b) K=3, M=30.

V Conclusion

In this letter, we formulated a normal product prior based Bayesian framework to solve the compressed sensing problem in the noise-free case. Using this framework, we derived two algorithms, NP-0 and NP-1, and compared them with existing algorithms. We have shown that NP-1 achieves reconstruction performance similar to SBL, while the interleaved updating procedure provides improved computational times.

References

  • [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Tran. Inf. Theory, vol. 52, no. 2, pp. 489–509, 2006.
  • [3] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
  • [4] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.
  • [5] R. Chartrand, “Exact reconstruction of sparse signals via nonconvex minimization,” IEEE Signal Process. Lett., vol. 14, no. 10, pp. 707–710, 2007.
  • [6] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
  • [7] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, 2008.
  • [8] D. P. Wipf and B. D. Rao, “Sparse bayesian learning for basis selection,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153–2164, 2004.
  • [9] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using laplace priors,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 53–63, 2010.
  • [10] E. W. Weisstein. Normal product distribution. MathWorld–A Wolfram Web Resource. [Online]. Available: http://mathworld.wolfram.com/NormalProductDistribution.html
  • [11] A. Seijas-Macías and A. Oliveira, “An approach to distribution of the product of two normal variables,” Discussiones Mathematicae: Probability & Statistics, vol. 32, 2012.
  • [12] D. Wipf, J. Palmer, and B. Rao, “Perspectives on sparse bayesian learning,” Computer Engineering, vol. 16, no. 1, p. 249, 2004.
  • [13] D. G. Tzikas, C. Likas, and N. P. Galatsanos, “The variational approximation for bayesian inference,” IEEE Signal Process. Mag., vol. 25, no. 6, pp. 131–146, 2008.
  • [14] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using focuss: A re-weighted minimum norm algorithm,” IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, 1997.
  • [15] R. Chartrand and W. Yin, “Iteratively reweighted algorithms for compressive sensing,” in Proc. ICASSP, 2008, pp. 3869–3872.
  • [16] M. E. Tipping, A. C. Faul et al., “Fast marginal likelihood maximisation for sparse bayesian models,” in Proc. 9th Int. Workshop on AIStats, vol. 1, no. 3, 2003.