Bayesian Approach with Extended Support Estimation for Sparse Regression

04/03/2019 · by Kyung-Su Kim, et al., KAIST Department of Mathematical Sciences

A greedy algorithm called Bayesian multiple matching pursuit (BMMP) is proposed to estimate a sparse signal vector and its support given $m$ linear measurements. Unlike maximum a posteriori (MAP) support detection, which was proposed by Lee to estimate the support by selecting the index with the maximum likelihood ratio of a correlation given by a normalized version of orthogonal matching pursuit (OMP), the proposed method uses the correlation given by the matching pursuit proposed by Davies and Eldar. BMMP exploits a diversity gain to estimate the support by considering multiple support candidates, each of which is obtained by iteratively selecting an index set whose size differs across candidates. In particular, BMMP considers an extended support estimate whose maximal size is $m$ in the process of obtaining each of the support candidates. BMMP is observed to outperform other state-of-the-art methods and to approach the ideal limit of the signal sparsity in our simulation setting.


I Introduction

Sparse linear regression, also referred to as compressed sensing (CS) [1, 2], has been widely studied in many fields ranging from denoising [3] to super-resolution restoration [4, 5]. A signal vector $\mathbf{x} \in \mathbb{R}^n$ is called $K$-sparse if it has at most $K$ non-zero elements. The objective of sparse linear regression is to recover a $K$-sparse signal vector $\mathbf{x}$ from the given measurement vector $\mathbf{y} \in \mathbb{R}^m$, such that

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n},$$

where $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a known sensing matrix with $m < n$ and $\mathbf{n} \in \mathbb{R}^m$ is a noise vector.
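To make the model concrete, the following sketch generates an instance of the problem (Python/NumPy; the dimensions and distributions are illustrative choices of ours, not the paper's exact experimental settings):

```python
import numpy as np

def make_instance(m=64, n=128, K=10, sigma_n=0.0, rng=None):
    """Generate a K-sparse regression instance y = A x + n.

    Illustrative parameter values only; the paper's exact settings
    were not recoverable from the source.
    """
    rng = np.random.default_rng(rng)
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # Gaussian sensing matrix
    S = rng.choice(n, size=K, replace=False)            # support, uniform over [n]
    x = np.zeros(n)
    x[S] = rng.uniform(1.0, 2.0, size=K)                # nonzeros from a fixed interval
    y = A @ x + sigma_n * rng.normal(size=m)            # noisy linear measurements
    return y, A, x, set(S.tolist())
```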

To improve the recovery performance of the sparse signal $\mathbf{x}$ and its support $S$ (the set of indices of non-zero elements in $\mathbf{x}$), a method called MAP support detection [6] was proposed to recover a partial support of $\mathbf{x}$ by exploiting maximum a posteriori (MAP) estimation. This method determines whether an index belongs to $S$ by using a likelihood ratio test, under the true and null hypotheses, given a correlation between the residual vector and the corresponding $\ell_2$-normalized column vector of $\mathbf{A}$. By applying this method to existing algorithms such as generalized OMP (gOMP) [7], compressive sampling matching pursuit (CoSaMP) [8], and subspace pursuit (SP) [9], greedy methods were proposed and shown to outperform other existing methods [6].

We propose a new MAP-based method to estimate a partial support by modifying MAP support detection to use the correlation term given in rank aware order recursive matching pursuit (RA-ORMP) [10]. The proposed scheme improves on the existing MAP support detection by additionally applying, to each column vector of $\mathbf{A}$, the projection onto the orthogonal complement of the span of the already-selected columns when calculating the correlation, since the correlation can be better represented through this consideration.

Based on this method for estimating a partial support, we develop an algorithm called Bayesian multiple matching pursuit (BMMP) to recover $\mathbf{x}$ and its support $S$. BMMP also uses the following four techniques to further enhance performance. Simulation results show that BMMP outperforms other state-of-the-art methods in both noiseless and noisy cases.

  1. (Generating an extended support estimate of size bounded by $m$) The extended support is defined as an index set whose size is larger than the actual sparsity $K$ of $\mathbf{x}$ and which includes the true support $S$. gOMP [7], SP [9], and CoSaMP [8] exploit an extended support estimate in the process of estimating $S$. To improve the support recovery performance, the maximum size of the extended support estimate in BMMP is set to $m$, whereas the maximum size in gOMP, SP, and CoSaMP is a constant multiple of $K$ that is smaller than $m$. The reason for setting this bound to $m$ in BMMP is explained in the section describing BMMP.

  2. (Generating multiple support candidates) BMMP exploits a diversity gain by iteratively selecting index sets of different sizes to generate each of multiple support candidates; that is, BMMP generates the $l$th support candidate by iteratively selecting $l$ indices at a time through the proposed partial support estimation. There have been related studies generating multiple support candidates through a tree structure: multipath matching pursuit [11] and multi-branch matching pursuit [12]. However, the complexity of the tree-based methods scales with the sparsity $K$ and is higher than those of other CS algorithms, since the number of support candidates increases exponentially with the depth of the tree due to its structural nature. In contrast with these methods, BMMP does not use a tree-based approach, so the number of support candidates is independent of $K$ and the complexity of BMMP increases only linearly with the number of candidates.

  3. (Updating the extended support estimate by replacing its subset) To improve the support recovery performance, SP [9] and CoSaMP [8] update an extended support estimate at each iteration by retaining a fixed number of selected indices and replacing the rest. BMMP modifies this technique by iteratively selecting indices in a given extended support estimate and updating its complement in the estimate.

  4. (Applying ridge regression) A ridge regression proposed by Wipf and Rao [13] is used to improve robustness to noise.

II Notation

The set $\{1, 2, \dots, n\}$ is denoted by $[n]$. The submatrix of a matrix $\mathbf{A}$, where $\mathbf{a}_i$ is its $i$th column, with columns indexed by a set $T$ is denoted by $\mathbf{A}_T$. $\mathbf{A}^{(T)}$ denotes the submatrix of $\mathbf{A}$ with rows indexed by $T$. $\mathcal{R}(\mathbf{A}_T)$ denotes the range space spanned by the columns of $\mathbf{A}_T$. $\mathbf{A}^\top$ denotes the transpose of $\mathbf{A}$. $\mathbf{P}_V$ and $\mathbf{P}_V^\perp$ denote the orthogonal projection onto a subspace $V$ and onto its orthogonal complement, respectively. $\|\cdot\|_F$ denotes the Frobenius norm. For an index set $T$, we abbreviate $\mathbf{P}_{\mathcal{R}(\mathbf{A}_T)}$ and $\mathbf{P}_{\mathcal{R}(\mathbf{A}_T)}^\perp$ as $\mathbf{P}_T$ and $\mathbf{P}_T^\perp$, respectively.
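These operators translate directly into code; a minimal sketch, assuming NumPy, with helper names of our own choosing:

```python
import numpy as np

def proj(A, T):
    """Orthogonal projector onto R(A_T), the span of the columns of A
    indexed by T (assumes A_T has full column rank)."""
    if len(T) == 0:
        return np.zeros((A.shape[0], A.shape[0]))
    Q, _ = np.linalg.qr(A[:, sorted(T)])  # orthonormal basis of R(A_T)
    return Q @ Q.T

def proj_perp(A, T):
    """Projector onto the orthogonal complement of R(A_T)."""
    return np.eye(A.shape[0]) - proj(A, T)
```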

III Partial support estimation by MAP

The proposed MAP-based algorithm for estimating a partial true support is introduced in this section. We assume the following.

The support $S$ is uniformly distributed on the set of subsets of $[n]$ with size $K$. Each nonzero element of $\mathbf{x}$ is i.i.d. and follows an arbitrary distribution whose mean and variance are $\mu_x$ and $\sigma_x^2$, respectively. Each element in $\mathbf{A}$ and $\mathbf{n}$ follows a Gaussian distribution whose mean is zero and whose variance is $\sigma_A^2$ and $\sigma_n^2$, respectively.

Suppose that $T \subset [n]$ such that $T \subseteq S$ is a given partial support estimate. The goal of the proposed partial support detection is to find an index set belonging to $S \setminus T$ by using the inner product used in RA-ORMP, $z_i := \langle \mathbf{r}, \tilde{\mathbf{a}}_i \rangle$ for $i \in [n] \setminus T$, where $\mathbf{r} = \mathbf{P}_T^\perp \mathbf{y}$ is the residual vector and $\tilde{\mathbf{a}}_i := \mathbf{P}_T^\perp \mathbf{a}_i / \|\mathbf{P}_T^\perp \mathbf{a}_i\|$. Then $z_i$ represents the correlation between the residual vector and $\tilde{\mathbf{a}}_i$.

Although MAP support detection [6] estimates a partial true support from the likelihood ratio of the inner product $c_i := \langle \mathbf{r}, \mathbf{a}_i / \|\mathbf{a}_i\| \rangle$, the proposed method uses the likelihood ratio of $z_i$ for $i \in [n] \setminus T$. Note that $z_i$ and $c_i$ use $\tilde{\mathbf{a}}_i$ and $\mathbf{a}_i / \|\mathbf{a}_i\|$, respectively. $c_i$ can be interpreted as the correlation term in a normalized version of OMP, since the $\ell_2$-normalized vector $\mathbf{a}_i / \|\mathbf{a}_i\|$ is used in $c_i$, whereas OMP uses $\mathbf{a}_i$ in its correlation term $\langle \mathbf{r}, \mathbf{a}_i \rangle$. This indicates that, compared to MAP support detection, the proposed method additionally applies the projection onto the orthogonal complement of $\mathcal{R}(\mathbf{A}_T)$ to each column vector when representing the correlation. Since each one-dimensional space $\mathrm{span}(\tilde{\mathbf{a}}_i)$ for $i \in S \setminus T$ is guaranteed to belong to the space $\mathbf{P}_T^\perp \mathcal{R}(\mathbf{A}_S)$, which is extended from the residual space spanned by $\mathbf{r}$, the index selection based on $z_i$ may more successfully find an index in $S \setminus T$ than that based on $c_i$. (It is not guaranteed that each space $\mathrm{span}(\mathbf{a}_i)$ for $i \in S \setminus T$ belongs to the extended residual space.)
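The contrast between the two statistics is easiest to see in code (a sketch using the `proj_perp` helper above; `z` is the RA-ORMP form and `c` the normalized-OMP form of [6], with symbol names of our own):

```python
import numpy as np

def correlations(y, A, T):
    """Return the RA-ORMP statistic z_i and the normalized-OMP statistic c_i
    for every column i outside the current partial support estimate T."""
    P_perp = proj_perp(A, T)
    r = P_perp @ y                          # residual vector
    z, c = {}, {}
    for i in range(A.shape[1]):
        if i in T:
            continue
        a = A[:, i]
        a_proj = P_perp @ a                 # column projected away from R(A_T)
        nrm = np.linalg.norm(a_proj)
        z[i] = r @ a_proj / nrm if nrm > 1e-12 else 0.0  # RA-ORMP correlation
        c[i] = r @ a / np.linalg.norm(a)                 # normalized-OMP correlation
    return z, c
```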

The proposed partial support detection uses an estimate of the signal variance $\sigma_x^2$. This estimate is obtained by a derivation in which (a) follows from the independence assumption and (b) follows from Lemma A.1; the resulting estimate for $\sigma_x^2$ is given as

(1)

Note that the following equalities hold for $i \in [n] \setminus T$:

(2)

where $x_j$ denotes the $j$th element of $\mathbf{x}$. Our derivation in the rest of this section is based on [6]. From the equalities in (2), $z_i$ becomes the right-side term of (3) under the null hypothesis $H_0: i \notin S$ and the right-side term of (4) under the true hypothesis $H_1: i \in S$.

(3)
(4)

From (2) and (3), the expectation and variance of $z_i$ under the null hypothesis can be respectively approximated as follows:

(5)
(6)

where (a) and (b) follow from the circular symmetry of the Gaussian distribution and from (1), respectively. Similarly, by (4) and Lemma A.1, the expectation and variance of $z_i$ under the true hypothesis are obtained respectively as

(7)

and

(8)

where $\Gamma(\cdot)$ is the Gamma function.
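For concreteness, the two moments that Lemma A.1 supplies to (7) and (8) are those of a scaled chi and chi-squared random variable (a standard identity, stated here in our notation):

```latex
% For g with i.i.d. N(0, sigma^2) entries and P_V the orthogonal
% projector onto an r-dimensional subspace V:
\mathbb{E}\!\left[\lVert \mathbf{P}_V \mathbf{g} \rVert\right]
  = \sqrt{2}\,\sigma\,
    \frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\Gamma\!\left(\frac{r}{2}\right)},
\qquad
\mathbb{E}\!\left[\lVert \mathbf{P}_V \mathbf{g} \rVert^{2}\right] = r\,\sigma^{2}.
```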

Then, using (5)–(8), we obtain the log-likelihood ratio as follows:

(9)

where (a) follows from the Gaussian approximation of the conditional distributions of $z_i$ given $H_0$ ($i \notin S$), using (5) and (6), and given $H_1$ ($i \in S$), using (7) and (8).

Therefore, the proposed method estimates a partial true support as the index set obtained by selecting the largest ratios (9) over $i \in [n] \setminus T$. Note that (9) is proportional to (10) when each nonzero element of $\mathbf{x}$ follows a uniform distribution on a fixed interval:

(10)
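A sketch of the resulting selection rule (the moments `mu0, var0, mu1, var1` stand for the closed forms in (5)–(8), whose exact expressions were not recoverable here; names are ours):

```python
import numpy as np

def select_by_llr(z, mu0, var0, mu1, var1, l):
    """Pick the l indices whose correlation z_i has the largest Gaussian
    log-likelihood ratio  log N(z_i; mu1, var1) - log N(z_i; mu0, var0).

    mu0/var0 and mu1/var1 are the null- and true-hypothesis moments
    from (5)-(6) and (7)-(8); here they are supplied by the caller.
    """
    def log_gauss(v, mu, var):
        return -0.5 * np.log(2 * np.pi * var) - (v - mu) ** 2 / (2 * var)

    llr = {i: log_gauss(v, mu1, var1) - log_gauss(v, mu0, var0)
           for i, v in z.items()}
    return sorted(llr, key=llr.get, reverse=True)[:l]
```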

IV Algorithm description

Algorithm 1 BMMP($\mathbf{y}, \mathbf{A}, L, \epsilon$). [The mathematical content of the pseudocode listing (Steps 1–26) was not recoverable from the source; its surviving structure is an outer for-loop over the candidate index $l \in [L]$, inner while-loops that grow and update the extended support estimate, and a final selection of the candidate with minimum residual. See the stage-by-stage description below.]
Fig. 1: Description of BMMP: $\hat{S}$ is the support estimate obtained from BMMP and $L$ is the number of support candidates among which $\hat{S}$ is selected. We note that the $l$th support candidate is obtained by selecting indices from an extended support estimate whose size is smaller than or equal to $m$, which is generated by iteratively selecting the indices with the $l$ largest log-likelihood ratios and adding them to the estimate.

The proposed method, BMMP (Algorithm 1), for estimating $\mathbf{x}$ and $S$ is introduced in this section. BMMP returns $(\hat{\mathbf{x}}, \hat{S})$ as estimates of $(\mathbf{x}, S)$ given the tuple $(\mathbf{y}, \mathbf{A}, L, \epsilon)$, where $L$ is the number of support candidates from which the final support is estimated and $\epsilon$ is a threshold used to terminate BMMP when the residual error is smaller than this value. The estimate $\hat{S}$ is obtained in Steps 23–24 by selecting, among the support candidates ($S_l$ for $l \in [L]$), the one with the minimum residual error. The $l$th support candidate in Step 23 is obtained by an iteration consisting of the following four stages.

In the first stage (generate an extended support estimate), in Steps 4–10, a set of indices obtained in Step 8 through the proposed partial support estimation is iteratively added to the estimate in Step 9 until its size is equal to $m$ or the residual error obtained from the estimate is smaller than $\epsilon$. (We note that the ratio in (10) is calculated from the moment values specified as inputs in Step 6 of Algorithm 1.) In the second stage (generate a temporary estimate of $\mathbf{x}$), in Step 11, an estimate of the signal vector supported on the extended support estimate is obtained by the ridge regression described in Step 11. In the third stage, in Steps 12–15, a temporary support estimate is obtained by selecting the $K$ largest elements of this signal estimate. The fourth stage, in Steps 16–17, generates a subset of the temporary support estimate by selecting indices from it; the extended support estimate is then reinitialized as this subset in the next iteration step. The iteration terminates, and the $l$th support candidate is obtained in Step 19, when the residual error obtained from the temporary support estimate in Step 15 does not decrease further.
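The four stages translate into the following skeleton (a paraphrase of the description above, not the authors' listing: the ranking statistic, the stage-4 re-seeding size, and the stopping details are simplified placeholders, and plain least squares stands in for the ridge step; `correlations` and `proj_perp` are the helpers from the earlier sketches):

```python
import numpy as np

def bmmp_candidate(y, A, K, l, eps, max_iter=20):
    """Sketch of one BMMP support candidate (stages 1-4 of Sec. IV)."""
    m = A.shape[0]
    T, best_S, best_res = set(), None, np.inf
    for _ in range(max_iter):
        # Stage 1: grow T by l indices at a time until |T| = m
        # or the residual drops below eps.
        while len(T) < m:
            z, _ = correlations(y, A, T)
            ranked = sorted(z, key=lambda i: -abs(z[i]))  # placeholder for the LLR ranking of Sec. III
            T |= set(ranked[:min(l, m - len(T))])
            if np.linalg.norm(proj_perp(A, T) @ y) < eps:
                break
        # Stage 2: temporary signal estimate supported on T
        # (least squares stands in for the ridge/SBL step).
        idx = sorted(T)
        xT, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        # Stage 3: temporary support = indices of the K largest-magnitude entries.
        order = np.argsort(-np.abs(xT))
        S_tmp = {idx[j] for j in order[:K]}
        res = np.linalg.norm(proj_perp(A, S_tmp) @ y)
        if res >= best_res:
            break  # residual no longer decreases: keep the best candidate so far
        best_S, best_res = S_tmp, res
        # Stage 4: re-seed the extended support from a subset of the
        # temporary support (subset size is a placeholder choice).
        T = {idx[j] for j in order[:max(K - l, 1)]}
    return best_S, best_res
```

The full method would call `bmmp_candidate` for each $l \in [L]$ and return the candidate with the smallest residual, matching Steps 23–24.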

To enhance the robustness to noise, the ridge regression proposed by Wipf and Rao [13] is used in Step 11, with parameters obtained by minimizing the cost in (11):

(11)

where $\mathbf{A}_T$ is the submatrix of $\mathbf{A}$ with columns indexed by the extended support estimate $T$, and $\mathbf{\Gamma}$ is the diagonal matrix whose $i$th diagonal element is the hyperparameter $\gamma_i$. We obtain an approximate solution of (11) by using sparse Bayesian learning (SBL) [13] to reduce the complexity.
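A minimal sketch of this step, assuming a generic Tikhonov/SBL form (the diagonal hyperparameters $\gamma_i$ and their EM-style update follow the standard SBL recipe of [13]; the exact cost (11) and update used by the authors may differ in detail):

```python
import numpy as np

def sbl_ridge(y, A_T, sigma_n2=1e-2, n_iter=30):
    """Weighted ridge estimate with Gamma = diag(gamma) refined by
    EM-style sparse Bayesian learning updates [13]:
        mu = Gamma A^T (A Gamma A^T + sigma_n^2 I)^{-1} y."""
    m, k = A_T.shape
    gamma = np.ones(k)
    for _ in range(n_iter):
        Sigma_y = (A_T * gamma) @ A_T.T + sigma_n2 * np.eye(m)
        mu = gamma * (A_T.T @ np.linalg.solve(Sigma_y, y))     # posterior mean
        diag_post = gamma - gamma**2 * np.einsum(
            'ij,ji->i', A_T.T, np.linalg.solve(Sigma_y, A_T))  # posterior variances
        gamma = np.maximum(mu**2 + diag_post, 1e-12)           # EM hyperparameter update
    return mu
```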

In the noiseless case, the parameters in the ridge regression are set to zero, so that the ridge regression reduces to least-squares regression. The least-squares method has a lower complexity than SBL, and it is well known that it provides the unique solution $\mathbf{x}$ in the noiseless case if $S \subseteq T$, $|T| \leq m$, and $\mathbf{A}_T$ has full column rank. If $S \subseteq T$ holds, the inversion problem of CS is simplified to an easier problem in which $\mathbf{A}$ is replaced by its submatrix $\mathbf{A}_T$. When $T$ has a larger size, the probability of satisfying $S \subseteq T$ is greater. For this reason, we set the maximal size of the extended support estimate to $m$. Besides, it is guaranteed from Lemma A.2 that in the noiseless case, $S$ is a subset of $T$ ($S \subseteq T$) if $|T| < m$ and $\|\mathbf{P}_T^\perp \mathbf{y}\|$ is equal to zero. Based on this fact, to minimize the size of $T$ satisfying $S \subseteq T$, we gradually increase the size of $T$ up to $m$ and find a $T$ satisfying the criterion in Steps 4–10. Figure 1 illustrates the procedure of BMMP.
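In code, the Lemma A.2 criterion is a one-line residual test (a sketch reusing `proj_perp`; the tolerance replaces exact zero to absorb floating-point error):

```python
import numpy as np

def support_contained(y, A, T, tol=1e-9):
    """Noiseless-case criterion of Lemma A.2: with |T| < m, a vanishing
    residual ||P_T^perp y|| certifies S subset of T almost surely."""
    assert len(T) < A.shape[0], "Lemma A.2 requires |T| < m"
    return np.linalg.norm(proj_perp(A, T) @ y) < tol
```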

Fig. 2: Performance comparison of BMMP and related algorithms. (a) Support recovery rate; (b) execution time; (c) mean squared error; (d) support recovery rate.
Fig. 3: Performance comparison of the proposed partial support detection and the MAP support detection. (a) Support recovery rate.

V Numerical experiments

We compare the performance of BMMP with existing state-of-the-art algorithms: MAP-gOMP (two indices are selected at each step in MAP-gOMP), MAP-SP, MAP-CoSaMP, MAP-OMP [6], and RegL1 ($\ell_1$-norm minimization with a non-negative constraint; RegL1 outputs the signal estimate minimizing $\|\mathbf{x}\|_1$ such that $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \leq \epsilon_n$ and $\mathbf{x} \geq \mathbf{0}$, where $\epsilon_n$ is the noise magnitude). We note that MAP-SP, MAP-CoSaMP, and MAP-OMP are algorithms based on MAP support detection [6]. Each entry of $\mathbf{A}$ is i.i.d. and follows a zero-mean Gaussian distribution. The non-zero elements of $\mathbf{x}$ are independently and uniformly sampled from a fixed interval.
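For reference, a sketch of the evaluation loop one would use to reproduce the support recovery rate (our own harness, reusing `make_instance` from Section I; `recover` stands in for any of the compared algorithms, and the trial count and dimensions are illustrative):

```python
import numpy as np

def support_recovery_rate(recover, trials=200, m=64, n=128, K=10, sigma_n=0.0):
    """Fraction of trials in which the estimated support equals the true one.
    `recover(y, A, K)` should return an estimated support set."""
    hits = 0
    for t in range(trials):
        y, A, x, S = make_instance(m, n, K, sigma_n, rng=t)
        hits += (recover(y, A, K) == S)
    return hits / trials
```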

Fig. 4: Reconstruction performance comparison of a sparse image with compressed and noisy measurements.

The rate of successful support recovery, i.e., the fraction of trials with $\hat{S} = S$, and the execution time of each algorithm in the noiseless case are shown in Fig. 2(a) and Fig. 2(b), respectively; the problem size $(m, n)$ is fixed in Fig. 2. We evaluate BMMP with its input $\epsilon$ set according to the signal-to-noise ratio (SNR) in decibels. It has been shown [14] that in the noiseless case, the maximal sparsity at which an ideal approach can recover every signal vector $\mathbf{x}$ from $\mathbf{y} = \mathbf{A}\mathbf{x}$ is $m/2$; we refer to this bound as the ideal limit of the signal sparsity. Fig. 2(a) and Fig. 2(b) show that BMMP has a lower complexity than RegL1, outperforms the other state-of-the-art methods, and approaches this bound.

Fig. 2(c) illustrates the mean squared error of each algorithm as the SNR is varied, with the sparsity $K$ fixed. (In the noisy case, $\mathbf{x}$ is randomly sampled such that each of its non-zero elements lies in a fixed range.) It is observed that the signal reconstruction performance of BMMP is better than those of the other methods in the noisy case.

Fig. 2(d) shows how each of the following three techniques used in BMMP contributes to its performance: exploiting the extended support estimate with maximal size equal to $m$ (T1), using multiple support estimates (T2), and iteratively updating the support estimate by replacing its subset (T3). BMMP$-$T3 refers to BMMP without T3, obtained by modifying the subset-selection value in Step 16 of Algorithm 1; BMMP$-$T2 denotes BMMP without T2, obtained by setting the number of candidates $L$ to 1; similarly, BMMP$-$T1 indicates BMMP without T1, obtained by bounding the size of the extended support estimate in Steps 4–10 by a constant multiple of $K$ instead of $m$, with the remaining settings as in BMMP. It is shown that the performance degrades as each technique is removed, and that the extended support estimate with maximal size $m$ in particular contributes a significant performance improvement.

To compare the proposed partial support estimation with MAP support detection [6], we compare MAP-gOMP with a variant obtained by replacing the MAP support detection in MAP-gOMP with the proposed partial support estimation. Fig. 3 shows the rate of successful support recovery of these two algorithms in the noiseless case as the sparsity $K$ is varied with $m$ and $n$ fixed. It is observed that the proposed partial support estimation outperforms the MAP support detection [6].

Fig. 4 shows the performance for reconstructing a grayscale sparse image in the noisy case. The image is compressed using a sensing matrix $\mathbf{A}$ whose elements are i.i.d. samples from a zero-mean Gaussian distribution. It is observed in Fig. 4 that BMMP recovers the image better than the other methods.

VI Conclusion

We presented BMMP, which updates multiple extended support estimates, each of size up to $m$, and performs a likelihood ratio test given the correlation term in RA-ORMP. The numerical results show that BMMP achieves an improvement in performance compared to existing state-of-the-art methods, even with noisy measurements. Future related studies will consider the application of techniques for estimating the moments of $\mathbf{x}$, $\mathbf{A}$, and $\mathbf{n}$ to BMMP, and the development of BMMP for the case where the distribution of $\mathbf{x}$ is designable or a priori information on $\mathbf{x}$ is available, e.g., a communication system for non-orthogonal multiple access [15, 16].

Appendix A Lemmas

Lemma A.1.

Let $\mathbf{g}$ be a vector whose elements are i.i.d. and follow the Gaussian distribution $\mathcal{N}(0, \sigma^2)$, and let $\mathbf{B}$ be a matrix with rank $r$. Then it is guaranteed that $\mathbb{E}\big[\|\mathbf{P}_{\mathcal{R}(\mathbf{B})}\mathbf{g}\|^2\big] = r\sigma^2$ and $\mathbb{E}\big[\|\mathbf{P}_{\mathcal{R}(\mathbf{B})}\mathbf{g}\|\big] = \sqrt{2}\,\sigma\,\Gamma\big(\tfrac{r+1}{2}\big)/\Gamma\big(\tfrac{r}{2}\big)$, where $\Gamma$ is the Gamma function.

Proof.

Let $\{\mathbf{u}_1, \dots, \mathbf{u}_r\}$ be an orthonormal basis of $\mathcal{R}(\mathbf{B})$. Then $\|\mathbf{P}_{\mathcal{R}(\mathbf{B})}\mathbf{g}\|^2 = \sum_{j=1}^r \langle \mathbf{g}, \mathbf{u}_j \rangle^2$. Note that each $\langle \mathbf{g}, \mathbf{u}_j \rangle$ for $j \in [r]$ follows $\mathcal{N}(0, \sigma^2)$ by the isotropic property of the Gaussian vector [17]. It follows that $\|\mathbf{P}_{\mathcal{R}(\mathbf{B})}\mathbf{g}\| / \sigma$ follows the chi distribution with $r$ degrees of freedom. Then, the proof is completed by taking the means of the chi and chi-squared distributions. ∎
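A quick Monte Carlo check of both moments (our own verification script, not part of the paper):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
m, r, sigma, trials = 50, 7, 1.3, 200_000
Q, _ = np.linalg.qr(rng.normal(size=(m, r)))   # orthonormal basis of a rank-r range space
g = sigma * rng.normal(size=(trials, m))       # i.i.d. N(0, sigma^2) vectors
norms = np.linalg.norm(g @ Q, axis=1)          # ||P_{R(B)} g|| = ||Q^T g|| per trial
print(norms.mean(),
      math.sqrt(2) * sigma * math.gamma((r + 1) / 2) / math.gamma(r / 2))  # ~ equal
print((norms**2).mean(), r * sigma**2)                                     # ~ equal
```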

Lemma A.2.

Suppose that every $m$ columns of $\mathbf{A}$ exhibit full rank, $S$ is uniformly sampled from the size-$K$ subsets of $[n]$, and the sparsity is smaller than $m$, i.e., $K < m$. Then, in the noiseless case, the following statements (a) and (b) hold for any set $T \subseteq [n]$ such that $|T| < m$.

  1. $\|\mathbf{P}_T^\perp \mathbf{y}\| = 0$ holds if $S \subseteq T$.

  2. $S \subseteq T$ holds almost surely if $\|\mathbf{P}_T^\perp \mathbf{y}\| = 0$.

Proof.

The following subspace is defined as

(12)

For any set $T$ such that $|T| < m$, if $S \subseteq T$, then $\mathcal{R}(\mathbf{A}_T)$ includes $\mathbf{y} = \mathbf{A}_S \mathbf{x}_S$, so that $\mathbf{P}_T^\perp \mathbf{y} = 0$. Thus, $S \subseteq T$ implies $\|\mathbf{P}_T^\perp \mathbf{y}\| = 0$, so that statement (a) is satisfied.

Given that, from the assumption, the rank of $\mathbf{A}_{T \cup S}$ is strictly larger than that of $\mathbf{A}_T$ for any index set $T$ such that $|T| < m$ and $S \not\subseteq T$, the event region satisfying $\mathbf{y} \in \mathcal{R}(\mathbf{A}_T)$ has Lebesgue measure zero on the range space $\mathcal{R}(\mathbf{A}_S)$. Thus, if $S \not\subseteq T$, the condition $\mathbf{y} \notin \mathcal{R}(\mathbf{A}_T)$ holds almost surely. Since $\|\mathbf{P}_T^\perp \mathbf{y}\| = 0$ implies $\mathbf{y} \in \mathcal{R}(\mathbf{A}_T)$, it follows that $S \subseteq T$ holds almost surely whenever $\|\mathbf{P}_T^\perp \mathbf{y}\| = 0$. Thus, statement (b) is satisfied. ∎

References

  • [1] E. J. Candès et al., “Compressive sampling,” in Proceedings of the International Congress of Mathematicians, vol. 3.   Madrid, Spain, 2006, pp. 1433–1452.
  • [2] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [3] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5117–5144, 2016.
  • [4] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.
  • [5] R. Heckel, V. I. Morgenshtern, and M. Soltanolkotabi, “Super-resolution radar,” Information and Inference: A Journal of the IMA, vol. 5, no. 1, pp. 22–75, 2016.
  • [6] N. Lee, “MAP support detection for greedy sparse signal recovery algorithms in compressive sensing,” IEEE Transactions on Signal Processing, vol. 64, no. 19, pp. 4987–4999, 2016.
  • [7] J. Wang, S. Kwon, and B. Shim, “Generalized orthogonal matching pursuit,” IEEE Transactions on Signal Processing, vol. 60, no. 12, pp. 6202–6216, 2012.
  • [8] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
  • [9] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, 2009.
  • [10] M. E. Davies and Y. C. Eldar, “Rank awareness in joint sparse recovery,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1135–1146, 2012.
  • [11] S. Kwon, J. Wang, and B. Shim, “Multipath matching pursuit,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2986–3001, 2014.
  • [12] M. Rossi, A. M. Haimovich, and Y. C. Eldar, “Spatial compressive sensing in MIMO radar with random arrays,” in 46th Annual Conference on Information Sciences and Systems (CISS), 2012, pp. 1–6.
  • [13] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, 2004.
  • [14] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing.   Springer, 2013.
  • [15] B. Wang, L. Dai, T. Mir, and Z. Wang, “Joint user activity and data detection based on structured compressive sensing for NOMA,” IEEE Communications Letters, vol. 20, no. 7, pp. 1473–1476, 2016.
  • [16] A. C. Cirik, N. M. Balasubramanya, and L. Lampe, “Multi-user detection using ADMM-based compressive sensing for uplink grant-free NOMA,” IEEE Wireless Communications Letters, 2017.
  • [17] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.