Reducing Sampling Ratios and Increasing Number of Estimates Improve Bagging in Sparse Regression

12/20/2018
by Luoluo Liu, et al.
Johns Hopkins University

Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs a generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of the bootstrap sampling ratio L/m and the number of estimates K are analyzed theoretically. Simulations show that the proposed method yields state-of-the-art recovery performance, outperforming L1 minimization and Bolasso in the challenging case of low levels of measurements. A lower L/m ratio (60%-90%) leads to better performance than the conventional choice L/m = 1, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. A relatively small number of estimates, K = 30, already gives satisfying results, even though increasing K is found to always improve or at least maintain the performance.


I Introduction

Compressed Sensing (CS) and sparse regression study the linear inverse problem in the form of least squares plus a sparsity-promoting penalty term. Formally speaking, the measurement vector $y \in \mathbb{R}^m$ is generated by $y = Ax + z$, where $A \in \mathbb{R}^{m \times n}$ is the sensing matrix, $x \in \mathbb{R}^n$ is the sparse coefficient vector with very few non-zero entries, and $z$ is a bounded noise vector. The problem of interest is finding the sparse vector $x$ given $y$ as well as $A$. Among various choices of sparse regularizers, the $\ell_1$ norm is the most commonly used. The noiseless case is referred to as Basis Pursuit (BP), whereas the noisy version is known as basis pursuit denoising [1], or the least absolute shrinkage and selection operator (LASSO) [2]:

$\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\|y - Ax\|_2^2 + \lambda \|x\|_1 \qquad (1)$
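As a concrete illustration of (1), below is a minimal proximal-gradient (ISTA) sketch in Python/NumPy for the Lagrangian LASSO objective. It is not the solver used in this paper (the simulations later use an ADMM implementation [13]); the function name and step-size choice are our own.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Minimal ISTA sketch for (1): minimize 0.5*||y - A x||_2^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                # gradient of the least-squares term
        z = x - step * grad                     # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold (prox of lam*||.||_1)
    return x
```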

The performance of $\ell_1$ minimization in recovering the true sparse solution has been thoroughly investigated in the CS literature [3, 4, 5, 6]. CS theory reveals that if the sensing matrix $A$ has good properties, then BP recovers the ground truth exactly and the LASSO solution is close to the true solution with high probability [3].

Classical sparse regression recovery based on $\ell_1$ minimization solves the problem with all available measurements. In practice, it is often the case that not all measurements are available or required for recovery. Some measurements might be severely corrupted or missing, or adversarial samples may break down the algorithm. These issues can lead to the failure of sparse regression.

The Bagging procedure [7] is an efficient parallel ensemble method, proposed by Leo Breiman, to improve the performance of unstable predictors. In the Bagging procedure, we first sample uniformly at random with replacement from all $m$ data points, termed the bootstrap [8]; we repeat the process to generate $K$ bootstrap samples, each of the same size as the original data set; we solve each problem on the bootstrap samples using a baseline algorithm; and we then combine the multiple predictions to obtain the final prediction.
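The resampling step can be written in a few lines. The sketch below uses our own notation ($m$ data points, subsample size $L$, $K$ bootstrap sets) and simply draws indices uniformly at random with replacement, which is all that both conventional Bagging ($L = m$) and the generalized version described later require.

```python
import numpy as np

def bootstrap_index_sets(m, L, K, seed=None):
    """Draw K bootstrap multisets of size L from the m data indices,
    sampling uniformly at random with replacement (L = m recovers the
    conventional Bagging/bootstrap setting)."""
    rng = np.random.default_rng(seed)
    return [rng.choice(m, size=L, replace=True) for _ in range(K)]
```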

Applying Bagging to obtain a sparse vector with a specific symmetric pattern was shown empirically to reduce estimation error when the sparsity level is high [7]. This experiment suggests the possibility of using Bagging to improve other sparse regression methods on general sparse signals. Although the well-known conventional Bagging method uses a bootstrap ratio of $L/m = 1$, some follow-up works have shown empirically that lower ratios improve Bagging for several classic classifiers: the nearest neighbour classifier [9], CART trees [10], linear SVM, LDA, and the logistic linear classifier [11]. Based on this success, we hypothesize that reducing the bootstrap ratio will also improve performance in the sparse regression problem. Therefore, we set up the framework with a generic bootstrap ratio $L/m$ and study its behavior over various ratios.

In this paper, (i) we present the generalized Bagging framework with the bootstrap ratio $L/m$ and the number of estimates $K$ as parameters; (ii) we explore the theoretical properties associated with finite $L/m$ and $K$; (iii) we present simulation results in which various $L/m$ and $K$ are explored and compared to $\ell_1$ minimization, conventional Bagging, and Bolasso [12], another technique that incorporates Bagging into sparse recovery. Bolasso is a two-step process which recovers the support of the signal first and then the amplitudes. An important finding is that, when the number of measurements $m$ is small, Bagging with a bootstrap ratio smaller than the conventional $L/m = 1$ can lead to better performance.

II Proposed Method

II-A Bagging in Sparse Regression

Our proposed method is a generalized Bagging procedure for sparse recovery. It can be accomplished in three steps. First, we generate bootstrap samples: the multiple bootstrap process generates $K$ multisets $I_1, \dots, I_K$ of the original data indices, each of size $L$, which yields pairs of measurements and sensing matrices $\{(y_{I_j}, A_{I_j})\}_{j=1}^{K}$. In this paper, the notation $A_I$ (or $y_I$) on a matrix or vector takes the rows supported on $I$ and discards all rows in the complement $I^c$. Second, we solve the sparse recovery in parallel on those sets: for all $j = 1, \dots, K$, find

$\hat{x}^{(j)} = \arg\min_{x} \; \tfrac{1}{2}\,\|y_{I_j} - A_{I_j}x\|_2^2 + \lambda \|x\|_1 \qquad (2)$

Each subproblem is in the form of the LASSO, and numerous optimization methods can solve it, such as those in [13, 14, 15, 16].

Finally, the Bagging solution is obtained by averaging all $K$ solutions of (2):

$\hat{x}_{\mathrm{Bagging}} = \frac{1}{K} \sum_{j=1}^{K} \hat{x}^{(j)} \qquad (3)$

Compared to the $\ell_1$ minimization solution, which is computed from all $m$ measurements, the bagged solution is obtained by resampling without increasing the number of original measurements. We will show that in some cases the bagged solution outperforms the base solution.
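A compact sketch of the whole procedure, steps (2)-(3), is given below. It uses scikit-learn's Lasso as the base solver purely for illustration (the experiments in this paper use ADMM [13], and sklearn scales the data-fit term by 1/(2*n_samples), so its alpha is not numerically identical to $\lambda$ in (1)); the function name and defaults are our own.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bagged_lasso(A, y, K, L, lam, seed=None):
    """Generalized Bagging for sparse regression (sketch): solve (2) on K
    bootstrap subsamples of size L and average the estimates as in (3)."""
    rng = np.random.default_rng(seed)
    m, _ = A.shape
    estimates = []
    for _ in range(K):
        idx = rng.choice(m, size=L, replace=True)            # bootstrap multiset I_j
        base = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        estimates.append(base.fit(A[idx], y[idx]).coef_)     # solution of (2) on (y_Ij, A_Ij)
    return np.mean(estimates, axis=0)                        # Bagging average (3)
```

Setting L = m recovers conventional Bagging; the reduced-ratio variants studied below simply pass L < m.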

III Preliminaries

We summarize the results from CS theory that we need to analyze our algorithm mathematically: the Null Space Property (NSP) and the Restricted Isometry Property (RIP).

III-A Null Space Property (NSP)

The NSP [17] for standard sparse recovery characterizes the necessary and sufficient condition for successful sparse recovery via $\ell_1$ minimization.

Theorem 1 (NSP).

Every $s$-sparse signal $x$ is the unique solution of Basis Pursuit if and only if $A$ satisfies the NSP of order $s$; namely, for all $v \in \mathrm{Null}(A) \setminus \{0\}$ and for any index set $S$ of cardinality less than or equal to $s$, the following is satisfied:

$\|v_S\|_1 < \|v_{S^c}\|_1,$

where $v_S$ retains the values of $v$ on the index set $S$ and is zero elsewhere.

III-B Restricted Isometry Property (RIP)

Although the NSP directly characterizes the ability to succeed in sparse recovery, checking the NSP condition is computationally intractable. It is also not suitable for quantifying performance under noise, since it is a binary (true or false) property rather than a continuous measure. The Restricted Isometry Property (RIP) [3] was introduced for these purposes.

Definition 2 (RIP).

A matrix $A$ with $\ell_2$-normalized columns satisfies the RIP of order $s$ if there exists a constant $\delta_s \in (0, 1)$ such that for every $s$-sparse $x$, we have:

$(1 - \delta_s)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1 + \delta_s)\,\|x\|_2^2 \qquad (4)$
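Since exact computation of $\delta_s$ requires checking all supports, a common sanity check is a Monte Carlo probe of (4) over random $s$-sparse vectors. The sketch below is our own helper (not from the paper) and only yields a heuristic lower bound on the true RIP constant.

```python
import numpy as np

def empirical_rip_lower_bound(A, s, n_trials=2000, seed=None):
    """Monte Carlo probe of (4): sample random s-sparse unit vectors and record
    the worst deviation of ||Ax||_2^2 from 1. This lower-bounds delta_s; exact
    verification over all supports is intractable."""
    rng = np.random.default_rng(seed)
    A = A / np.linalg.norm(A, axis=0)          # RIP is stated for l2-normalized columns
    n = A.shape[1]
    worst = 0.0
    for _ in range(n_trials):
        support = rng.choice(n, size=s, replace=False)
        x = np.zeros(n)
        x[support] = rng.standard_normal(s)
        x /= np.linalg.norm(x)                 # unit-norm s-sparse test vector
        worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))
    return worst
```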

III-C Noisy Recovery Bounds Based on RIP Constants

It is known that RIP conditions imply that the NSP is satisfied [3]. More specifically, if the RIP constant of order $2s$ satisfies $\delta_{2s} < \sqrt{2} - 1$, then the NSP of order $s$ holds. The noisy recovery performance of $\ell_1$ minimization is bounded in terms of the RIP constant, as stated in the following theorem.

Theorem 3 (Noisy recovery for $\ell_1$ minimization, Theorem 1.2 in [3]).

Let $y = Ax + z$ with $\|z\|_2 \le \epsilon$, and let $x_s$ be the best $s$-sparse approximation of $x$, i.e., the $s$-sparse vector that minimizes $\|x - \tilde{x}\|_1$ over all $s$-sparse $\tilde{x}$. If $\delta_{2s} < \sqrt{2} - 1$ and $\hat{x}$ is the solution of $\ell_1$ minimization, then

$\|\hat{x} - x\|_2 \;\le\; C_0\, s^{-1/2}\, \|x - x_s\|_1 + C_1\, \epsilon,$

where $C_0$ and $C_1$ are constants determined by the RIP constant $\delta_{2s}$; their explicit forms are given in [3].

IV Theoretical Results for Bagging Associated with the Sampling Ratio $L/m$ and the Number of Estimates $K$

IV-A Noisy Recovery for Employing Bagging in Sparse Recovery

We derive performance bounds for employing Bagging in sparse recovery, in which the final estimate is the average over multiple estimates solved individually from bootstrap samples. We give theoretical results for the case in which the true signal is exactly $s$-sparse, and for the general case with no assumption on the sparsity level of the ground-truth signal.

To prove these two theorems, we combine Theorem 3 with tail bounds for sums of independent bounded random variables; more details of the proofs can be found in [18].

Theorem 4 (Bagging: error bound for exactly $s$-sparse signals).

Let $y = Ax + z$ with $\|z\|_2 \le \epsilon$ and $x$ exactly $s$-sparse. Assume that, for the bootstrap index sets $I_1, \dots, I_K$ that generate the sensing matrices $A_{I_1}, \dots, A_{I_K}$, there exists an RIP-type constant, related to $s$ and $L$, such that the recovery condition holds for every $A_{I_j}$. Let $\hat{x}$ be the solution of Bagging (3); then, for any tolerance level, $\hat{x}$ satisfies an error bound that depends on this constant, the noise level $\epsilon$, the ratio $L/m$, and $K$ (the explicit bound is stated in [18]).

We also study the behavior of Bagging for a general signal $x$, in which case the performance involves the sparse approximation error. We use the vector $x - x_s$ to denote this error, where $x_s$ is the best $s$-term approximation of the ground-truth signal (containing the $s$ entries of $x$ with the largest amplitudes).

Theorem 5 (Bagging: error bound for general signal recovery).

Let $y = Ax + z$ with $\|z\|_2 \le \epsilon$, with no sparsity assumption on $x$. Assume that, for the bootstrap index sets that generate the sensing matrices $A_{I_1}, \dots, A_{I_K}$, the same RIP-type condition as in Theorem 4 holds for every $A_{I_j}$. Let $\hat{x}$ be the solution of Bagging (3); then, for any tolerance level, $\hat{x}$ satisfies an error bound of the same form as in Theorem 4 with an additional term proportional to the sparse approximation error $\|x - x_s\|_1$ (the explicit bound and constants are stated in [18]).

Theorem 5 gives the performance bound for Bagging for general signal recovery without the sparsity assumption, and it reduces to Theorem 4 when the sparse approximation error $\|x - x_s\|_1$ is zero. Theorem 5 can be used to analyze the challenging cases with a small number of measurements.
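For concreteness, the best $s$-term approximation $x_s$ used above can be computed as in the small helper below (our own illustration, not code from the paper); the $\ell_1$ norm of the remainder is the sparse approximation error that vanishes in the exactly sparse case of Theorem 4.

```python
import numpy as np

def best_s_term(x, s):
    """Best s-term approximation x_s: keep the s largest-magnitude entries of x."""
    xs = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]       # indices of the s largest amplitudes
    xs[keep] = x[keep]
    return xs

def sparse_approx_error(x, s):
    """l1 norm of x - x_s; zero exactly when x is s-sparse."""
    return np.linalg.norm(x - best_s_term(x, s), ord=1)
```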

Both Theorem 4 and Theorem 5 show that increasing the number of estimates $K$ improves the result by increasing the probability with which the same error bound holds. As for the sampling ratio $L/m$, there are two competing factors. On the one hand, the RIP constant in general decreases as $L$ increases (a proof under the Gaussian assumption is given in [19]), and the error-bound constants are non-decreasing functions of the RIP constant, so a larger $L$ in general yields a smaller bound. On the other hand, the factor associated with the noise-power term favors a smaller $L$. Combining the two factors indicates that the best ratio $L/m$ lies between a small value and one. In the experimental results, we will show that when $m$ is small, varying $L/m$ from zero to one creates a peak at a ratio smaller than one. The first factor dominates in the stable case with enough measurements, in which a larger $L$ leads to better performance.

Fig. 1 (panels a-d): Performance curves for Bagging with various sampling ratios $L/m$ and different numbers of estimates $K$, together with the best performance of Bolasso as well as $\ell_1$ minimization. The purple vertical lines highlight conventional Bagging with $L/m = 1$; the four panels correspond to increasing numbers of measurements $m$ from left to right at a fixed noise level. The grey circle highlights the peak of Bagging, and the grey area highlights the bootstrap ratio at the peak point.

V Simulations

In this section, we perform sparse recovery on simulated data to study the performance of our algorithm. In our experiments, all entries of $A$ are i.i.d. samples from the standard normal distribution $\mathcal{N}(0, 1)$. The signal dimension $n$ is fixed, and various numbers of measurements $m$, from 50 to 2000, are explored. For the ground-truth signals, the sparsity level $s$ is the same in all cases, and the non-zero entries are sampled from the standard Gaussian distribution with their locations generated uniformly at random. The entries of the noise vector $z$ are sampled i.i.d. from a zero-mean Gaussian whose variance is set to achieve the desired Signal-to-Noise Ratio (SNR); we add white Gaussian noise at a fixed SNR in dB. We use the ADMM [13] implementation of the LASSO to solve all sparse regression problems, in which the parameter $\lambda$ balances the least-squares fit and the sparsity penalty.
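The data-generation step described above can be sketched as follows; the dimensions and target SNR are arguments rather than the paper's exact values, and the helper name is our own.

```python
import numpy as np

def generate_problem(m, n, s, snr_db, seed=None):
    """Simulated CS instance: i.i.d. standard-normal A, s-sparse x with Gaussian
    non-zeros at uniformly random locations, and white Gaussian noise scaled to
    the requested measurement SNR (in dB)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    clean = A @ x
    z = rng.standard_normal(m)
    # scale the noise so that 20*log10(||Ax||_2 / ||z||_2) equals snr_db
    z *= np.linalg.norm(clean) / (np.linalg.norm(z) * 10 ** (snr_db / 20))
    return A, x, clean + z
```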

We study how the number of estimates $K$ as well as the bootstrap ratio $L/m$ affects the result. In our experiments, we explore several values of $K$, while the bootstrap ratio $L/m$ varies from a small fraction up to 1. We report the Signal-to-Noise Ratio (SNR) of the recovery $\hat{x}$ with respect to the true $x$ as the error measure, averaged over independent trials. For all algorithms, we evaluate $\lambda$ over a grid of values and then select the optimal value that gives the maximum averaged SNR over all trials.
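The evaluation protocol can be sketched as below; the recovery-SNR formula is one standard definition that we assume here, and the oracle-style $\lambda$ selection mirrors the description above (pick the grid value with the highest average SNR).

```python
import numpy as np
from sklearn.linear_model import Lasso

def recovery_snr_db(x_true, x_hat):
    """Recovery SNR in dB (one common definition, assumed here)."""
    return 20 * np.log10(np.linalg.norm(x_true) / np.linalg.norm(x_hat - x_true))

def select_lambda(A, y, x_true, lam_grid):
    """Evaluate the LASSO over a grid of lambdas and keep the one with the
    highest recovery SNR, mirroring the 'best lambda' protocol above."""
    best_lam, best_snr = None, -np.inf
    for lam in lam_grid:
        est = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(A, y)
        snr = recovery_snr_db(x_true, est.coef_)
        if snr > best_snr:
            best_lam, best_snr = lam, snr
    return best_lam, best_snr
```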

V-A Performance of Bagging, Bolasso, and $\ell_1$ Minimization

Bagging and Bolasso with various parameters, as well as $\ell_1$ minimization, are studied. The results are plotted in Figure 1. The colored curves show Bagging with various numbers of estimates $K$. The intersections of the colored curves with the purple solid vertical lines at $L/m = 1$ correspond to conventional Bagging with the full bootstrap ratio. The grey circle highlights the best performance, and the grey area highlights the optimal bootstrap ratio $L/m$. The performance of $\ell_1$ minimization is depicted by the black dashed lines, while the best Bolasso performance is plotted using light green dashed lines. For each condition with a given $m$, the information available to the Bagging and Bolasso algorithms is identical, and $\ell_1$ minimization always has access to all $m$ measurements.

From Figure 1, we see that when $m$ is small, Bagging can outperform $\ell_1$ minimization, and as $m$ decreases the margin increases. The important observation is that with a low number of measurements ($m$ between 50 and 100, where $s$ denotes the sparsity level) and a reduced bootstrap ratio ($L/m < 1$), Bagging beats the conventional choice of the full bootstrap ratio for all choices of $K$. Moreover, with a reduced ratio and a small $K$, our algorithm is already quite robust and outperforms $\ell_1$ minimization by a large margin. When the number of measurements is moderate, Bagging still beats the baseline; however, the peak occurs at the full bootstrap ratio and reduced bootstrap ratios bring no additional benefit. Increasing the number of measurements makes the base algorithm more stable, and the advantage of Bagging starts to decay.

We perform the same experiments with larger numbers of measurements, and Table I lists the best performance of the various schemes: $\ell_1$ minimization, the original Bagging scheme with full bootstrap ratio ($L/m = 1$), generalized Bagging, and Bolasso. For Bagging, the peak values are found among the different choices of the parameters $L/m$ and $K$ that we explored. We see that when the number of measurements is small ($m$ from 50 to 100), Bagging outperforms $\ell_1$ minimization. The reduced bootstrap rate also improves on conventional Bagging, most significantly at $m = 50$ (from 0.45 to 0.56 in SNR, roughly a 24% improvement). When $m$ is moderate (125 to 200), the reduced rate does not improve the performance compared to conventional Bagging; Bagging still outperforms $\ell_1$ minimization, but with smaller margins than in the small-$m$ cases. When $m$ is large (500 and above), Bagging starts losing its advantage over $\ell_1$ minimization.

Bolasso only performs comparably to the other algorithms for extremely large $m$ ($m = 2000$), where it slightly outperforms them. Bolasso only behaves well in easy cases: when the noise level is high, it can easily produce overly sparse solutions. The reason is that the supports of the different estimators may vary too much, so the common support among all estimators, which the Bolasso algorithm uses as the recovered support, can shrink dramatically.
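For comparison, a minimal Bolasso-style sketch (after Bach [12]) is shown below; it is our simplified illustration, not the reference implementation. The support-intersection step is exactly what can shrink the recovered support too aggressively at high noise levels.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_sketch(A, y, K, lam, tol=1e-6, seed=None):
    """Simplified Bolasso: (1) run the LASSO on K full-size bootstrap samples,
    (2) intersect the estimated supports, (3) refit by least squares on the
    common support."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    common = np.ones(n, dtype=bool)
    for _ in range(K):
        idx = rng.choice(m, size=m, replace=True)                 # bootstrap sample
        coef = Lasso(alpha=lam, fit_intercept=False,
                     max_iter=5000).fit(A[idx], y[idx]).coef_
        common &= np.abs(coef) > tol                              # keep only the shared support
    x_hat = np.zeros(n)
    if common.any():
        x_hat[common], *_ = np.linalg.lstsq(A[:, common], y, rcond=None)
    return x_hat
```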

Regime                      | Small             | Moderate                  | Large         | Very large
Number of measurements m    | 50    75    100   | 125   150   175   200     | 500    1000   | 2000
ℓ1 min.                     | 0.12  0.57  1.00  | 1.70  2.19  2.61  2.97    | 6.53   9.46   | 12.55
Original Bagging (L/m = 1)  | 0.45  0.94  1.29  | 1.86  2.29  2.70  3.01    | 6.22   9.06   | 12.10
Bagging                     | 0.56  0.95  1.32  | 1.86  2.29  2.70  3.01    | 6.22   9.06   | 12.10
Bolasso                     | 0.02  0.09  0.08  | 0.28  0.57  0.98  1.23    | 5.21   8.94   | 12.73
TABLE I: The performance of $\ell_1$ minimization and the best performance among all explored choices of $L$ and $K$ for Bagging and Bolasso, for various numbers of measurements $m$ (recovery SNR).

VI Conclusion

We extend the conventional Bagging scheme in sparse recovery, with the bootstrap sampling ratio $L/m$ as an adjustable parameter, and derive error bounds for the algorithm in terms of $L/m$ and the number of estimates $K$. Bagging is particularly powerful when the number of measurements $m$ is small. This condition is notoriously difficult, both in terms of improving sparse recovery results and of obtaining tight theoretical bounds. Despite these challenges, Bagging outperforms $\ell_1$ minimization by a large margin, and the reduced sampling rate gives a further margin over the conventional Bagging algorithm. When the number of measurements is small relative to the sparsity level $s$, both conventional and generalized Bagging improve the SNR over the original $\ell_1$ minimization, and the reduced sampling rate improves over conventional Bagging by up to roughly 24%. Our Bagging scheme achieves acceptable performance even with a fairly small $L/m$ and a relatively small $K$ (around 30 in our experimental study). The error bounds for Bagging show that increasing $K$ improves the certainty of the bound, which is validated in the simulations. For a parallel system that allows a large number of processes to be run at the same time, a large $K$ is preferred since it in general gives a better result.

References

  • [1] Scott Shaobing Chen, David L Donoho, and Michael A Saunders. Atomic decomposition by basis pursuit. SIAM review, 43(1):129–159, 2001.
  • [2] Robert Tibshirani. Regression shrinkage and selection via the lasso. J. of the Royal Stat. Society. Series B, pages 267–288, 1996.
  • [3] Emmanuel J Candes. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9):589–592, 2008.
  • [4] Emmanuel J Candes, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. on info. theory, 52(2):489–509, 2006.
  • [5] David L Donoho. Compressed sensing. IEEE Trans. on info. theory, 52(4):1289–1306, 2006.
  • [6] Emmanuel Candès and Justin Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3):969, 2007.
  • [7] Leo Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.
  • [8] Bradley Efron. Bootstrap methods: another look at the jackknife. The Annals of Stat., 7(1):1–26, 1979.
  • [9] Peter Hall and Richard J Samworth. Properties of bagged nearest neighbour classifiers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(3):363–379, 2005.
  • [10] Maryam Sabzevari, Gonzalo Martinez-Munoz, and Alberto Suarez. Improving the robustness of bagging with reduced sampling size. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2014.

  • [11] Faisal Zaman and Hideo Hirose. Effect of subsampling rate on subbagging and related ensembles of stable classifiers. In International Conference on Pattern Recognition and Machine Intelligence, pages 44–49. Springer, 2009.
  • [12] Francis R Bach. Bolasso: model consistent lasso estimation through the bootstrap. In Proceedings of the 25th int. conf. on Machine learning, pages 33–40. ACM, 2008.
  • [13] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  • [14] E. van den Berg and M. P. Friedlander. Probing the pareto frontier for basis pursuit solutions. SIAM J. on Scientific Computing, 31(2):890–912, 2008.
  • [15] Stephen J Wright, Robert D Nowak, and Mario AT Figueiredo. Sparse reconstruction by separable approximation. IEEE Trans. on Sig. Proc., 57(7):2479–2493, 2009.
  • [16] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
  • [17] Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, 2009.
  • [18] Luoluo Liu, Sang Peter Chin, and Trac D. Tran. JOBS: Joint-sparse optimization from bootstrap samples. arXiv preprint arXiv:1810.03743, 2018.
  • [19] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approx., 28(3):253–263, 2008.