Bagging in overparameterized learning: Risk characterization and risk monotonization

10/20/2022
by Pratik Patil, et al.

Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors in the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to mitigate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate oracle properties of the optimal subsample size, and provide an in-depth comparison between different bagging variants.
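To make the abstract's setup concrete, the following is a minimal sketch of subsample bagging for ridge and ridgeless regression, with the subsample size chosen by validation error. It is an illustration under stated assumptions, not the paper's procedure. A single hold-out split stands in for the cross-validation step, subsamples are drawn without replacement, and the function names (`ridge_fit`, `bagged_ridge`, `select_subsample_size`) and the ridge normalization are hypothetical choices made for this example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients; lam == 0 returns the minimum-norm (ridgeless) solution."""
    n, p = X.shape
    if lam == 0.0:
        # The pseudoinverse gives the minimum-norm least-squares solution,
        # which is well defined even when p > n.
        return np.linalg.pinv(X) @ y
    # Assumed normalization: (X'X / n + lam * I)^{-1} X'y / n.
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def bagged_ridge(X, y, k, M, lam, rng):
    """Average M ridge fits, each trained on k rows drawn without replacement."""
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("subsample size k must satisfy 1 <= k <= n")
    betas = []
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        betas.append(ridge_fit(X[idx], y[idx], lam))
    return np.mean(betas, axis=0)


def select_subsample_size(X, y, sizes, M, lam, rng, val_frac=0.2):
    """Pick the subsample size with the smallest squared error on a hold-out set."""
    n = X.shape[0]
    perm = rng.permutation(n)
    n_val = int(val_frac * n)
    val, tr = perm[:n_val], perm[n_val:]
    risks = {}
    for k in sizes:
        beta = bagged_ridge(X[tr], y[tr], k, M, lam, rng)
        risks[k] = float(np.mean((y[val] - X[val] @ beta) ** 2))
    best = min(risks, key=risks.get)
    return best, risks
```

Calling `select_subsample_size(X, y, sizes=[40, 80, 120, 160], M=5, lam=0.1, rng=np.random.default_rng(0))` on a 200-row dataset returns the selected size together with the hold-out risk for each candidate. Because the full training size can be included among the candidates, the selected risk is never worse on the hold-out set than that of the largest candidate, which is the intuition behind using subsample-size selection to smooth out a non-monotonic risk curve.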


research
05/25/2022

Mitigating multiple descents: A model-agnostic framework for risk monotonization

Recent empirical and theoretical analyses of several commonly used predi...
research
10/20/2022

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

Machine learning systems are often applied to data that is drawn from a ...
research
04/25/2023

Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

We study subsampling-based ridge ensembles in the proportional asymptoti...
research
10/10/2019

The Implicit Regularization of Ordinary Least Squares Ensembles

Ensemble methods that average over a collection of independent predictor...
research
01/17/2023

From Risk Prediction to Risk Factors Interpretation. Comparison of Neural Networks and Classical Statistics for Dementia Prediction

It is proposed to investigate the onset of a disease D, based on several...
research
02/27/2023

Extrapolated cross-validation for randomized ensembles

Ensemble methods such as bagging and random forests are ubiquitous in fi...
research
07/06/2023

Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles

Feature bagging is a well-established ensembling method which aims to re...
