A Characterization of Mean Squared Error for Estimator with Bagging

08/07/2019
by Martin Mihelich, et al.

Bagging can significantly improve the generalization performance of unstable machine learning algorithms such as trees or neural networks. Although bagging is now widely used in practice and many empirical studies have explored its behavior, little is known about the theoretical properties of bagged predictions. In this paper, we theoretically investigate how bagging can reduce the Mean Squared Error (MSE) when applied to a statistical estimator. First, we prove that for any estimator, increasing the number of bagged estimators N in the average can only reduce the MSE. This intuitive result, observed empirically and discussed in the literature, had not previously been rigorously proved. Second, we focus on the standard estimator of variance, the unbiased sample variance, and derive an exact analytical expression for the MSE of this estimator with bagging. This allows us to rigorously discuss the number of iterations N and the batch size m of the bagging method. From this expression, we show that the MSE of the variance estimator can be reduced by bagging only if the kurtosis of the distribution is greater than 3/2. This result is important because it demonstrates that for distributions with low kurtosis, bagging can only deteriorate the performance of the statistical prediction. Finally, we propose a novel general-purpose algorithm to estimate the variance of a sample with high precision.
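As an illustration of the setting studied in the abstract, the sketch below compares the empirical MSE of the plain unbiased sample variance with that of a bagged version (the average of the estimator over N bootstrap resamples of batch size m). This is a minimal sketch, not the paper's exact construction or its proposed algorithm; the sample sizes, the choice of a standard normal distribution (whose kurtosis, 3, exceeds the 3/2 threshold mentioned above), and helper names such as bagged_variance are illustrative assumptions.

    import numpy as np

    def unbiased_sample_variance(x):
        # Standard unbiased estimator of the variance (ddof=1).
        return np.var(x, ddof=1)

    def bagged_variance(x, n_bags, batch_size, rng):
        # Average of the unbiased sample variance over n_bags bootstrap
        # resamples of size batch_size drawn with replacement from x.
        estimates = [
            unbiased_sample_variance(rng.choice(x, size=batch_size, replace=True))
            for _ in range(n_bags)
        ]
        return np.mean(estimates)

    def empirical_mse(estimator, true_value, sampler, n_trials, rng):
        # Monte Carlo estimate of the MSE of `estimator` around `true_value`.
        errors = [(estimator(sampler(rng)) - true_value) ** 2 for _ in range(n_trials)]
        return np.mean(errors)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, n_bags, batch_size, n_trials = 50, 200, 30, 2000
        true_var = 1.0  # variance of a standard normal sample

        sampler = lambda r: r.standard_normal(n)  # kurtosis 3 > 3/2 (assumed example)

        mse_plain = empirical_mse(unbiased_sample_variance, true_var, sampler, n_trials, rng)
        mse_bagged = empirical_mse(
            lambda x: bagged_variance(x, n_bags, batch_size, rng),
            true_var, sampler, n_trials, rng,
        )
        print(f"MSE plain estimator : {mse_plain:.5f}")
        print(f"MSE bagged estimator: {mse_bagged:.5f}")

Varying n_bags (N) and batch_size (m), or swapping in a distribution with kurtosis below 3/2, lets one probe empirically the behavior the abstract describes, without relying on the paper's analytical expression.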


