What is the distribution of the number of unique original items in a bootstrap sample?

02/18/2016
by Alex F. Mendelson, et al.

Sampling with replacement occurs in many settings in machine learning, notably in the bagging ensemble technique and the .632+ validation scheme. The number of unique original items in a bootstrap sample can have an important role in the behaviour of prediction models learned on it. Indeed, there are uncontrived examples where duplicate items have no effect. The purpose of this report is to present the distribution of the number of unique original items in a bootstrap sample clearly and concisely, with a view to enabling other machine learning researchers to understand and control this quantity in existing and future resampling techniques. We describe the key characteristics of this distribution along with the generalisation for the case where items come from distinct categories, as in classification. In both cases we discuss the normal limit, and conduct an empirical investigation to derive a heuristic for when a normal approximation is permissible.
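As a rough illustration of the quantity described above, the following Python sketch computes the distribution of the number of unique original items using the classical occupancy formula, P(K = k) = C(n, k) · k! · S(m, k) / n^m, where S(m, k) is a Stirling number of the second kind, and checks it against a small simulation. This is a standard textbook result and not necessarily the derivation used in the report. The function names (`expected_unique`, `unique_pmf`, `simulate_unique`) and the parameters `n` (number of original items) and `m` (number of draws, defaulting to `n`) are assumptions made for this example.

```python
import random
from math import comb, factorial


def expected_unique(n, m=None):
    """Expected number of distinct original items after m draws with
    replacement from n items (m defaults to n, the usual bootstrap)."""
    m = n if m is None else m
    return n * (1.0 - (1.0 - 1.0 / n) ** m)


def unique_pmf(n, m=None):
    """Exact P(K = k) for k = 0..n, where K is the number of distinct items.

    Uses the occupancy formula C(n, k) * k! * S(m, k) / n**m, with the
    Stirling numbers of the second kind built by the recurrence
    S(i, j) = j * S(i - 1, j) + S(i - 1, j - 1).
    """
    m = n if m is None else m
    row = [1] + [0] * n  # S(0, j) for j = 0..n
    for i in range(1, m + 1):
        new = [0] * (n + 1)
        for j in range(1, min(i, n) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    total = n ** m
    # Exact integer arithmetic until the final division.
    return [comb(n, k) * factorial(k) * row[k] / total for k in range(n + 1)]


def simulate_unique(n, trials, seed=0):
    """Monte Carlo draws of K: count distinct indices in each bootstrap sample."""
    rng = random.Random(seed)
    return [len({rng.randrange(n) for _ in range(n)}) for _ in range(trials)]


if __name__ == "__main__":
    n = 100
    pmf = unique_pmf(n)
    exact_mean = sum(k * p for k, p in enumerate(pmf))
    sim = simulate_unique(n, trials=2000)
    print("closed-form mean:", expected_unique(n))
    print("mean from pmf:   ", exact_mean)
    print("simulated mean:  ", sum(sim) / len(sim))
```

For large `n` the expected fraction of unique items approaches 1 - 1/e, roughly 0.632, which is the constant that gives the .632 family of estimators its name.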

