Why resampling outperforms reweighting for correcting sampling bias

09/28/2020
by   Jing An, et al.

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for the bias. We consider two commonly used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though the two techniques are statistically equivalent, resampling has been observed to outperform reweighting when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together when addressing sampling bias.
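To make the distinction between the two techniques concrete, the following is a minimal sketch, not the authors' code or experimental setup. It assumes a hypothetical one-dimensional problem with two subgroups, each with a quadratic loss centered at a different point; the population proportions `a`, the biased sampling proportions `f`, the centers, the step size, and the function name `run_sgd` are all illustrative assumptions. Resampling draws each subgroup at its population proportion and uses unit weights, while reweighting draws from the biased proportions and scales each gradient by `a[i] / f[i]`. Both give the same expected gradient, but the stochastic iterates behave differently.

```python
import random


def run_sgd(mode, a=(0.5, 0.5), f=(0.9, 0.1), centers=(-1.0, 1.0),
            lr=0.05, steps=20000, seed=0):
    """Run SGD on sum_i a[i] * 0.5 * (theta - centers[i])**2.

    All default values are illustrative assumptions:
    a = population proportions, f = biased sampling proportions.
    Returns the mean and variance of theta over the second half of the run.
    """
    rng = random.Random(seed)
    theta = 0.0
    tail = []
    for t in range(steps):
        if mode == "resample":
            # Draw a subgroup at its population proportion, unit weight.
            i = 0 if rng.random() < a[0] else 1
            weight = 1.0
        elif mode == "reweight":
            # Draw a subgroup at its biased proportion, correct with a weight.
            i = 0 if rng.random() < f[0] else 1
            weight = a[i] / f[i]
        else:
            raise ValueError("mode must be 'resample' or 'reweight'")
        grad = weight * (theta - centers[i])  # stochastic gradient estimate
        theta -= lr * grad
        if t >= steps // 2:
            tail.append(theta)
    mean = sum(tail) / len(tail)
    var = sum((x - mean) ** 2 for x in tail) / len(tail)
    return mean, var


resample_mean, resample_var = run_sgd("resample")
reweight_mean, reweight_var = run_sgd("reweight")
print("resample: mean=%.4f var=%.4f" % (resample_mean, resample_var))
print("reweight: mean=%.4f var=%.4f" % (reweight_mean, reweight_var))
```

In this toy setting both runs fluctuate around the same minimizer, but the reweighted iterates fluctuate more because the rarely sampled subgroup enters with a large weight. This illustrates only the gradient-noise aspect; it does not reproduce the dynamical stability or stochastic asymptotics analysis described in the abstract.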


