Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

02/12/2020
by Robin Vogel, et al.

We consider statistical learning problems in which the distribution P' of the training observations Z'_1, ..., Z'_n differs from the distribution P involved in the risk one seeks to minimize (referred to as the test distribution), but is still defined on the same measurable space as P and dominates it. In the unrealistic case where the likelihood ratio Φ(z) = dP/dP'(z) is known, one may straightforwardly extend the Empirical Risk Minimization (ERM) approach to this specific transfer learning setup using the same idea as that behind Importance Sampling, by minimizing a weighted version of the empirical risk functional computed from the 'biased' training data Z'_i with weights Φ(Z'_i). Although the importance function Φ(z) is generally unknown in practice, we show that, in various frequently encountered situations, it takes a simple form and can be directly estimated from the Z'_i's and some auxiliary information on the statistical population P. By means of linearization techniques, we then prove that the generalization capacity of the aforementioned approach is preserved when plugging the resulting estimates of the Φ(Z'_i)'s into the weighted empirical risk. Beyond these theoretical guarantees, numerical results provide strong empirical evidence of the relevance of the approach promoted in this article.
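To make the weighting idea concrete, here is a minimal Python sketch of a weighted empirical risk, i.e. the average of the per-sample losses multiplied by Φ(Z'_i). The abstract does not spell out which situations give Φ a simple form, so the second function rests on an assumption made purely for illustration: training and test distributions differ only in their class proportions, and the test proportions are the available auxiliary information. The function names `weighted_empirical_risk` and `class_prior_weights` are hypothetical and do not come from the paper.

```python
import numpy as np


def weighted_empirical_risk(losses, weights):
    """Mean of per-sample losses, each multiplied by its importance weight.

    losses[i]  : loss of a fixed predictor on the biased observation Z'_i
    weights[i] : Phi(Z'_i), known or estimated
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if losses.shape != weights.shape:
        raise ValueError("losses and weights must have the same shape")
    return float(np.mean(weights * losses))


def class_prior_weights(labels, test_priors):
    """Plug-in estimate of Phi under the ASSUMED class-proportion shift.

    Phi(z) is taken to be (test proportion of the class of z) divided by
    (empirical training proportion of that class). `test_priors` is a dict
    mapping each class label to its proportion under the test distribution.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    train_priors = dict(zip(classes.tolist(), (counts / labels.size).tolist()))
    return np.array([test_priors[y] / train_priors[y] for y in labels.tolist()])


# Toy usage: class 0 is over-represented in training (3 of 4 samples),
# while the test population is assumed to be balanced.
labels = [0, 0, 0, 1]
losses = [1.0, 1.0, 1.0, 0.0]  # the predictor errs on class 0 only
weights = class_prior_weights(labels, {0: 0.5, 1: 0.5})
print(weighted_empirical_risk(losses, np.ones(len(labels))))  # unweighted ERM
print(weighted_empirical_risk(losses, weights))               # reweighted
```

In this toy case the unweighted risk is 0.75, whereas the reweighted risk is approximately 0.5, which is the risk of the same predictor on a balanced test population. Minimizing the reweighted quantity over a class of predictors is the weighted ERM principle described above.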

research
06/28/2019

Statistical Learning from Biased Training Samples

With the deluge of digitized information in the Big Data era, massive da...
research
06/05/2019

Empirical Risk Minimization under Random Censorship: Theory and Practice

We consider the classic supervised learning problem, where a continuous ...
research
01/12/2015

Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

In a wide range of statistical learning problems such as ranking, cluste...
research
02/09/2023

An information-theoretic learning model based on importance sampling

A crucial assumption underlying the most current theory of machine learn...
research
06/03/2021

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Empirical risk minimization (ERM) is the workhorse of machine learning, ...
research
04/15/2022

Transfer Importance Sampling – How Testing Automated Vehicles in Multiple Test Setups Helps With the Bias-Variance Tradeoff

The promise of increased road safety is a key motivator for the developm...
research
11/01/2022

On Medians of (Randomized) Pairwise Means

Tournament procedures, recently introduced in Lugosi & Mendelson (2016...
