Learning from a Biased Sample

09/05/2022
by Roshni Sahoo, et al.

The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it under. However, in a number of settings, we may be concerned that our training sample is biased, in the sense that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population. In such settings, empirical risk minimization over the training set may fail to yield rules that perform well at deployment. Building on concepts from distributionally robust optimization and sensitivity analysis, we propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions whose conditional distributions of outcomes Y given covariates X differ from the conditional training distribution by at most a constant factor, and whose covariate distributions are absolutely continuous with respect to the covariate distribution of the training data. We apply a result of Rockafellar and Uryasev to show that this problem is equivalent to an augmented convex risk minimization problem. We give statistical guarantees for learning a robust model using the method of sieves and propose a deep learning algorithm whose loss function captures our robustness target. We empirically validate our proposed method in simulations and a case study with the MIMIC-III dataset.
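The abstract does not state which result of Rockafellar and Uryasev is used. A reasonable guess is their variational representation of conditional value-at-risk, CVaR_alpha(Z) = min over eta of { eta + E[(Z - eta)_+] / (1 - alpha) }, which turns a tail-risk quantity into a convex minimization over one extra scalar. The sketch below only checks that generic identity on simulated losses; it is not the paper's augmented objective, and the function names, the Gaussian losses, and the choice alpha = 0.9 are illustrative assumptions.

```python
import random


def ru_objective(losses, eta, alpha):
    """Rockafellar-Uryasev objective: eta + E[(Z - eta)_+] / (1 - alpha)."""
    n = len(losses)
    mean_excess = sum(max(z - eta, 0.0) for z in losses) / n
    return eta + mean_excess / (1.0 - alpha)


def cvar_via_ru(losses, alpha):
    """Minimize the objective over eta.

    The objective is convex and piecewise linear in eta with kinks at the
    sample points, so searching over the samples finds the minimum.
    """
    return min(ru_objective(losses, eta, alpha) for eta in losses)


def cvar_via_tail_mean(losses, alpha):
    """Empirical CVaR as the mean of the worst (1 - alpha) fraction of losses.

    Assumes n * (1 - alpha) is (numerically) a positive integer.
    """
    n = len(losses)
    k = round(n * (1.0 - alpha))
    tail = sorted(losses)[-k:]
    return sum(tail) / k


random.seed(0)
# Hypothetical losses; in a learning problem these would be per-example losses.
losses = [random.gauss(0.0, 1.0) for _ in range(200)]
alpha = 0.9

cvar_ru = cvar_via_ru(losses, alpha)
cvar_tail = cvar_via_tail_mean(losses, alpha)
average_loss = sum(losses) / len(losses)
```

Here `cvar_ru` and `cvar_tail` agree up to floating-point error, and both are at least `average_loss`. Because the extra variable eta enters convexly, it can be optimized jointly with model parameters, which is the general sense in which a worst-case objective can become an augmented convex risk minimization problem.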

research
03/23/2023

Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization

Robust generalization aims to tackle the most challenging data distribut...
research
06/14/2021

Examining and Combating Spurious Features under Distribution Shift

A central goal of machine learning is to learn robust representations th...
research
07/08/2020

A One-step Approach to Covariate Shift Adaptation

A default assumption in many machine learning scenarios is that the trai...
research
06/26/2020

Learning Optimal Distributionally Robust Individualized Treatment Rules

Recent development in the data-driven decision science has seen great ad...
research
06/13/2023

Learning under Selective Labels with Data from Heterogeneous Decision-makers: An Instrumental Variable Approach

We study the problem of learning with selectively labeled data, which ar...
research
04/26/2023

Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information

Focusing on stochastic programming (SP) with covariate information, this...
research
11/27/2017

Bootstrap Robust Prescriptive Analytics

We address the problem of prescribing an optimal decision in a framework...
