Domain Adaptation under Missingness Shift

11/03/2022
by Helen Zhou, et al.

Rates of missing data often depend on record-keeping policies and thus may change across times and locations, even when the underlying features are comparatively stable. In this paper, we introduce the problem of Domain Adaptation under Missingness Shift (DAMS). Here, (labeled) source data and (unlabeled) target data would be exchangeable but for different missing data mechanisms. We show that when missing data indicators are available, DAMS can reduce to covariate shift. Focusing on the setting where missing data indicators are absent, we establish the following theoretical results for underreporting completely at random: (i) covariate shift is violated (adaptation is required); (ii) the optimal source predictor can perform worse on the target domain than a constant one; (iii) the optimal target predictor can be identified, even when the missingness rates themselves are not; and (iv) for linear models, a simple analytic adjustment yields consistent estimates of the optimal target parameters. In experiments on synthetic and semi-synthetic data, we demonstrate the promise of our methods when assumptions hold. Finally, we discuss a rich family of future extensions.
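
As a rough illustration of the setting described above, the following sketch simulates underreporting completely at random without missingness indicators. It assumes that an unreported entry is recorded as zero, that the label is linear in two clean features, and that the source and target differ only in their per-feature missingness rates. All names, rates, and coefficients here are hypothetical choices for illustration. The sketch is not the analytic adjustment from the paper; it only compares a least-squares fit on source data against a reference fit that uses target labels, which would be unavailable in practice.

```python
import numpy as np


def underreport(x, rates, rng):
    """Zero out each entry of column j independently with probability rates[j].

    Assumption: an unreported value is stored as 0 and no indicator is kept.
    """
    keep = rng.random(x.shape) >= np.asarray(rates)
    return x * keep


def fit_ols(x, y):
    """Least-squares fit with an intercept; returns [intercept, w_1, ..., w_d]."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mse(coef, x, y):
    """Mean squared error of a linear predictor with intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    return float(np.mean((design @ coef - y) ** 2))


rng = np.random.default_rng(0)
n = 20000
true_w = np.array([2.0, -1.0])  # hypothetical coefficients on the clean features

# Clean features and labels share one distribution across both domains.
x_src = rng.normal(1.0, 1.0, size=(n, 2))
x_tgt = rng.normal(1.0, 1.0, size=(n, 2))
y_src = x_src @ true_w + rng.normal(0.0, 0.1, size=n)
y_tgt = x_tgt @ true_w + rng.normal(0.0, 0.1, size=n)

# Only the missingness rates differ between source and target.
src_obs = underreport(x_src, [0.1, 0.1], rng)
tgt_obs = underreport(x_tgt, [0.6, 0.2], rng)

w_src = fit_ols(src_obs, y_src)  # what one can fit from labeled source data
w_tgt = fit_ols(tgt_obs, y_tgt)  # reference fit that uses target labels

mse_src_on_tgt = mse(w_src, tgt_obs, y_tgt)
mse_tgt_on_tgt = mse(w_tgt, tgt_obs, y_tgt)
print("source-fit predictor, MSE on target:", mse_src_on_tgt)
print("target-fit predictor, MSE on target:", mse_tgt_on_tgt)
```

Because the reference fit minimizes squared error on the corrupted target sample, the source-fit predictor cannot do better on that sample; how large the gap is depends on how far apart the two sets of missingness rates are.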


research
12/21/2022

Adapting to Latent Subgroup Shifts via Concepts and Proxies

We address the problem of unsupervised domain adaptation when the source...
research
02/28/2023

Federated Covariate Shift Adaptation for Missing Target Output Values

The most recent multi-source covariate shift algorithm is an efficient h...
research
02/06/2019

Weak consistency of the 1-nearest neighbor measure with applications to missing data and covariate shift

When data is partially missing at random, imputation and importance weig...
research
09/16/2021

Unsupervised domain adaptation with non-stochastic missing data

We consider unsupervised domain adaptation (UDA) for classification prob...
research
09/19/2023

Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Domain adaptation (DA) is a statistical learning problem that arises whe...
research
03/02/2022

Estimating Conditional Average Treatment Effects with Missing Treatment Information

Estimating conditional average treatment effects (CATE) is challenging, ...
research
02/03/2020

Linear predictor on linearly-generated data with missing values: non consistency and solutions

We consider building predictors when the data have missing values. We st...
