Doubly Flexible Estimation under Label Shift

07/09/2023
by   Seong-Ho Lee, et al.

In studies ranging from clinical medicine to policy research, complete data are usually available from a population 𝒫, but the quantity of interest is often sought for a related but different population 𝒬 for which only partial data are available. In this paper, we consider the setting in which both the outcome Y and the covariate X are available from 𝒫 whereas only X is available from 𝒬, under the so-called label shift assumption, i.e., the conditional distribution of X given Y remains the same across the two populations. To estimate the parameter of interest in 𝒬 by leveraging the information from 𝒫, the following three ingredients are essential: (a) the common conditional distribution of X given Y, (b) the regression model of Y given X in 𝒫, and (c) the density ratio of Y between the two populations. We propose an estimation procedure that needs only standard nonparametric techniques to approximate the conditional expectations with respect to (a), and needs no estimate or model for (b) or (c); that is, it is doubly flexible with respect to possible misspecification of both (b) and (c). This is conceptually different from the well-known doubly robust estimation: double robustness allows at most one model to be misspecified, whereas our proposal allows both (b) and (c) to be misspecified. This is of particular interest in our setting because estimating (c) is difficult, if not impossible, owing to the absence of Y-data in 𝒬. Furthermore, even though off-the-shelf methods are sometimes available for estimating (b), they can suffer from the curse of dimensionality or computational challenges. We develop the large-sample theory for the proposed estimator, and examine its finite-sample performance through simulation studies as well as an application to the MIMIC-III database.
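
To make the label shift assumption concrete, the following is a minimal simulation sketch, not the estimator proposed in the paper. It assumes a binary Y, a Gaussian X given Y, and arbitrarily chosen label proportions (`p_y1`, `q_y1`); all of these names and values are illustrative. The density ratio (c) is plugged in at its true value, which is possible only because the data are simulated; in the setting described above, Y is unobserved in 𝒬 and this ratio is not directly available.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_population(n, prob_y1, rng):
    """Draw (X, Y) with Y ~ Bernoulli(prob_y1) and X | Y ~ Normal(2 * Y, 1).

    The conditional law of X given Y does not depend on prob_y1, so two
    populations that differ only in prob_y1 satisfy label shift.
    """
    y = rng.binomial(1, prob_y1, size=n)
    x = rng.normal(loc=2.0 * y, scale=1.0)
    return x, y

# Illustrative (assumed) label proportions in the two populations.
p_y1, q_y1 = 0.3, 0.7
n = 200_000

x_p, y_p = sample_population(n, p_y1, rng)  # population P: X and Y observed
x_q, _ = sample_population(n, q_y1, rng)    # population Q: only X observed

# True density ratio of Y between Q and P, evaluated at the P labels.
# Known here only because we chose p_y1 and q_y1 ourselves.
rho = np.where(y_p == 1, q_y1 / p_y1, (1 - q_y1) / (1 - p_y1))

naive_mean_y = y_p.mean()              # targets E_P[Y], not E_Q[Y]
weighted_mean_y = np.mean(rho * y_p)   # reweighted P data target E_Q[Y]
weighted_mean_x = np.mean(rho * x_p)   # should agree with the Q-sample mean of X

print(f"naive P mean of Y:     {naive_mean_y:.3f}")
print(f"reweighted mean of Y:  {weighted_mean_y:.3f} (target {q_y1})")
print(f"reweighted mean of X:  {weighted_mean_x:.3f}")
print(f"Q-sample mean of X:    {x_q.mean():.3f}")
```

The sketch illustrates why ingredient (c) matters: the unweighted mean from 𝒫 targets the wrong population, while reweighting by the density ratio of Y recovers quantities in 𝒬. The agreement between the reweighted mean of X and the mean of X observed in 𝒬 is a consequence of the shared conditional distribution of X given Y.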

research
09/12/2022

Semi-supervised Triply Robust Inductive Transfer Learning

In this work, we propose a semi-supervised triply robust inductive trans...
research
06/28/2023

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Statistical machine learning methods often face the challenge of limited...
research
01/17/2022

Targeted Optimal Treatment Regime Learning Using Summary Statistics

Personalized decision-making, aiming to derive optimal individualized tr...
research
06/13/2023

Sensitivity analysis for studies transporting prediction models

We consider the estimation of measures of model performance in a target ...
research
07/08/2019

A Versatile Estimation Procedure without Estimating the Nonignorable Missingness Mechanism

We consider the estimation problem in a regression setting where the out...
research
07/11/2018

Quantification under prior probability shift: the ratio estimator and its extensions

The quantification problem consists of determining the prevalence of a g...
research
03/17/2020

A Unified View of Label Shift Estimation

Label shift describes the setting where although the label distribution ...
