A Semiparametric Efficient Approach To Label Shift Estimation and Quantification

11/07/2022
by   Brandon Tse Wei Chow, et al.

Transfer Learning is an area of statistics and machine learning research that seeks answers to the following question: how do we build successful learning algorithms when the data available for training our model is qualitatively different from the data we hope the model will perform well on? In this thesis, we focus on a specific area of Transfer Learning called label shift, also known as quantification. In quantification, this discrepancy is isolated to a shift in the distribution of the response variable. In such a setting, accurately inferring the response variable's new distribution is both an important estimation task in its own right and a crucial step in ensuring that the learning algorithm can adapt to the new data. We make two contributions to this field. First, we present a new procedure called SELSE, which estimates the shift in the response variable's distribution. Second, we prove that SELSE is semiparametric efficient among a large family of quantification algorithms, i.e., SELSE's normalized error attains the smallest asymptotic variance matrix of any algorithm in that family. This family covers nearly all existing algorithms, including ACC/PACC quantifiers and maximum-likelihood-based quantifiers such as EMQ and MLLS. Empirical experiments reveal that SELSE is competitive with, and in many cases outperforms, existing state-of-the-art quantification methods, and that this improvement is especially large when the number of test samples is far greater than the number of training samples.
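To make the quantification task concrete, the following is a minimal sketch of an ACC-style (adjusted classify and count) quantifier, one of the baseline families named in the abstract. It is not SELSE, and it is not taken from the thesis. The function name `acc_quantify`, its arguments, and the final clip-and-renormalize step are illustrative assumptions. The idea is that, under label shift, the distribution of predicted labels on the test set is approximately the classifier's confusion matrix applied to the unknown test label distribution, so the latter can be recovered by solving a linear system.

```python
import numpy as np


def acc_quantify(y_true_holdout, y_pred_holdout, y_pred_test, n_classes):
    """Hypothetical ACC-style estimate of the test label distribution.

    Assumes integer labels in {0, ..., n_classes - 1} and that every class
    appears at least once in the labeled held-out set.
    """
    y_true_holdout = np.asarray(y_true_holdout)
    y_pred_holdout = np.asarray(y_pred_holdout)
    y_pred_test = np.asarray(y_pred_test)

    # C[i, j] estimates P(predicted = i | true = j) on held-out source data.
    C = np.zeros((n_classes, n_classes))
    for j in range(n_classes):
        mask = y_true_holdout == j
        counts = np.bincount(y_pred_holdout[mask], minlength=n_classes)
        C[:, j] = counts / mask.sum()

    # q[i] is the fraction of test points the classifier assigns to class i.
    q = np.bincount(y_pred_test, minlength=n_classes) / len(y_pred_test)

    # Under label shift, q is approximately C @ p; solve for p by least squares.
    p, *_ = np.linalg.lstsq(C, q, rcond=None)

    # Illustrative choice: project back onto the probability simplex
    # by clipping negatives and renormalizing.
    p = np.clip(p, 0.0, None)
    return p / p.sum()
```

A call such as `acc_quantify(y_val, clf_preds_val, clf_preds_test, n_classes=2)` (all names hypothetical) returns a vector of estimated class proportions for the test set. The estimate is only as good as the confusion matrix, which is one reason the variance of such estimators is of interest.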


