Generalized Bayesian Quantification Learning

01/15/2020
by Jacob Fiksel, et al.

Quantification learning is the task of prevalence estimation for a test population using predictions from a classifier trained on a different population. Commonly used quantification methods either assume perfect sensitivity and specificity of the classifier, or use the training data both to train the classifier and to estimate its misclassification rates. These methods are inappropriate in the presence of dataset shift, when the misclassification rates in the training population are not representative of those in the test population. A recent Bayesian quantification model addresses dataset shift, but only allows for single-class (categorical) predictions and assumes perfect knowledge of the true labels on a small number of instances from the test population. We propose a generalized Bayesian quantification learning (GBQL) approach that uses the full compositional predictions from probabilistic classifiers and allows for uncertainty in the true class labels of the limited labeled test data. We use a model-free Bayesian estimating-equation approach to compositional data, with Kullback-Leibler loss functions based only on a first-moment assumption. This estimating-equation approach coherently links the loss functions for labeled and unlabeled test cases. We show how our method yields existing quantification approaches as special cases through different prior choices, thereby providing an inferential framework around these approaches. We also discuss an ensemble extension of GBQL that combines predictions from multiple classifiers, yielding inference that is robust to the inclusion of a poor classifier. We outline a fast and efficient Gibbs sampler that uses a rounding and coarsening approximation to the loss functions. For large-sample settings, we establish posterior consistency of GBQL. Empirical performance of GBQL is demonstrated through simulations and analysis of real data with evident dataset shift.
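The first-moment assumption in the abstract can be made concrete with a small sketch. The following is a minimal, non-Bayesian illustration on hypothetical simulated data (NumPy/SciPy assumed), not the paper's Gibbs-sampling implementation: if a_i is a classifier's compositional prediction with E[a_i | y_i = j] equal to row j of a misclassification matrix M, then marginally E[a_i] = M^T p, where p is the test-population prevalence. Estimating M from a small labeled test set and minimizing the Kullback-Leibler loss over the prevalence simplex gives a point estimate of p.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
C = 3

p_true = np.array([0.5, 0.3, 0.2])      # true prevalence, unknown in practice
M = np.array([[0.7, 0.2, 0.1],          # row j: mean composition given true class j
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

def draw(n):
    """Simulate true labels and compositional classifier predictions."""
    y = rng.choice(C, size=n, p=p_true)
    a = np.vstack([rng.dirichlet(20 * M[j]) for j in y])
    return y, a

# Small labeled test set: estimate M row-by-row from the labeled cases.
y_lab, a_lab = draw(150)
M_hat = np.vstack([a_lab[y_lab == j].mean(axis=0) for j in range(C)])

# Large unlabeled test set: only the compositional predictions are observed.
_, a_unlab = draw(5000)
abar = a_unlab.mean(axis=0)

# Minimizing the average KL loss KL(a_i || M^T p) over the simplex is
# equivalent to maximizing abar @ log(M^T p); the paper's Bayesian
# machinery (priors, Gibbs sampler) builds on this same loss.
neg_kl_fit = lambda p: -abar @ np.log(M_hat.T @ p + 1e-12)
res = minimize(neg_kl_fit,
               x0=np.full(C, 1.0 / C),
               bounds=[(1e-9, 1.0)] * C,
               constraints={"type": "eq", "fun": lambda p: p.sum() - 1.0},
               method="SLSQP")
print("estimated prevalence:", np.round(res.x, 3))  # close to p_true
```

Because the estimate uses misclassification rates learned on the test population itself, it remains sensible under dataset shift, which is the setting the paper targets.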
