Sampling Bias Correction for Supervised Machine Learning: A Bayesian Inference Approach with Practical Applications

03/11/2022
by Max Sklar, et al.

Given a supervised machine learning problem where the training set has been subject to a known sampling bias, how can a model be trained to fit the original dataset? We achieve this through the Bayesian inference framework by altering the posterior distribution to account for the sampling function. We then apply this solution to binary logistic regression, and discuss scenarios where a dataset might be subject to intentional sampling bias, such as label imbalance. This technique is widely applicable for statistical inference on big data, from the medical sciences to image recognition to marketing. Familiarity with it will give the practitioner tools to improve their inference pipeline from data collection to model selection.
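To make the logistic regression case concrete, here is a minimal sketch of one standard way to correct for label-dependent sampling. It is not taken from the paper, and every name in it (`sampled_probability`, `fit_logistic_with_offset`, `keep_pos`, `keep_neg`) is hypothetical. It assumes the sampling function depends only on the label: positives are kept with probability `keep_pos` and negatives with probability `keep_neg`. Under that assumption, the logit seen in the sampled data equals the original logit plus log(keep_pos / keep_neg), so fitting with that constant as a fixed offset should recover parameters for the original distribution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sampled_probability(p, keep_pos, keep_neg):
    """P(y=1 | x, row was kept), given the original P(y=1 | x) = p.

    Assumes the keep probability depends only on the label.
    """
    num = keep_pos * p
    return num / (num + keep_neg * (1.0 - p))

def fit_logistic_with_offset(X, y, offset, lr=0.5, steps=2000):
    """Gradient ascent on the mean log-likelihood with logit = X @ w + b + offset.

    The offset is held fixed, so w and b describe the unsampled data.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = y - sigmoid(X @ w + b + offset)
        w += lr * (X.T @ err) / len(y)
        b += lr * err.mean()
    return w, b

# Synthetic "original" dataset with rare positives (illustrative values only).
rng = np.random.default_rng(0)
n = 50_000
true_w, true_b = 1.5, -3.0
X = rng.normal(size=(n, 1))
y = (rng.random(n) < sigmoid(true_w * X[:, 0] + true_b)).astype(float)

# Intentional sampling bias: keep every positive, keep 10% of negatives.
keep_pos, keep_neg = 1.0, 0.1
keep = rng.random(n) < np.where(y == 1.0, keep_pos, keep_neg)
X_s, y_s = X[keep], y[keep]

# Fit on the biased sample, accounting for the known sampling function.
offset = np.log(keep_pos / keep_neg)
w_hat, b_hat = fit_logistic_with_offset(X_s, y_s, offset)
print("estimated weight:", w_hat[0], "estimated intercept:", b_hat)
```

Fitting the same sample with `offset=0.0` would instead estimate the intercept of the downsampled data, which overstates the positive rate in the original dataset. Sampling functions that also depend on the features would need a per-row offset rather than a single constant.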


research
06/18/2021

Scalable Econometrics on Big Data – The Logistic Regression on Spark

Extra-large datasets are becoming increasingly accessible, and computing...
research
07/03/2023

Systematic Bias in Sample Inference and its Effect on Machine Learning

A commonly observed pattern in machine learning models is an underpredic...
research
06/24/2020

Bayesian Sampling Bias Correction: Training with the Right Loss Function

We derive a family of loss functions to train models in the presence of ...
research
01/27/2021

Detecting discriminatory risk through data annotation based on Bayesian inferences

Thanks to the increasing growth of computational power and data availabi...
research
01/10/2013

Recognition Networks for Approximate Inference in BN20 Networks

We propose using recognition networks for approximate inference in Bayesi...
research
01/19/2021

Robust Bayesian Inference for Big Data: Combining Sensor-based Records with Traditional Survey Data

Big Data often presents as massive non-probability samples. Not only is ...
research
10/31/2017

Calibration for Stratified Classification Models

In classification problems, sampling bias between training data and test...
