Biquality Learning: a Framework to Design Algorithms Dealing with Closed-Set Distribution Shifts

08/29/2023
by   Pierre Nodet, et al.

Training machine learning models from data with weak supervision and dataset shifts remains challenging. Algorithm design for the case where both situations arise together has received little attention, and existing algorithms cannot always handle the most complex distributional shifts. We argue that the biquality data setup is a suitable framework for designing such algorithms. Biquality Learning assumes that two datasets are available at training time: a trusted dataset sampled from the distribution of interest, and an untrusted dataset affected by dataset shifts and weaknesses of supervision (collectively, distribution shifts). Having both a trusted and an untrusted dataset at training time makes it possible to design algorithms that deal with any distribution shift. We propose two methods for biquality learning, one inspired by the label noise literature and the other by the covariate shift literature. We evaluate them on a wide range of real-world datasets using two novel methods for synthetically introducing concept drift and class-conditional shift. Finally, we open some discussions and conclude that developing biquality learning algorithms robust to distributional changes remains an interesting problem for future research.
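The covariate-shift-inspired direction mentioned in the abstract is commonly realized via importance reweighting: a probabilistic classifier is trained to distinguish trusted from untrusted samples, and its odds estimate the density ratio used to reweight untrusted examples. The sketch below is an illustration of that general technique under assumed synthetic data, not the paper's actual algorithm; all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical biquality setup: a small trusted sample from the target
# distribution and a larger untrusted sample with covariate shift.
X_trusted = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_untrusted = rng.normal(loc=1.0, scale=1.0, size=(2000, 2))

# Train a probabilistic classifier to distinguish trusted (label 1) from
# untrusted (label 0) examples; its odds estimate the density ratio
# p_trusted(x) / p_untrusted(x), up to the class prior.
X = np.vstack([X_trusted, X_untrusted])
y = np.concatenate([np.ones(len(X_trusted)), np.zeros(len(X_untrusted))])
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X_untrusted)[:, 1]
prior = len(X_trusted) / len(X_untrusted)
weights = (proba / (1.0 - proba)) / prior

# Untrusted points lying closer to the trusted distribution receive
# larger weights; a downstream model can then be fit on the untrusted
# data with `sample_weight=weights`.
```

These weights can then be passed to any estimator that accepts per-sample weights, so the final model is trained on the plentiful untrusted data while being corrected toward the trusted distribution.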
