Data Feedback Loops: Model-driven Amplification of Dataset Biases

09/08/2022
by   Rohan Taori, et al.

Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet this very success puts the utility of future internet-derived datasets at risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g., gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios (image classification, visual role-labeling, and language generation) demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.
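The feedback mechanism described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under simplifying assumptions, not the authors' formalization or their released code: labels are binary, the "model" is just the label frequency of its training pool, and the bias statistic is the fraction of positive labels in that pool. The function name `simulate_feedback` and all of its parameters are hypothetical. Each round, the model labels new examples either by sampling from its estimated distribution or by always predicting the majority label, and those outputs are added back to the pool.

```python
import random


def simulate_feedback(mode, p_human=0.6, n_human=1000, n_new=1000,
                      rounds=10, seed=0):
    """Toy data feedback loop (illustrative assumption, not the paper's setup).

    Returns the fraction of positive labels in the training pool after
    each round, starting with the human-labeled pool.
    """
    rng = random.Random(seed)
    n_pos = round(p_human * n_human)  # positives among the human labels
    n_total = n_human
    history = [n_pos / n_total]

    for _ in range(rounds):
        # "Train": the toy model only estimates the positive-label frequency.
        p_hat = n_pos / n_total

        if mode == "sample":
            # Sampling-like behavior: outputs follow the estimated distribution.
            new_pos = sum(rng.random() < p_hat for _ in range(n_new))
        elif mode == "argmax":
            # Majority-label behavior: every output is the most likely label.
            new_pos = n_new if p_hat > 0.5 else 0
        else:
            raise ValueError(f"unknown mode: {mode!r}")

        # Model outputs are recorded and scraped into the next training pool.
        n_pos += new_pos
        n_total += n_new
        history.append(n_pos / n_total)

    return history


if __name__ == "__main__":
    for mode in ("sample", "argmax"):
        trajectory = simulate_feedback(mode)
        print(mode, [f"{p:.3f}" for p in trajectory])
```

In this sketch, the sampling-like model keeps the pool's positive fraction close to its starting value, while the majority-label model pushes it steadily toward 1. That matches the abstract's qualitative claim that sampling-like behavior is more stable, but the toy setup says nothing about the magnitude of amplification in real systems.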

research
06/21/2023

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

We introduce VisoGender, a novel dataset for benchmarking gender bias in...
research
04/16/2020

ViBE: A Tool for Measuring and Mitigating Bias in Image Datasets

Machine learning models are known to perpetuate the biases present in th...
research
09/20/2019

Sampling Bias in Deep Active Classification: An Empirical Study

The exploding cost and time needed for data labeling and model training ...
research
08/19/2023

Inductive-bias Learning: Generating Code Models with Large Language Model

Large Language Models(LLMs) have been attracting attention due to a abil...
research
07/29/2017

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Language is increasingly being used to define rich visual recognition pr...
research
04/03/2023

Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR

In recent years, deep learning has been widely used in SAR ATR and achie...
research
05/10/2021

AutoDebias: Learning to Debias for Recommendation

Recommender systems rely on user behavior data like ratings and clicks t...
