
Measuring and Reducing Gendered Correlations in Pre-trained Models

by   Kellie Webster, et al.

Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for models with similar accuracy to encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs of different strategies. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.



