Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

09/09/2019
by Christopher Clark, et al.

State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular keywords imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize. Experiments on five datasets with out-of-domain test sets show significantly improved robustness in all settings, including a 12 point gain on a changing-priors visual question answering dataset and a 9 point gain on an adversarial question answering test set.
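The abstract does not say exactly how the naive and robust models are combined in stage (2). The sketch below assumes one simple option, a product-of-experts ensemble: the robust ("main") model's log-probabilities are added to the frozen bias-only model's log-probabilities, the sum is renormalized, and cross-entropy is applied to the result. Under this assumption, only the main model would receive gradient updates. The function names `poe_loss` and `main_logit_grad` and the toy numbers are hypothetical illustrations. They are not the authors' code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def ensemble_log_probs(main_logits, bias_log_probs):
    """Product of experts (assumed combination rule): add log-probs, renormalize."""
    combined = [a + b for a, b in zip(log_softmax(main_logits), bias_log_probs)]
    return log_softmax(combined)

def poe_loss(main_logits, bias_log_probs, label):
    """Cross-entropy of the ensemble.

    bias_log_probs come from the stage-1 bias-only model and are treated
    as constants, so only the main model would be trained on this loss.
    """
    return -ensemble_log_probs(main_logits, bias_log_probs)[label]

def main_logit_grad(main_logits, bias_log_probs, label):
    """Gradient of poe_loss w.r.t. the main model's logits.

    The ensemble distribution equals softmax(main_logits + bias_log_probs),
    so the gradient is that distribution minus the one-hot label.
    """
    probs = [math.exp(x) for x in ensemble_log_probs(main_logits, bias_log_probs)]
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
```

As a toy three-class entailment example, take uniform main-model logits `[0, 0, 0]` and a bias-only model that assigns probabilities `[0.9, 0.05, 0.05]`. If the gold label is class 0, which the bias already predicts, the ensemble loss is -log 0.9 ≈ 0.105, compared with about 1.099 for the main model trained alone. The gradient on the correct logit shrinks from about -0.667 to -0.1. If the gold label is class 1, which contradicts the bias, the loss rises to -log 0.05 ≈ 3.0. Under this scheme, the main model learns little from examples the bias already solves and more from examples where the bias fails. This is one way to encourage it to "focus on other patterns in the data."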

