Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

04/18/2021
by Max Bartolo, et al.

Despite the availability of very large datasets and pretrained models, state-of-the-art question answering models remain susceptible to a variety of adversarial attacks and are still far from obtaining human-level language understanding. One proposed way forward is dynamic adversarial data collection, in which a human annotator attempts to create examples on which a model-in-the-loop fails. However, this approach comes at a higher cost per sample and a slower pace of annotation, as model-adversarial data requires more annotator effort to generate. In this work, we investigate several answer selection, question generation, and filtering methods that together form a synthetic adversarial data generation pipeline, which takes human-generated adversarial samples and unannotated text and produces synthetic question-answer pairs. Models trained on both synthetic and human-generated data outperform models not trained on synthetic adversarial data, and obtain state-of-the-art results on the AdversarialQA dataset with overall performance gains of 3.7 F1. Furthermore, we find that training on the synthetic adversarial data improves model generalisation across domains for non-adversarial data, demonstrating gains on 9 of the 12 MRQA datasets. Lastly, we find that our models become considerably more difficult to beat by human adversaries, with the macro-averaged validated model error rate dropping from the 17.6% observed for non-augmented models.
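The abstract describes a three-stage synthetic data generation pipeline: answer selection over unannotated text, question generation conditioned on the selected answers, and filtering of the resulting pairs. The sketch below is illustrative only and is not the authors' implementation; in the actual work each stage is a trained model (e.g., a seq2seq question generator), whereas here simple stand-in heuristics mark where those models would plug in.

```python
# Illustrative sketch of a synthetic QA generation pipeline
# (answer selection -> question generation -> filtering).
# All heuristics below are hypothetical stand-ins for trained models.
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    context: str


def select_answers(context: str) -> list[str]:
    # Stage 1: choose candidate answer spans from unannotated text.
    # Stand-in heuristic: capitalized, entity-like tokens.
    return [w.strip(".,") for w in context.split()
            if w[0].isupper() and len(w) > 2]


def generate_question(context: str, answer: str) -> str:
    # Stage 2: a trained generator would condition on (context, answer);
    # a template stands in for the model here.
    return f"What does the passage say about {answer}?"


def filter_pairs(pairs: list[QAPair]) -> list[QAPair]:
    # Stage 3: keep only pairs whose answer actually occurs in the
    # context (a stand-in for model-based consistency filtering).
    return [p for p in pairs if p.answer in p.context]


def synthesize(context: str) -> list[QAPair]:
    pairs = [QAPair(generate_question(context, a), a, context)
             for a in select_answers(context)]
    return filter_pairs(pairs)
```

In the paper's setting, the generator would be trained on human-written adversarial examples so that the synthetic pairs inherit their adversarial character; the filter then discards generations that are inconsistent with the source passage.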


Related research

- Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants (12/16/2021)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study (06/02/2021)
- Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering (10/23/2020)
- Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions (09/07/2018)
- DQI: Measuring Data Quality in NLP (05/02/2020)
- Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair (11/16/2021)
