Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

by Max Bartolo et al.

Despite the availability of very large datasets and pretrained models, state-of-the-art question answering models remain susceptible to a variety of adversarial attacks and are still far from obtaining human-level language understanding. One proposed way forward is dynamic adversarial data collection, in which a human annotator attempts to create examples for which a model-in-the-loop fails. However, this approach comes at a higher cost per sample and a slower pace of annotation, as model-adversarial data requires more annotator effort to generate. In this work, we investigate several answer selection, question generation, and filtering methods that together form a synthetic adversarial data generation pipeline, taking human-generated adversarial samples and unannotated text and producing synthetic question-answer pairs. Models trained on both synthetic and human-generated data outperform models not trained on synthetic adversarial data, and obtain state-of-the-art results on the AdversarialQA dataset with overall performance gains of 3.7 F1. Furthermore, we find that training on the synthetic adversarial data improves model generalisation across domains for non-adversarial data, demonstrating gains on 9 of the 12 MRQA datasets. Lastly, we find that our models become considerably more difficult to beat by human adversaries, with a drop in macro-averaged validated model error rate from 17.6% to 8.8% when compared to non-augmented models.
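The abstract describes a three-stage pipeline: select candidate answers from unannotated text, generate questions conditioned on each passage-answer pair, then filter the resulting pairs for consistency. A minimal sketch of that control flow is below; every function here is an illustrative stand-in (the paper uses trained models, e.g. a seq2seq question generator and a QA model for filtering), not the authors' implementation.

```python
# Hedged sketch of a synthetic QA data generation pipeline:
# answer selection -> question generation -> consistency filtering.
# All heuristics here are hypothetical placeholders for trained models.

def select_answers(passage):
    # Stand-in answer selection: treat capitalised alphabetic tokens as
    # candidate answer spans (the paper trains a model for this step).
    return [tok for tok in passage.split() if tok[:1].isupper() and tok.isalpha()]

def generate_questions(passage, answers):
    # Stand-in question generation: the paper fine-tunes a seq2seq model
    # on adversarial data; here we just emit a template question per answer.
    return [(f"Which entity does the passage mention as '{a}'?", a) for a in answers]

def consistency_filter(passage, qa_pairs, qa_model):
    # Filtering: keep a pair only if a QA model answers the generated
    # question with the intended answer (a roundtrip-consistency check).
    return [(q, a) for q, a in qa_pairs if qa_model(passage, q) == a]

def synthesize(passage, qa_model):
    # Run the full pipeline over one passage of unannotated text.
    answers = select_answers(passage)
    pairs = generate_questions(passage, answers)
    return consistency_filter(passage, pairs, qa_model)
```

In practice each stage would be a learned component; the point of the sketch is only the data flow and the filtering step, which discards generated pairs the QA model cannot answer correctly.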


