Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering

09/03/2023
by   Arijit Ghosh Chowdhury, et al.

Robustness in Natural Language Processing remains a pertinent issue: state-of-the-art models underperform under naturally shifted distributions. In the context of Question Answering, domain adaptation methods are a growing body of research. However, very little attention has been given to domain generalization under natural distribution shifts, where the target domain is unknown. With drastic improvements in the quality of, and access to, generative models, we answer the question: how do generated datasets influence the performance of QA models under natural distribution shifts? We perform experiments on four different datasets under varying amounts of distribution shift, and analyze how "in-the-wild" generation can help achieve domain generalization. We take a two-step generation approach, generating both contexts and QA pairs to augment existing datasets. Our experiments demonstrate that augmenting reading comprehension datasets with generated data leads to better robustness to natural distribution shifts.
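The two-step generation approach described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_context` and `generate_qa` are hypothetical callables standing in for calls to a generative model, the stubs below replace a real LLM so the sketch runs offline, and the span filter (keeping only answers that appear verbatim in the generated context) is an added assumption suited to extractive reading comprehension.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class QAExample:
    context: str
    question: str
    answer: str
    source: str  # "human" or "generated"


def generate_examples(
    topics: List[str],
    generate_context: Callable[[str], str],
    generate_qa: Callable[[str], Tuple[str, str]],
) -> List[QAExample]:
    """Step 1: generate a context per topic. Step 2: generate a QA pair per context."""
    examples = []
    for topic in topics:
        context = generate_context(topic)        # hypothetical LLM call
        question, answer = generate_qa(context)  # hypothetical LLM call
        # Assumption: keep only pairs whose answer is a span of the context.
        if answer and answer in context:
            examples.append(QAExample(context, question, answer, "generated"))
    return examples


def augment(
    original: List[QAExample], generated: List[QAExample], seed: int = 0
) -> List[QAExample]:
    """Mix generated examples into the existing training set and shuffle."""
    combined = list(original) + list(generated)
    random.Random(seed).shuffle(combined)
    return combined


# Stub generators used in place of a real model (illustrative only).
def stub_context(topic: str) -> str:
    return f"{topic} is a topic that was discussed in 2023."


def stub_qa(context: str) -> Tuple[str, str]:
    subject = context.split(" is ")[0]
    return ("What is the passage about?", subject)


if __name__ == "__main__":
    human = [
        QAExample(
            "Paris is the capital of France.",
            "What is the capital of France?",
            "Paris",
            "human",
        )
    ]
    generated = generate_examples(["Jazz", "Solar power"], stub_context, stub_qa)
    for example in augment(human, generated):
        print(example.source, "|", example.question, "->", example.answer)
```

In practice the two stubs would be replaced by prompts to a generative model, and the ratio of generated to human examples would be a parameter to tune; the abstract does not specify either, so both are left open here.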

research
02/09/2023

Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study

A deployed question answering (QA) model can easily fail when the test d...
research
02/05/2023

Leaving Reality to Imagination: Robust Classification via Generated Datasets

Recent research on robustness has revealed significant performance gaps ...
research
04/29/2020

The Effect of Natural Distribution Shift on Question Answering Models

We build four new test sets for the Stanford Question Answering Dataset ...
research
09/26/2022

On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering

Interacting with a speech interface to query a Question Answering (QA) s...
research
03/15/2022

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Data modification, either via additional training datasets, data augment...
research
07/01/2021

Ensemble Learning-Based Approach for Improving Generalization Capability of Machine Reading Comprehension Systems

Machine Reading Comprehension (MRC) is an active field in natural langua...
research
05/14/2023

Learning to Generalize for Cross-domain QA

There have been growing concerns regarding the out-of-domain generalizat...
