Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

by Aditya Gupta, et al.

Disfluency is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than what was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting. We show that data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning. We argue that large-scale disfluency datasets are needed for NLP models to be robust to disfluencies. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.
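
To make the idea of a contextual disfluency concrete, the following is a minimal sketch of how a fluent question could be turned into a disfluent one by inserting a false start followed by a correction. It is an illustration only and is not the annotation procedure or the augmentation method used in the paper. The function name `add_correction_disfluency`, the `INTERREGNA` phrase list, and the example question are all assumptions introduced here.

```python
import random

# Hypothetical interregnum phrases; illustrative only, not taken from Disfl-QA.
INTERREGNA = ["no wait", "sorry", "I mean", "er"]


def add_correction_disfluency(question, distractor, target, rng):
    """Insert a correction-style disfluency before `target` in `question`.

    The speaker first says `distractor` (the part to be discarded), then an
    interregnum phrase, then the intended `target` (the repair).
    """
    if target not in question:
        raise ValueError("target phrase not found in question")
    interregnum = rng.choice(INTERREGNA)
    disfluent_span = f"{distractor} {interregnum} {target}"
    # Replace only the first occurrence so the rest of the question is unchanged.
    return question.replace(target, disfluent_span, 1)


# Example with a made-up question and distractor.
fluent = "When did the Normans arrive in England?"
disfluent = add_correction_disfluency(
    fluent, "the Saxons", "the Normans", random.Random(0)
)
print(disfluent)
```

A question answering model that handles the disfluent question correctly has to ignore the distractor and answer about the repaired phrase, which is the kind of understanding the abstract describes.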




JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Question Answering (QA) is a task in which a machine understands a given...

Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering

Recently, a simple combination of passage retrieval using off-the-shelf ...

Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap

Question answering (QA) models for reading comprehension have been demon...

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

We present a new multimodal question answering challenge, ManyModalQA, i...

SD-QA: Spoken Dialectal Question Answering for the Real World

Question answering (QA) systems are now available through numerous comme...

Frustratingly Easy Natural Question Answering

Existing literature on Question Answering (QA) mostly focuses on algorit...

Question Answering Infused Pre-training of General-Purpose Contextualized Representations

This paper proposes a pre-training objective based on question answering...

Code Repositories


A Benchmark Dataset for Understanding Disfluencies in Question Answering
