IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

10/25/2022
by Rifki Afina Putri, et al.

Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU), as it is included in several NLU benchmarks (Liang et al., 2020; Wilie et al., 2020). However, most MRC datasets contain only answerable questions, overlooking the importance of unanswerable ones. MRC models trained only on answerable questions will select the span that is most likely to be the answer, even when the answer does not exist in the given passage (Rajpurkar et al., 2018). This problem persists especially in medium- to low-resource languages like Indonesian. Existing Indonesian MRC datasets (Purwarianti et al., 2007; Clark et al., 2020) are still inadequate because of their small size and limited question types, i.e., they cover only answerable questions. To fill this gap, we build a new Indonesian MRC dataset called I(n)don'tKnow-MRC (IDK-MRC) by combining automatic and manual unanswerable question generation to minimize the cost of manual dataset construction while maintaining dataset quality. Combined with the existing answerable questions, IDK-MRC consists of more than 10K questions in total. Our analysis shows that our dataset significantly improves the performance of Indonesian MRC models, with a large improvement on unanswerable questions.
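
To illustrate the failure mode described in the abstract, the sketch below shows how an extractive reader with no "no answer" option always returns some span, and how a separate null score lets it abstain. This is not the method of the paper: it follows the thresholding convention commonly used with SQuAD 2.0-style readers, and the function names (`best_span`, `predict`), the `null_score` parameter, and all scores are hypothetical values chosen for illustration.

```python
from typing import List, Optional, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 5) -> Tuple[float, int, int]:
    """Return (score, start, end) of the highest-scoring span (end inclusive)."""
    best = (float("-inf"), 0, 0)
    for i, s in enumerate(start_scores):
        # Only consider spans of at most max_len tokens starting at i.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[0]:
                best = (score, i, j)
    return best


def predict(tokens: List[str], start_scores: List[float],
            end_scores: List[float], null_score: Optional[float] = None,
            threshold: float = 0.0) -> Optional[str]:
    """Return the answer text, or None if the reader abstains.

    With null_score=None the reader mimics a model trained only on
    answerable questions: it must return a span, however weak.
    """
    score, i, j = best_span(start_scores, end_scores)
    # Assumed convention: abstain when the null score beats the best span
    # score by more than a threshold tuned on development data.
    if null_score is not None and null_score - score > threshold:
        return None
    return " ".join(tokens[i:j + 1])


# Hypothetical passage tokens and model scores.
tokens = ["Jakarta", "is", "a", "city"]
start = [0.2, 0.1, 0.0, 0.1]
end = [0.3, 0.0, 0.1, 0.2]

forced = predict(tokens, start, end)                    # always some span
abstain = predict(tokens, start, end, null_score=2.0)   # may return None
```

In this toy setup the answerable-only reader returns a low-confidence span, while the reader with a null score can decline to answer; training data with unanswerable questions is what gives a model the signal to learn such a null score.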


