EMBRACE: Evaluation and Modifications for Boosting RACE

05/15/2023
by   Mariia Zyrianova, et al.

When training and evaluating machine reading comprehension models, it is very important to work with high-quality datasets that are also representative of real-world reading comprehension tasks. This requirement includes, for instance, having questions that are based on texts of different genres and that require generating inferences or reflecting on the reading material. In this article, we turn our attention to RACE, a dataset of English texts and corresponding multiple-choice questions (MCQs). Each MCQ consists of a question and four alternatives (of which one is the correct answer). RACE was constructed by Chinese teachers of English for human reading comprehension and is widely used as training material for machine reading comprehension models. By construction, RACE should satisfy the aforementioned quality requirements, and the purpose of this article is to check whether they are indeed satisfied. We provide a detailed analysis of the test set of RACE for high-school students (1,045 texts and 3,498 corresponding MCQs), including (1) an evaluation of the difficulty of each MCQ and (2) annotations for the relevant pieces of the texts (called "bases") that are used to justify the plausibility of each alternative. A considerable number of MCQs appear not to fulfill basic requirements for this type of reading comprehension task, so we additionally identify the high-quality subset of the evaluated RACE corpus. We also demonstrate that the distribution of the positions of the bases for the alternatives is biased towards certain parts of the texts, which is not necessarily desirable when evaluating MCQ answering and generation models.
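To make the annotation scheme concrete, the following is a minimal sketch in Python of how an annotated MCQ and the base-position analysis could be represented. The schema is hypothetical and illustrative (the names Base, AnnotatedMCQ, start, end, and the decile histogram are assumptions of this sketch, not the paper's actual data format).

    from dataclasses import dataclass

    @dataclass
    class Base:
        # A passage span cited as evidence for one alternative (hypothetical schema).
        alternative: int  # index of the alternative this base justifies (0-3)
        start: int        # character offset where the span begins in the passage
        end: int          # character offset one past the span's last character

    @dataclass
    class AnnotatedMCQ:
        # One RACE-style MCQ: a passage, a question, four alternatives, one correct.
        passage: str
        question: str
        alternatives: list  # exactly four option strings
        correct: int        # index of the correct alternative
        bases: list         # Base annotations justifying the alternatives

    def relative_base_positions(item):
        # Map each base onto [0, 1): 0.0 is the start of the passage and
        # values near 1.0 are the end. Aggregated over a corpus, these
        # values expose any positional skew in where bases occur.
        n = len(item.passage)
        return [b.start / n for b in item.bases]

    def position_histogram(items, buckets=10):
        # Bucket relative base positions into deciles across many MCQs.
        counts = [0] * buckets
        for item in items:
            for p in relative_base_positions(item):
                counts[min(int(p * buckets), buckets - 1)] += 1
        return counts

Comparing such a histogram against a uniform baseline is one simple way to quantify the kind of positional bias the abstract describes.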


