A Framework for Evaluation of Machine Reading Comprehension Gold Standards

03/10/2020
by Viktor Schlegel, et al.

Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of the gold standards used to evaluate them. The challenges present in this data are poorly understood, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate, on the one hand, the linguistic features present, the required reasoning and background knowledge, and factual correctness, and, on the other hand, the presence of lexical cues as a lower bound for the requirement of understanding. We propose a qualitative annotation schema for the former and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers, and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.
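
To make the idea of a lexical cue concrete, the following is a minimal sketch of one possible word-overlap check. It is not the set of metrics proposed in the paper, which the abstract does not specify. The function names, the regex tokenizer, and the choice of Jaccard similarity are all assumptions made for illustration. The sketch asks whether the passage sentence that shares the most words with the question already contains the expected answer, in which case the question could plausibly be answered by surface matching alone.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_overlap_sentence(question, sentences):
    """Return (index, score) of the sentence with the highest word overlap
    with the question. Assumes `sentences` is a non-empty list of strings."""
    q = tokens(question)
    scores = [jaccard(q, tokens(s)) for s in sentences]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return best, scores[best]


def answer_found_by_overlap(question, sentences, answer):
    """True if the highest-overlap sentence contains the answer string,
    i.e. a purely lexical cue points at the answer (illustrative heuristic)."""
    idx, _ = best_overlap_sentence(question, sentences)
    return answer.lower() in sentences[idx].lower()


if __name__ == "__main__":
    passage = [
        "Marie Curie was born in Warsaw in 1867.",
        "She later moved to Paris to study physics.",
    ]
    question = "Where was Marie Curie born?"
    print(best_overlap_sentence(question, passage))
    print(answer_found_by_overlap(question, passage, "Warsaw"))
```

A high rate of `True` results over a dataset would suggest, under these assumptions, that many questions can be answered without deeper comprehension. A real analysis would need a more careful tokenizer and sentence splitter.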


