Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task

06/10/2018
by An Yang, et al.

Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between candidate and reference answers. However, these metrics can be biased for specific question types, especially questions asking for yes-no opinions and entity lists. In this paper, we adapt the metrics to better correlate n-gram overlap with human judgment for answers to these two question types. Statistical analysis demonstrates the effectiveness of our approach. Our adaptations may provide useful guidance for the development of real-world MRC systems.
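The abstract does not spell out the adapted formulas, but the underlying idea of lexical-overlap scoring is easy to sketch. The Python snippet below computes a ROUGE-style n-gram-overlap F1 and then applies a hypothetical adaptation for yes-no questions: if the candidate and reference answers disagree on yes/no polarity, the overlap score is down-weighted. The function names (overlap_f1, yes_no_adapted_score), the leading-token polarity heuristic, and the 0.5 penalty factor are illustrative assumptions, not the paper's actual method.

# Minimal sketch of lexical-overlap scoring with a hypothetical yes-no
# adaptation. The polarity penalty is illustrative only; the paper's
# concrete adaptation is not given in this abstract.
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_f1(candidate, reference, n=1):
    """ROUGE-style n-gram overlap F1 between candidate and reference strings."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    match = sum((cand & ref).values())  # clipped n-gram matches
    if match == 0:
        return 0.0
    precision = match / sum(cand.values())
    recall = match / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def yes_no_adapted_score(candidate, reference, n=1):
    """Hypothetical adaptation: halve the overlap score when the
    candidate's yes/no polarity contradicts the reference's."""
    def polarity(text):
        words = text.split()
        head = words[0].lower() if words else ""
        return {"yes": 1, "no": -1}.get(head, 0)  # 0 = no explicit polarity
    base = overlap_f1(candidate, reference, n)
    if polarity(candidate) and polarity(reference):
        return base if polarity(candidate) == polarity(reference) else 0.5 * base
    return base

if __name__ == "__main__":
    ref = "no the warranty does not cover water damage"
    print(yes_no_adapted_score("no water damage is not covered", ref))
    print(yes_no_adapted_score("yes water damage is covered", ref))

The example at the bottom shows why plain overlap is misleading for yes-no questions: a candidate with the wrong polarity can still share many words with the reference, so a polarity-aware term is needed to keep the metric aligned with human judgment.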

