ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension

12/29/2019
by Dheeru Dua, et al.

Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of so many datasets, comprehensive and reliable evaluation is tedious and time-consuming for researchers working on this problem. We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating the testing of a single model's ability to understand a wide variety of reading phenomena. The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning for general reading facility. As more suitable datasets are released, they will be added to the evaluation server. We also collect and include synthetic augmentations for these datasets, testing how well models can handle out-of-domain questions.
