Embracing data abundance: BookTest Dataset for Reading Comprehension

10/04/2016
by   Ondrej Bajgar, et al.

A practically unlimited amount of natural language data is available, yet recent work in text comprehension has focused on datasets that are small relative to current computing capabilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes the BookTest, a new dataset similar to the popular Children's Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still space for further improvement.

