TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

05/09/2017
by Mandar Joshi, et al.
We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross-sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code are available at http://nlp.cs.washington.edu/triviaqa/
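
The idea of question-answer-evidence triples with distant supervision can be illustrated with a minimal sketch. It assumes, as a simplification, that an independently gathered document counts as evidence for a question whenever it contains the answer string. This is not the authors' pipeline: the names `Triple`, `normalize`, and `build_triples` are hypothetical, and the question and documents are toy examples, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    """One question-answer-evidence triple (hypothetical structure)."""
    question: str
    answer: str
    evidence: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())


def build_triples(question: str, answer: str, documents: list[str]) -> list[Triple]:
    """Pair a question-answer with every document that contains the answer string.

    The label is "distant": no annotator checks that the document actually
    answers the question, only that the answer text appears in it.
    """
    needle = normalize(answer)
    return [
        Triple(question, answer, doc)
        for doc in documents
        if needle in normalize(doc)
    ]


# Toy data for illustration only.
docs = [
    "Paris is the capital and most populous city of France.",
    "France is a country in Western Europe.",
]
triples = build_triples("What is the capital of France?", "Paris", docs)
for t in triples:
    print(t.evidence)
```

Because the match is purely on the answer string, a document can be kept even when the surrounding text does not support the answer, which is one reason such supervision is noisier than hand-labeled spans.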

research
03/20/2021

Self-Supervised Test-Time Learning for Reading Comprehension

Recent work on unsupervised question answering has shown that models can...
research
11/13/2020

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

Humans often have to read multiple documents to address their informatio...
research
08/18/2021

EviDR: Evidence-Emphasized Discrete Reasoning for Reasoning Machine Reading Comprehension

Reasoning machine reading comprehension (R-MRC) aims to answer complex q...
research
06/16/2022

GAAMA 2.0: An Integrated System that Answers Boolean and Extractive Questions

Recent machine reading comprehension datasets include extractive and boo...
research
04/18/2021

Learning with Instance Bundles for Reading Comprehension

When training most modern reading comprehension models, all the question...
research
11/01/2022

CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation

The full power of human language-based communication cannot be realized ...
research
02/23/2019

Evidence Sentence Extraction for Machine Reading Comprehension

Recently remarkable success has been achieved in machine reading compreh...
