Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data

02/01/2021
by   Dian Yu, et al.

Despite much recent research in the area, it remains unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks. In this paper, we investigate this question. We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA, and convert each question-answering instance into a weakly-labeled MRC instance by using the incomplete and noisy snippets returned by a web search engine as its relevant context. We then propose a self-teaching paradigm to make better use of the generated weakly-labeled MRC instances to improve a target MRC task. Experimental results show that we can obtain an improvement of 5.1% in accuracy on a multiple-choice MRC dataset, C^3, demonstrating the effectiveness of our framework and the usefulness of large-scale subject-area question-answering data for machine reading comprehension.
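
The conversion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `to_weak_mrc` function, the deduplication step, and the `max_chars` truncation limit are all hypothetical, and the snippets are assumed to have been retrieved already by some search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAInstance:
    """A multiple-choice question with no accompanying passage (hypothetical schema)."""
    question: str
    options: List[str]
    answer_index: int


@dataclass
class WeakMRCInstance:
    """The same question paired with a noisy context built from search snippets."""
    context: str
    question: str
    options: List[str]
    answer_index: int


def to_weak_mrc(qa: QAInstance, snippets: List[str], max_chars: int = 2000) -> WeakMRCInstance:
    # Normalize whitespace, then drop empty snippets and exact duplicates,
    # keeping the order in which the snippets were returned.
    seen = set()
    kept = []
    for snippet in snippets:
        cleaned = " ".join(snippet.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    # The context is only weakly relevant: it may be incomplete or may not
    # contain the answer at all, which is why the label is called "weak".
    context = " ".join(kept)[:max_chars]
    return WeakMRCInstance(context, qa.question, list(qa.options), qa.answer_index)
```

For example, calling `to_weak_mrc` with a question and a handful of retrieved snippets yields an instance whose `context` field can be fed to any multiple-choice MRC model alongside the original question and options.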

research
09/30/2020

Bridging Information-Seeking Human Gaze and Machine Reading Comprehension

In this work, we analyze how human gaze during reading comprehension is ...
research
02/28/2018

Medical Exam Question Answering with Large-scale Reading Comprehension

Reading and understanding text is one important component in computer ai...
research
09/25/2019

Question Answering is a Format; When is it Useful?

Recent years have seen a dramatic expansion of tasks and datasets posed ...
research
06/23/2021

PALRACE: Reading Comprehension Dataset with Human Data and Labeled Rationales

Pre-trained language models achieve high performance on machine reading...
research
03/27/2022

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

This paper introduces MedMCQA, a new large-scale, Multiple-Choice Questi...
research
08/19/2016

Who did What: A Large-Scale Person-Centered Cloze Dataset

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in...
research
09/19/2023

Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change

Pirá is a reading comprehension dataset focused on the ocean, the Brazil...