Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension

04/21/2019
by Kai Sun, et al.

With the ultimate goal of narrowing the gap between human and machine readers in text comprehension, we present the first collection of Challenging Chinese machine reading Comprehension datasets (C^3), collected from language and professional certification exams. It contains 13,924 documents and their associated 23,990 multiple-choice questions. Most of the questions in C^3 cannot be answered merely by surface-form matching against the given text. As a pilot study, we closely analyze the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed in these real-world reading comprehension tasks. We further explore how to leverage linguistic knowledge, including a lexicon of common idioms and proverbs, and domain-specific knowledge, such as textbooks, to aid machine readers by fine-tuning a pre-trained language model (Devlin et al., 2019). Our experimental results demonstrate that linguistic knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks. C^3 will be available at http://dataset.org/c3/.
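
To illustrate what "surface-form matching" could mean for a multiple-choice question, here is a minimal sketch, not the authors' baseline. It assumes a simple heuristic: score each option by the fraction of its character bigrams that also occur in the document, then pick the highest-scoring option. The function names and the toy dialogue are hypothetical and are not taken from C^3.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text, ignoring whitespace."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def surface_match_score(document, option, n=2):
    """Fraction of the option's character n-grams that also occur in the document."""
    option_grams = char_ngrams(option, n)
    if not option_grams:
        return 0.0
    return len(option_grams & char_ngrams(document, n)) / len(option_grams)


def pick_by_surface_match(document, options, n=2):
    """Index of the option with the highest overlap score (ties go to the first)."""
    scores = [surface_match_score(document, opt, n) for opt in options]
    return scores.index(max(scores))


# Hypothetical toy item (assumption, not from C^3).
# Dialogue: "It is raining today, let's not go to the park." / "OK, let's watch a movie at home."
document = "男：今天下雨，我们别去公园了。女：好，那就在家看电影吧。"
# Question (implicit): what will they do?
options = [
    "去公园",    # "go to the park": copied from the text, but negated there
    "留在家里",  # "stay at home": the correct answer, phrased differently from the text
    "去商店",    # "go to the store": unrelated
]
correct_index = 1

choice = pick_by_surface_match(document, options)
print(choice, choice == correct_index)
```

In this toy item the heuristic selects the distractor, because the distractor repeats wording from the text while the correct option paraphrases it. That is the kind of question that overlap-based matching alone is not expected to answer.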

