Cross-Lingual Machine Reading Comprehension

09/01/2019
by   Yiming Cui, et al.

Though the community has made great progress on the Machine Reading Comprehension (MRC) task, most previous work addresses English MRC problems, and there are few efforts on other languages, mainly due to the lack of large-scale training data. In this paper, we propose the Cross-Lingual Machine Reading Comprehension (CLMRC) task for languages other than English. First, we present several back-translation approaches for the CLMRC task, which are straightforward to adopt. However, accurately aligning the answer into another language is difficult and can introduce additional noise. In this context, we propose a novel model called Dual BERT, which takes advantage of the large-scale training data provided by a rich-resource language (such as English), learns the semantic relations between the passage and question in a bilingual context, and then utilizes the learned knowledge to improve reading comprehension performance in the low-resource language. We conduct experiments on two Chinese machine reading comprehension datasets, CMRC 2018 and DRCD. The results show consistent and significant improvements over various state-of-the-art systems by a large margin, which demonstrates the potential of the CLMRC task. Resources available: https://github.com/ymcui/Cross-Lingual-MRC
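The answer-alignment difficulty the abstract mentions arises because a back-translated answer rarely matches the target-language passage verbatim, so the answer span must be located approximately. As a minimal sketch of this alignment step (not the paper's exact method; the function name `align_answer` and the similarity-search strategy are illustrative assumptions), one can scan passage spans for the highest string similarity to the noisy translated answer:

```python
from difflib import SequenceMatcher


def align_answer(passage: str, translated_answer: str, max_len: int = 30):
    """Find the passage span most similar to a noisy back-translated answer.

    Brute-force search over all spans up to max_len characters, scored by
    SequenceMatcher ratio. This is a simple illustrative heuristic, not the
    alignment procedure used in the paper.
    """
    best_span, best_score = "", 0.0
    n = len(passage)
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            score = SequenceMatcher(None, passage[i:j], translated_answer).ratio()
            if score > best_score:
                best_score, best_span = score, passage[i:j]
    return best_span, best_score


if __name__ == "__main__":
    # The back-translation produced "capital city of France", which does not
    # occur verbatim in the passage; the search recovers the closest span.
    span, score = align_answer("Paris is the capital of France",
                               "capital city of France")
    print(span, round(score, 3))
```

Even this naive matcher shows why the step is noise-prone: a paraphrased or reordered translation can pull the best-scoring span away from the true answer, which motivates learning the bilingual semantics jointly as Dual BERT does.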


Related research

08/15/2019
XCMRC: Evaluating Cross-lingual Machine Reading Comprehension
We present XCMRC, the first public cross-lingual language understanding ...

05/08/2021
Improving Cross-Lingual Reading Comprehension with Self-Training
Substantial improvements have been made in machine reading comprehension...

07/11/2021
Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking
Extractive Reading Comprehension (ERC) has made tremendous advances enab...

07/03/2020
Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer
Reading comprehension is a well studied task, with huge training dataset...

04/13/2020
Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
Reading comprehension models often overfit to nuances of training datase...

12/20/2019
SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis
SberQuAD—a large scale analog of Stanford SQuAD in the Russian language—...

06/06/2016
Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution
Most existing approaches for zero pronoun resolution are heavily relying...
