Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation

10/27/2020
by Junhao Liu, et al.

Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale annotated datasets in low-resource languages such as Arabic, Hindi, and Vietnamese. Many previous approaches use translation data, produced by translating from a rich-resource language such as English into the low-resource languages, as auxiliary supervision. However, effectively leveraging translation data while reducing the impact of translation noise remains difficult. In this paper, we tackle this challenge and improve cross-lingual transfer performance with a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC). A language branch is a group of passages in a single language paired with questions in all target languages. Based on LBMRC, we train multiple machine reading comprehension (MRC) models, each proficient in a single language. Then, we devise a multilingual distillation approach that amalgamates the knowledge of the multiple language branch models into a single model for all target languages. Combining LBMRC with multilingual distillation makes the model more robust to translation noise and therefore improves its cross-lingual ability. Meanwhile, the resulting single multilingual model serves all target languages, which saves the cost of training, inference, and maintenance compared with keeping multiple models. Extensive experiments on two CLMRC benchmarks clearly show the effectiveness of the proposed method.
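As a rough illustration of the distillation step described in the abstract (a minimal sketch, not the authors' released code), the snippet below distills several language-branch teacher MRC models into a single multilingual student by matching the student's answer-span distributions to an average of the teachers' softened predictions, while keeping a standard cross-entropy term on the gold spans. The tensor shapes, the simple averaging of teachers, the temperature T, the mixing weight alpha, and the function name are all assumptions made for the example.

```python
# Sketch of multilingual answer-span distillation (assumed setup, not the paper's code).
# Each "teacher" is an MRC model trained on one language branch; the student is a single
# multilingual model. All models are assumed to output (start_logits, end_logits) per token.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, hard_labels, T=2.0, alpha=0.5):
    """Soft-target KL against averaged teacher span distributions plus
    cross-entropy on the gold start/end positions.

    student_logits: (start_logits, end_logits), each of shape [batch, seq_len]
    teacher_logits_list: list of (start_logits, end_logits) from language-branch teachers
    hard_labels: (start_positions, end_positions), each of shape [batch]
    """
    loss = 0.0
    for i in range(2):  # 0 = answer start, 1 = answer end
        s_logits = student_logits[i]
        # Average the teachers' softened span distributions (one simple aggregation choice).
        t_probs = torch.stack(
            [F.softmax(t[i] / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
        soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1), t_probs,
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(s_logits, hard_labels[i])
        loss = loss + alpha * soft + (1.0 - alpha) * hard
    return loss

# Example with random tensors standing in for real model outputs:
batch, seq_len = 4, 128
student = (torch.randn(batch, seq_len), torch.randn(batch, seq_len))
teachers = [(torch.randn(batch, seq_len), torch.randn(batch, seq_len)) for _ in range(3)]
gold = (torch.randint(0, seq_len, (batch,)), torch.randint(0, seq_len, (batch,)))
print(distillation_loss(student, teachers, gold))
```

In practice the teacher outputs would come from the language-branch models run over translated question-passage pairs; averaging their distributions is just one plausible way to combine them for this sketch.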


Related research

04/29/2020 · Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension
Multilingual pre-trained models could leverage the training data from a ...

05/08/2021 · Improving Cross-Lingual Reading Comprehension with Self-Training
Substantial improvements have been made in machine reading comprehension...

02/26/2023 · Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension
Although many large-scale knowledge bases (KBs) claim to contain multili...

10/22/2020 · Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
We propose a simple method to generate large amounts of multilingual que...

04/13/2020 · Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
Reading comprehension models often overfit to nuances of training datase...

08/15/2019 · XCMRC: Evaluating Cross-lingual Machine Reading Comprehension
We present XCMRC, the first public cross-lingual language understanding ...

08/23/2018 · Attention-Guided Answer Distillation for Machine Reading Comprehension
Despite that current reading comprehension systems have achieved signifi...
