Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

by Siamak Shakeri et al.

We propose a simple method to generate large amounts of multilingual question and answer pairs with a single generative model. These synthetic samples are then used to augment the available gold multilingual ones to improve the performance of multilingual QA models on target languages. Our approach only requires the existence of automatically translated samples from English to the target domain, thus removing the need for human annotations in the target languages. Experimental results show that our proposed approach achieves significant gains on a number of multilingual datasets.
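The augmentation step described above amounts to mixing gold training examples with synthetic ones before fine-tuning. The sketch below illustrates that idea only; the function name, the sampling ratio, and the example records are illustrative assumptions, not the authors' actual pipeline.

```python
import random


def augment_with_synthetic(gold, synthetic, ratio=2.0, seed=0):
    """Mix gold QA pairs with synthetic ones for fine-tuning.

    ratio: maximum number of synthetic samples per gold sample.
    Returns a shuffled combined training set.
    """
    rng = random.Random(seed)
    n_syn = min(len(synthetic), int(len(gold) * ratio))
    mixed = list(gold) + rng.sample(synthetic, n_syn)
    rng.shuffle(mixed)
    return mixed


# Toy example: one gold English pair plus model-generated
# multilingual pairs (contents are made up for illustration).
gold = [{"q": "What is the capital of France?", "a": "Paris", "lang": "en"}]
synthetic = [
    {"q": "¿Cuál es la capital de España?", "a": "Madrid", "lang": "es"},
    {"q": "Quelle est la capitale de l'Italie ?", "a": "Rome", "lang": "fr"},
    {"q": "Was ist die Hauptstadt Deutschlands?", "a": "Berlin", "lang": "de"},
]
train_set = augment_with_synthetic(gold, synthetic, ratio=2.0)
```

With `ratio=2.0` and one gold example, two of the three synthetic pairs are sampled, giving a combined set of three examples.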





Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Multilingual pre-trained models could leverage the training data from a ...

BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels

This paper presents BiPaR, a bilingual parallel novel-style machine read...

Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation

Cross-lingual Machine Reading Comprehension (CLMRC) remains a challengin...

Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension

Multilingual pre-trained models are able to zero-shot transfer knowledge...

Multilingual Question Answering from Formatted Text applied to Conversational Agents

Recent advances in NLP with language models such as BERT, GPT-2, XLNet o...

Multilingual Answer Sentence Reranking via Automatically Translated Data

We present a study on the design of multilingual Answer Sentence Selecti...

Improving Multilingual Models with Language-Clustered Vocabularies

State-of-the-art multilingual models depend on vocabularies that cover a...