Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

10/23/2020
by Arij Riabi, et al.

Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one wishes to support. We propose a method to improve cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms the baselines trained on English data only. We report a new state of the art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).
