Multilingual Transfer Learning for QA Using Translation as Data Augmentation

12/10/2020
by Mihaela Bornea, et al.

Prior work on multilingual question answering has mostly focused on using large multilingual pre-trained language models (LMs) to perform zero-shot language-wise learning: train a QA model on English and test it on other languages. In this work, we explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer together in the semantic space. Our first strategy augments the original English training data with machine-translated data, which yields a corpus of multilingual silver-labeled QA pairs that is 14 times larger than the original training set. In addition, we propose two novel strategies, language adversarial training and a language arbitration framework, which significantly improve (zero-resource) cross-lingual transfer performance and result in LM embeddings that are less language-variant. Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
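As a rough illustration of the first strategy, the sketch below builds silver-labeled multilingual QA pairs by passing English examples through a translation function. The names `QAExample`, `augment_with_translations`, and `toy_translate` are hypothetical and do not come from the paper. The abstract does not say how answer spans are aligned after translation, so this sketch makes a simplifying assumption: it translates the answer text separately and keeps a translated example only if that answer still appears in the translated context.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class QAExample:
    """One extractive QA pair (hypothetical container, not from the paper)."""
    question: str
    context: str
    answer: str
    lang: str


def augment_with_translations(
    examples: Iterable[QAExample],
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> List[QAExample]:
    """Return the original examples plus translated (silver-labeled) copies.

    `translate(text, lang)` is an assumed interface for any MT system.
    """
    augmented: List[QAExample] = []
    for ex in examples:
        augmented.append(ex)  # always keep the original English example
        for lang in target_langs:
            question = translate(ex.question, lang)
            context = translate(ex.context, lang)
            answer = translate(ex.answer, lang)
            # Assumption: drop pairs whose translated answer cannot be
            # located in the translated context.
            if answer in context:
                augmented.append(QAExample(question, context, answer, lang))
    return augmented


def toy_translate(text: str, lang: str) -> str:
    """Stand-in for a real MT system: tags every token with the language code."""
    return " ".join(f"{token}_{lang}" for token in text.split())


english = [QAExample("Where is Paris ?", "Paris is in France .", "France", "en")]
data = augment_with_translations(english, ["de", "es"], toy_translate)
print(len(data))  # 3: the English original plus one copy per target language
```

With a real translation system in place of `toy_translate`, the size of the augmented corpus depends on how many target languages are used and how many translated answers survive the span check.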


research
04/17/2021

Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

In recent years, pre-trained multilingual language models, such as multi...
research
06/30/2021

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

Despite their success, large pre-trained multilingual models have not co...
research
06/07/2022

OCHADAI at SemEval-2022 Task 2: Adversarial Training for Multilingual Idiomaticity Detection

We propose a multilingual adversarial training model for determining whe...
research
10/21/2022

On the Calibration of Massively Multilingual Language Models

Massively Multilingual Language Models (MMLMs) have recently gained popu...
research
08/18/2022

MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation

Building dialogue generation systems in a zero-shot scenario remains a h...
research
07/31/2017

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

Semantic Textual Similarity (STS) measures the meaning similarity of sen...
research
11/15/2022

QAmeleon: Multilingual QA with Only 5 Examples

The availability of large, high-quality datasets has been one of the mai...
