On the Role of Parallel Data in Cross-lingual Transfer Learning

12/20/2022
by Machel Reid, et al.

While prior work has established that the use of parallel data is conducive to cross-lingual learning, it is unclear whether the improvements come from the data itself or from the modeling of parallel interactions. To explore this, we examine the use of unsupervised machine translation to generate synthetic parallel data, and compare it against supervised machine translation and gold parallel data. We find that even model-generated parallel data can be useful for downstream tasks, in both a general setting (continued pretraining) and a task-specific setting (translate-train), although our best results are still obtained with real parallel data. Our findings suggest that existing multilingual models do not exploit the full potential of monolingual data, and prompt the community to reconsider the traditional categorization of cross-lingual learning approaches.
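To make the task-specific setting concrete, the sketch below illustrates the translate-train idea: machine-translate the English task data into the target language and fine-tune on the resulting synthetic set, keeping the original labels. Note that the paper generates its synthetic parallel data with unsupervised MT; the supervised Helsinki-NLP Marian model, the translate helper, and the toy examples here are illustrative stand-ins, not the authors' setup.

```python
# Minimal translate-train sketch, assuming an off-the-shelf MT model.
# The paper's approach uses unsupervised MT to produce the translations;
# a supervised Marian en->fr model is substituted here for illustration.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"  # any en->target MT model
tokenizer = MarianTokenizer.from_pretrained(model_name)
mt_model = MarianMTModel.from_pretrained(model_name)

def translate(texts, batch_size=8):
    """Translate a list of English sentences into the target language."""
    outputs = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], return_tensors="pt",
                          padding=True, truncation=True)
        generated = mt_model.generate(**batch)
        outputs.extend(tokenizer.batch_decode(generated,
                                              skip_special_tokens=True))
    return outputs

# Hypothetical English task data; labels carry over unchanged.
train_texts = ["The movie was great.", "I did not enjoy the book."]
train_labels = [1, 0]

# Synthetic target-language training set for fine-tuning a task model.
synthetic_train = list(zip(translate(train_texts), train_labels))
```

In the general setting, the same machine-translated text would instead be paired with its source and used for continued pretraining of the multilingual model, rather than for direct task fine-tuning.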


