When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale

by Christos Baziotis, et al.

Multilingual machine translation (MMT), trained on a mixture of parallel and monolingual data, is key for improving translation in low-resource language pairs. However, the literature offers conflicting results on the performance of different methods. To resolve this, we examine how denoising autoencoding (DAE) and backtranslation (BT) impact MMT under different data conditions and model scales. Unlike prior studies, we use a realistic dataset of 100 directions and consider many domain combinations of monolingual and test data. We find that monolingual data generally helps MMT, but models are surprisingly brittle to domain mismatches, especially at smaller model scales. BT is beneficial when the parallel, monolingual, and test data sources are similar but can be detrimental otherwise, while DAE is less effective than previously reported. Next, we analyze the impact of scale (from 90M to 1.6B parameters) and find it is important for both methods, particularly DAE. As scale increases, DAE transitions from underperforming the parallel-only baseline at 90M to converging with BT performance at 1.6B, and even surpassing it in low-resource settings. These results offer new insights into how to best use monolingual data in MMT.


