ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

by Xuan Ouyang, et al.

Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance on downstream cross-lingual tasks. This improvement stems from learning on large monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving model performance, existing methods are often constrained by the size of available parallel corpora, especially for low-resource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representations of multiple languages using monolingual corpora, breaking the constraint that parallel corpus size places on model performance. Our key insight is to integrate the idea of back translation into the pre-training process. We generate pseudo-parallel sentence pairs from a monolingual corpus to enable the learning of semantic alignment between different languages, which enhances the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on various cross-lingual downstream tasks. The code and pre-trained models will be made publicly available.
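
To make the back-translation idea concrete, the following Python sketch shows one way pseudo-parallel pairs could be derived from a monolingual corpus. It is a minimal illustration, not the authors' implementation: the translate_fn callable, the make_pseudo_parallel helper, and the pair format are assumptions introduced here for demonstration.

# Hypothetical sketch: building pseudo-parallel pairs from monolingual text
# via back translation. translate_fn and the pair format are illustrative
# assumptions, not ERNIE-M's actual pre-training pipeline.
from typing import Callable, List, Tuple


def make_pseudo_parallel(
    monolingual_sentences: List[str],
    translate_fn: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each monolingual sentence with a machine-generated translation.

    The resulting (source, pseudo-target) pairs can then be fed to an
    alignment objective so the encoder learns to map both languages into
    a shared semantic space, even when real parallel data is scarce.
    """
    pairs = []
    for src in monolingual_sentences:
        tgt = translate_fn(src)  # pseudo-target produced by back translation
        pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    # Toy stand-in for a real translation model, for demonstration only.
    toy_dictionary = {"the cat sleeps": "die Katze schläft"}
    pairs = make_pseudo_parallel(
        ["the cat sleeps"], lambda s: toy_dictionary.get(s, s)
    )
    print(pairs)  # [('the cat sleeps', 'die Katze schläft')]

In practice the toy dictionary would be replaced by a translation model, and the generated pairs would feed a cross-lingual alignment loss during pre-training rather than being stored as a static corpus.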
