Do Explicit Alignments Robustly Improve Multilingual Encoders?

10/06/2020
by Shijie Wu, et al.

Multilingual BERT (mBERT), XLM-RoBERTa (XLMR), and other unsupervised multilingual encoders can effectively learn cross-lingual representations. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. However, word-level alignments are often suboptimal, and such bitexts are unavailable for many languages. In this paper, we propose a new contrastive alignment objective that can better utilize this signal, and we examine whether previous alignment methods can be adapted to noisier sources of aligned data: a randomly sampled 1-million-pair subset of the OPUS collection. Additionally, rather than reporting results on a single dataset with a single model run, we report the mean and standard deviation of multiple runs with different seeds, on four datasets and tasks. Our more extensive analysis finds that, while our new objective outperforms previous work, overall these methods do not improve performance under a more robust evaluation framework. Furthermore, the gains from using a better underlying model eclipse any benefits from alignment training. These negative results dictate more care in evaluating these methods and suggest limitations in applying explicit alignment objectives.
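
For readers unfamiliar with this family of objectives, the sketch below illustrates the general shape of a contrastive alignment loss: contextual embeddings of aligned source and target words from a bitext are pulled together, while the other words in the batch serve as negatives. This is a minimal illustrative sketch in PyTorch, not the paper's exact formulation; the function name, temperature value, and use of in-batch negatives are assumptions made here for clarity.

    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.1):
        # src_emb, tgt_emb: (batch, dim) contextual embeddings of word pairs
        # that a word aligner marked as translations of each other. Row i of
        # src_emb aligns with row i of tgt_emb; all other rows in the batch
        # act as in-batch negatives.
        src = F.normalize(src_emb, dim=-1)
        tgt = F.normalize(tgt_emb, dim=-1)
        # Cosine similarity between every source word and every target word.
        logits = src @ tgt.t() / temperature          # (batch, batch)
        labels = torch.arange(src.size(0), device=src.device)
        # Symmetric cross-entropy: each source word should identify its
        # aligned target word within the batch, and vice versa.
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

Minimizing a loss of this kind encourages the encoder to place translation-equivalent words near each other in the shared embedding space, which is the intuition behind explicit alignment training evaluated in the paper.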
