Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

04/15/2021
by Chih-chan Tien, et al.

This work presents methods for learning cross-lingual sentence representations using paired or unpaired bilingual texts. We hypothesize that the cross-lingual alignment strategy is transferable, so a model trained to align only two languages can produce representations that are better aligned across many languages. Such transfer from bilingual alignment to multilingual alignment is a dual-pivot transfer, from two pivot languages to other language pairs. To test this hypothesis, we train an unsupervised model on unpaired sentences and a single-pair supervised model on bitexts, both based on the unsupervised language model XLM-R. We evaluate the models as universal sentence encoders on the task of unsupervised bitext mining on two datasets: the unsupervised model reaches the state of the art in unsupervised retrieval, and the single-pair supervised model approaches the performance of multilingually supervised models. The results suggest that the proposed bilingual training techniques can yield sentence representations with higher multilingual alignment.
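As a rough illustration of the evaluation task, bitext mining with a sentence encoder can be sketched as nearest-neighbor retrieval over sentence embeddings. The sketch below is not the authors' implementation: the function names `cosine` and `mine_bitext` are hypothetical, the three-dimensional vectors are toy stand-ins for encoder outputs (for example from an XLM-R based model), and plain cosine similarity is an assumed scoring function that the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mine_bitext(src_emb, tgt_emb):
    """Pair each source sentence with its most similar target sentence.

    Returns a list of (source_index, target_index, score) tuples.
    """
    pairs = []
    for i, s in enumerate(src_emb):
        scores = [cosine(s, t) for t in tgt_emb]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs

# Toy embeddings (assumption): each source vector has one close target vector.
src_emb = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt_emb = [[0.0, 0.9, 0.1], [0.2, 0.0, 1.1], [0.9, 0.0, 0.1]]

pairs = mine_bitext(src_emb, tgt_emb)
for i, j, score in pairs:
    print(f"source {i} -> target {j} (cosine {score:.3f})")
```

The better aligned the encoder's representations are across languages, the more often the nearest target vector is the true translation, which is why retrieval accuracy serves as a proxy for multilingual alignment.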


