Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

10/10/2019
by Zirui Wang, et al.

Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks. There are two main paradigms for learning such representations: (1) alignment, which maps different independently trained monolingual representations into a shared space, and (2) joint training, which directly learns unified multilingual representations using monolingual and cross-lingual objectives jointly. In this paper, we first conduct direct comparisons of representations learned using both of these methods across diverse cross-lingual tasks. Our empirical results reveal a set of pros and cons for both methods, and show that the relative performance of alignment versus joint training is task-dependent. Stemming from this analysis, we propose a simple and novel framework that combines these two previously mutually exclusive approaches. Extensive experiments on various tasks demonstrate that our proposed framework alleviates limitations of both approaches, and outperforms existing methods on the MUSE bilingual lexicon induction (BLI) benchmark. We further show that our proposed framework can generalize to contextualized representations and achieves state-of-the-art results on the CoNLL cross-lingual NER benchmark.
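The "alignment" paradigm the abstract refers to is commonly instantiated as an orthogonal Procrustes mapping between two independently trained monolingual embedding spaces, as in the MUSE line of work the paper benchmarks against. The sketch below is a rough illustration of that general technique only, not the authors' proposed unified framework: a minimal NumPy example, with random matrices standing in for real word embeddings.

```python
# Minimal sketch of the alignment paradigm: map one monolingual embedding
# space onto another with an orthogonal transform, using the closed-form
# Procrustes solution. Random data stands in for real embeddings.
import numpy as np

def procrustes_align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Solve min_W ||XW - Y||_F subject to W orthogonal.

    X, Y: (n, d) source- and target-language embeddings for n seed
    dictionary pairs. Returns the (d, d) orthogonal map W.
    """
    # Closed form: W = U V^T, where U S V^T is the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d))                         # "source" embeddings
true_W = np.linalg.qr(rng.standard_normal((d, d)))[0]   # hidden rotation
Y = X @ true_W + 0.01 * rng.standard_normal((n, d))     # noisy "target" space

W = procrustes_align(X, Y)
print("relative alignment residual:",
      np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
```

Joint training, by contrast, would optimize a single set of multilingual parameters against monolingual and cross-lingual objectives simultaneously from the start, rather than learning a mapping between two frozen spaces after the fact; the paper's framework combines elements of both.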


Related research

09/15/2021 · Cross-lingual Transfer of Monolingual Models
Recent studies in zero-shot cross-lingual learning using multilingual mo...

08/27/2018 · Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Cross-lingual word embeddings are becoming increasingly important in mul...

07/04/2022 · Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)
An essential design decision for multilingual Neural Text-To-Speech (NTT...

12/15/2021 · Learning Cross-Lingual IR from an English Retriever
We present a new cross-lingual information retrieval (CLIR) model traine...

04/07/2023 · InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling
Cross-lingual topic models have been prevalent for cross-lingual text an...

10/06/2020 · Do Explicit Alignments Robustly Improve Multilingual Encoders?
Multilingual BERT (mBERT), XLM-RoBERTa (XLMR) and other unsupervised mul...

11/28/2014 · Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic Models and Encyclopedic Knowledge
We present a method for coarse-grained cross-lingual alignment of compar...
