A Call for More Rigor in Unsupervised Cross-lingual Learning

04/30/2020
by Mikel Artetxe, et al.

We review the motivations, definition, approaches, and methodology of unsupervised cross-lingual learning and call for a more rigorous position in each of them. A common rationale for such research is the lack of parallel data for many of the world's languages. However, we argue that a scenario with no parallel data at all but abundant monolingual data is unrealistic in practice. We also discuss different training signals used in previous work that depart from the purely unsupervised setting. We then describe common methodological issues in the tuning and evaluation of unsupervised cross-lingual models and present best practices. Finally, we provide a unified outlook on the different types of research in this area (i.e., cross-lingual word embeddings, deep multilingual pretraining, and unsupervised machine translation) and argue for comparable evaluation of these models.


