Machine-Created Universal Language for Cross-lingual Transfer

05/22/2023
by Yaobo Liang, et al.

There are two types of approaches to cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of different languages, and translate-test, which explicitly translates different languages into an intermediate language such as English. Translate-test offers better interpretability than multilingual pre-training, but it achieves lower performance (Conneau and Lample, 2019; Conneau et al., 2020) and cannot solve word-level tasks because translation rearranges the word order. We therefore propose a new Machine-created Universal Language (MUL) as the intermediate language. MUL consists of a set of discrete symbols forming a universal vocabulary, together with an NL-MUL translator that maps multiple natural languages into MUL. MUL unifies common concepts from different languages into the same universal word, enabling better cross-lingual transfer, and it preserves language-specific words as well as word order, so the model can be easily applied to word-level tasks. Our experiments show that translating into MUL achieves better performance than multilingual pre-training, and our analyses show that MUL has good interpretability.
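To make the core idea concrete, the following is a minimal, hypothetical sketch of what an NL-MUL mapping could look like: words sharing a concept across languages map to one discrete universal symbol, while uncovered words stay as language-tagged tokens and the original word order is kept. The lexicon, symbol names, and `nl_to_mul` function are illustrative assumptions, not the paper's actual learned translator.

```python
# Toy illustration of the MUL idea (not the paper's implementation):
# shared concepts -> one universal symbol; everything else stays as a
# language-specific token, and word order is preserved.

# Hypothetical concept lexicon: (language, surface word) -> universal symbol.
CONCEPT_LEXICON = {
    ("en", "cat"): "<MUL:0412>",
    ("es", "gato"): "<MUL:0412>",   # same concept as English "cat"
    ("en", "sleeps"): "<MUL:0733>",
    ("es", "duerme"): "<MUL:0733>",
}

def nl_to_mul(tokens: list[str], lang: str) -> list[str]:
    """Map a tokenized sentence into MUL, one symbol per input token.

    Shared concepts become universal symbols; anything not covered stays
    as a language-tagged token, so token-level alignment is preserved.
    """
    return [
        CONCEPT_LEXICON.get((lang, tok.lower()), f"<{lang}:{tok}>")
        for tok in tokens
    ]

if __name__ == "__main__":
    print(nl_to_mul(["The", "cat", "sleeps"], "en"))
    # ['<en:The>', '<MUL:0412>', '<MUL:0733>']
    print(nl_to_mul(["El", "gato", "duerme"], "es"))
    # ['<es:El>', '<MUL:0412>', '<MUL:0733>']
```

Because the two MUL sequences share symbols position by position, a word-level model trained on English sentences in MUL can, in principle, transfer directly to Spanish sentences in MUL, which is exactly what translation into English cannot offer once word order is rearranged.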


