Revisiting Language Encoding in Learning Multilingual Representations

02/16/2021
by Shengjie Luo, et al.

The Transformer has demonstrated its power to learn contextual word representations for multiple languages in a single model. To process multilingual sentences, a learnable vector is usually assigned to each language; this vector is called the "language embedding". The language embedding can either be added to the word embedding or attached at the beginning of the sentence, and it serves as a language-specific signal that helps the Transformer capture contextual representations across languages. In this paper, we revisit the use of language embeddings and identify several problems in the existing formulations. By investigating the interaction between language embeddings and word embeddings in the self-attention module, we find that the current methods cannot reflect language-specific word correlations well. Given these findings, we propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding. For a sentence, XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model and processed with their language-specific meanings. In this way, XLP appropriately encodes "language" in a multilingual Transformer model. Experimental results show that XLP significantly boosts model performance on a wide range of multilingual benchmark datasets. Code and models will be released at https://github.com/lsj2408/XLP.
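To make the core idea concrete, the following PyTorch sketch shows a language-specific projection applied to word embeddings before a standard Transformer encoder, as the abstract describes. The module name, parameterization, and shapes below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CrossLingualLanguageProjection(nn.Module):
    """Minimal sketch of the XLP idea: instead of adding a learnable
    language embedding to each word embedding, apply a language-specific
    linear projection that maps word embeddings into that language's
    semantic space before the Transformer layers. Names and initialization
    are hypothetical, based only on the abstract's description."""

    def __init__(self, num_languages: int, hidden_dim: int):
        super().__init__()
        # One projection matrix per language, initialized to identity (assumption).
        self.projections = nn.Parameter(
            torch.stack([torch.eye(hidden_dim) for _ in range(num_languages)])
        )

    def forward(self, word_embeddings: torch.Tensor, lang_id: int) -> torch.Tensor:
        # word_embeddings: (batch, seq_len, hidden_dim)
        # Project every token embedding with the matrix of the sentence's language.
        return word_embeddings @ self.projections[lang_id]


# Usage sketch: project into the language-specific space, then encode.
hidden_dim, num_languages = 768, 100
xlp = CrossLingualLanguageProjection(num_languages, hidden_dim)
encoder_layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

tokens = torch.randn(4, 16, hidden_dim)   # a batch of embedded sentences in one language
projected = xlp(tokens, lang_id=0)        # language-specific semantic space
contextual = encoder(projected)           # contextual representations
```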
