Language Representation in Multilingual BERT and its applications to improve Cross-lingual Generalization

10/20/2020
by   Chi-Liang Liu, et al.

A token embedding in multilingual BERT (m-BERT) contains both language and semantic information. We find that a representation of a language can be obtained by simply averaging the embeddings of that language's tokens. With this language representation, we can control the output language of multilingual BERT by manipulating the token embeddings, achieving unsupervised token translation. Based on this observation, we further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT.
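The idea in the abstract can be sketched in a few lines: average a language's token embeddings to get a language vector, then "translate" a token by subtracting the source language vector and adding the target one, followed by a nearest-neighbor lookup. The snippet below is a minimal toy illustration with synthetic embeddings (the vocabulary, dimensions, and additive language-offset structure are assumptions for demonstration, not the actual m-BERT model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for m-BERT token embeddings: each token embedding is modeled
# as a shared semantic vector plus a language-specific offset. This additive
# structure is an illustrative assumption, not loaded from the real model.
dim, n_tokens = 8, 5
semantics = rng.normal(size=(n_tokens, dim))   # shared "meaning" vectors
offset_en = rng.normal(size=dim)               # language component (lang A)
offset_fr = rng.normal(size=dim)               # language component (lang B)
emb_en = semantics + offset_en                 # lang-A token embeddings
emb_fr = semantics + offset_fr                 # lang-B token embeddings

# Language representation = mean of that language's token embeddings.
lang_en = emb_en.mean(axis=0)
lang_fr = emb_fr.mean(axis=0)

def translate(idx: int) -> int:
    """Shift a lang-A embedding toward lang-B by swapping the language
    components, then return the index of the nearest lang-B token."""
    shifted = emb_en[idx] - lang_en + lang_fr
    dists = np.linalg.norm(emb_fr - shifted, axis=1)
    return int(dists.argmin())

print([translate(i) for i in range(n_tokens)])  # each token maps to its counterpart
```

Because the toy embeddings are constructed as semantics plus a language offset, the shift recovers each token's counterpart exactly; with real m-BERT embeddings the mapping is only approximate, which is what makes the paper's observation interesting.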


research · 04/20/2020
A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT
Recently, multilingual BERT works remarkably well on cross-lingual trans...

research · 11/04/2020
Probing Multilingual BERT for Genetic and Typological Signals
We probe the layers in multilingual BERT (mBERT) for phylogenetic and ge...

research · 09/11/2021
The Impact of Positional Encodings on Multilingual Compression
In order to preserve word-order information in a non-autoregressive sett...

research · 03/10/2022
A new approach to calculating BERTScore for automatic assessment of translation quality
The study of the applicability of the BERTScore metric was conducted to ...

research · 07/04/2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)
An essential design decision for multilingual Neural Text-To-Speech (NTT...

research · 10/16/2020
It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT
Recent works have demonstrated that multilingual BERT (mBERT) learns ric...

research · 05/01/2020
Identifying Necessary Elements for BERT's Multilinguality
It has been shown that multilingual BERT (mBERT) yields high quality mul...
