Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

09/07/2018
by Takashi Wada, et al.

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call a multilingual neural language model, takes sentences of multiple languages as input. It contains bidirectional LSTMs that act as forward and backward language models, and these networks are shared among all the languages. The other parameters, i.e. the word embeddings and the linear transformation between hidden states and outputs, are specific to each language. The shared LSTMs can capture the sentence structure common to all languages. Accordingly, the word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual word embeddings on a word alignment task. Our experiments demonstrate that our model obtains cross-lingual embeddings of much higher quality than existing unsupervised models when only a small amount of monolingual data (i.e. 50k sentences) is available, or when the domains of the monolingual data differ across languages.
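The parameter-sharing idea described above can be sketched in code. The following is a minimal, untrained illustration, not the authors' implementation: a plain recurrent cell (standing in for the bidirectional LSTMs) is shared across all languages, while each language keeps its own word embeddings and output projection. All names, vocabularies, and dimensions here are hypothetical.

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding/hidden size (illustrative; real models use far larger dims)

def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class SharedRNN:
    """Simple recurrent cell standing in for the shared LSTMs.
    A single instance is used for ALL languages."""
    def __init__(self, dim):
        self.W_in = rand_mat(dim, dim)
        self.W_h = rand_mat(dim, dim)

    def step(self, x, h):
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
        return [math.tanh(p) for p in pre]

class LanguageParams:
    """Language-specific parameters: word embeddings and an
    output projection from hidden states back to the vocabulary."""
    def __init__(self, vocab, dim):
        self.emb = {w: rand_vec(dim) for w in vocab}
        self.W_out = rand_mat(len(vocab), dim)

# One shared recurrent network; separate embeddings per language.
shared = SharedRNN(DIM)
langs = {
    "en": LanguageParams(["the", "cat", "sat"], DIM),
    "fr": LanguageParams(["le", "chat", "assis"], DIM),
}

def encode(lang, sentence):
    """Run the shared forward network over a sentence of one language."""
    params = langs[lang]
    h = [0.0] * DIM
    for w in sentence:
        h = shared.step(params.emb[w], h)  # same recurrent weights for every language
    return h

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Because the recurrent weights are shared, embeddings of different languages
# end up in one latent space after training, so they can be compared directly:
sim = cosine(langs["en"].emb["cat"], langs["fr"].emb["chat"])
```

After training (omitted here), cross-lingual similarity scores like `sim` become meaningful; in this untrained sketch they only demonstrate that the comparison is well-defined across languages.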
