Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora

10/27/2020
by Takashi Wada, et al.

We propose a new approach for learning contextualised cross-lingual word embeddings based only on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM-based encoder-decoder model that performs bidirectional translation and reconstruction of the input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common multilingual space. We also propose a simple method to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations, even in extremely low-resource scenarios.
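As a rough illustration of the bilingual lexicon induction task mentioned above, the following minimal sketch retrieves, for each source word, the nearest target word by cosine similarity in a shared embedding space. It is not the authors' model or evaluation code: the function names (`cosine`, `induce_lexicon`), the example words, and the three-dimensional toy vectors are all hypothetical, and nearest-neighbour retrieval is assumed here as one common way to induce a lexicon from cross-lingual embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def induce_lexicon(src_emb, tgt_emb):
    """Map each source word to its most similar target word (hypothetical helper)."""
    lexicon = {}
    for src_word, src_vec in src_emb.items():
        # Pick the target word whose vector has the highest cosine similarity.
        lexicon[src_word] = max(tgt_emb, key=lambda w: cosine(src_vec, tgt_emb[w]))
    return lexicon


# Toy embeddings, invented for illustration; real vectors would come from a trained model.
src_emb = {"agua": [0.9, 0.1, 0.0], "fuego": [0.0, 0.2, 0.9]}
tgt_emb = {
    "water": [1.0, 0.0, 0.1],
    "fire": [0.1, 0.1, 1.0],
    "stone": [0.0, 1.0, 0.0],
}

lexicon = induce_lexicon(src_emb, tgt_emb)
print(lexicon)
```

In practice the induced pairs would be compared against a gold dictionary to compute precision; that step is omitted here because the abstract does not specify the evaluation protocol.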


