Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders

Current approaches to learning vector representations of text that are compatible across different languages usually require some amount of parallel text, aligned at the word, sentence, or at least document level. We hypothesize, however, that natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analyzing the monolingual distribution of words. To evaluate this hypothesis, we propose a scheme that uses an adversarial autoencoder to map word vectors trained on a source language to vectors semantically compatible with word vectors trained on a target language. We present preliminary qualitative results and discuss possible future developments of this technique, such as applications to cross-lingual sentence representations.
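The mapping the abstract describes can be caricatured in a few lines of numpy: a linear encoder maps source-language word vectors into the target embedding space, a linear decoder reconstructs them, and a discriminator is trained to tell mapped vectors apart from real target-language vectors while the encoder is trained to fool it. This is a minimal sketch under loud assumptions: the random toy embeddings, the purely linear encoder/decoder, the logistic discriminator, and the hand-derived SGD updates are all illustrative choices, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for monolingual word embeddings; in the paper's setting these
# would come from embedding models trained independently on each language.
n, d_src, d_tgt = 256, 50, 50          # illustrative sizes (assumptions)
X = rng.normal(size=(n, d_src))        # "source-language" word vectors
Y = rng.normal(size=(n, d_tgt))        # "target-language" word vectors

# Linear encoder (the cross-lingual map), linear decoder, and a logistic
# discriminator operating in the target embedding space.
W_enc = 0.1 * rng.normal(size=(d_src, d_tgt))
W_dec = 0.1 * rng.normal(size=(d_tgt, d_src))
w_disc = 0.1 * rng.normal(size=d_tgt)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
rec_losses = []

for step in range(300):
    Z = X @ W_enc                      # mapped source vectors ("fake" targets)
    X_rec = Z @ W_dec                  # reconstruction back into source space
    rec_losses.append(0.5 * np.mean(np.sum((X_rec - X) ** 2, axis=1)))

    # Discriminator step: push real target vectors toward label 1,
    # mapped source vectors toward label 0 (logistic-loss gradient).
    p_real, p_fake = sigmoid(Y @ w_disc), sigmoid(Z @ w_disc)
    w_disc -= lr * (Y.T @ (p_real - 1.0) + Z.T @ p_fake) / n

    # Encoder/decoder step: minimise reconstruction error while pushing the
    # discriminator's output on mapped vectors toward 1 (adversarial term).
    R = (X_rec - X) / n                # gradient of rec. loss w.r.t. X_rec
    p_fake = sigmoid(Z @ w_disc)
    dZ = R @ W_dec.T - ((1.0 - p_fake)[:, None] * w_disc) / n
    W_dec -= lr * (Z.T @ R)
    W_enc -= lr * (X.T @ dZ)

print(rec_losses[0], rec_losses[-1])   # reconstruction error should drop
```

After training, rows of `X @ W_enc` live in the target embedding space and, on real data, could be compared to target-language word vectors by nearest-neighbour search; here the data is random, so only the mechanics (falling reconstruction loss, competing adversarial signal) are demonstrated.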
