Learning Multilingual Word Representations using a Bag-of-Words Autoencoder

01/08/2014
by Stanislas Lauly, et al.

Recent work on learning multilingual word representations usually relies on word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations.
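The core idea is a cross-lingual bag-of-words autoencoder: a sentence's bag-of-words in one language is encoded into a fixed-size representation, and the decoder is trained to reconstruct the bag-of-words of its translation. The sketch below (PyTorch) illustrates this setup under simple assumptions of our own; the linear encoder/decoder, tanh nonlinearity, binary reconstruction loss, and all vocabulary sizes and dimensions are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BagOfWordsAutoencoder(nn.Module):
    """Sketch of a cross-lingual bag-of-words autoencoder.

    The encoder rows act as source-language word embeddings, the decoder
    columns as target-language word embeddings; no word-level alignment
    is used, only sentence-level translation pairs.
    """
    def __init__(self, src_vocab_size, tgt_vocab_size, dim=128):
        super().__init__()
        self.encoder = nn.Linear(src_vocab_size, dim)
        self.decoder = nn.Linear(dim, tgt_vocab_size)

    def forward(self, src_bow):
        h = torch.tanh(self.encoder(src_bow))   # sentence representation from the source bag-of-words
        return self.decoder(h)                  # scores over the target-language vocabulary

# Toy usage: reconstruct a French bag-of-words from the English one.
src_vocab_size, tgt_vocab_size = 5000, 6000     # illustrative vocabulary sizes
model = BagOfWordsAutoencoder(src_vocab_size, tgt_vocab_size)

en_bow = torch.zeros(1, src_vocab_size)
en_bow[0, [10, 42, 97]] = 1.0                   # toy English sentence as a 0/1 bag of words
fr_bow = torch.zeros(1, tgt_vocab_size)
fr_bow[0, [7, 300]] = 1.0                       # its toy French translation

loss = nn.BCEWithLogitsLoss()(model(en_bow), fr_bow)
loss.backward()                                 # gradients flow into both embedding matrices
```

In this sketch the learned embeddings of translation-related words end up aligned because they must cooperate to reconstruct each other's sentences, which is the property the classification transfer experiments rely on.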


