BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

10/09/2014
by Stephan Gouws, et al.

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally efficient model for learning bilingual distributed representations of words which scales to large monolingual datasets and does not require word-aligned parallel training data. Instead, it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data. This is achieved using a novel sampled bag-of-words cross-lingual objective, which is used to regularize two noise-contrastive language models for efficient cross-lingual feature learning. We show that bilingual embeddings learned using the proposed model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
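The cross-lingual objective described above can be sketched in a few lines: for an aligned sentence pair, the bag-of-words sentence representation in each language is the mean of its word vectors, and the regularizer penalizes the squared distance between the two means. The sketch below is a minimal toy illustration of that idea in NumPy; the matrix names, sizes, and learning rate are assumptions for illustration, and the full model would combine this term with the monolingual noise-contrastive losses, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_en, vocab_fr = 8, 100, 100  # toy sizes (assumed for illustration)

# Toy embedding matrices for the two languages.
E_en = rng.normal(scale=0.1, size=(vocab_en, dim))
E_fr = rng.normal(scale=0.1, size=(vocab_fr, dim))

def bilbowa_loss(en_ids, fr_ids):
    """Sampled bag-of-words cross-lingual objective: squared distance
    between the mean word vectors of an aligned sentence pair."""
    diff = E_en[en_ids].mean(axis=0) - E_fr[fr_ids].mean(axis=0)
    return float(diff @ diff)

def bilbowa_step(en_ids, fr_ids, lr=0.1):
    """One SGD step on the cross-lingual term, pulling the two sentence
    bags toward each other. Assumes word ids within a sentence are
    unique (NumPy fancy-index assignment does not accumulate duplicates)."""
    diff = E_en[en_ids].mean(axis=0) - E_fr[fr_ids].mean(axis=0)
    E_en[en_ids] -= lr * 2 * diff / len(en_ids)
    E_fr[fr_ids] += lr * 2 * diff / len(fr_ids)
    return float(diff @ diff)

# Repeated steps on one aligned pair shrink the cross-lingual distance.
en_sent, fr_sent = [1, 2, 3], [4, 5]
losses = [bilbowa_step(en_sent, fr_sent) for _ in range(50)]
```

In the paper's setting this term is sampled over sentence pairs and interleaved with the monolingual updates, so the shared space emerges without any word-level alignments.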


