A Stronger Baseline for Multilingual Word Embeddings

11/01/2018
by Philipp Dufter, et al.

Levy, Søgaard and Goldberg's (2017) S-ID (sentence ID) method applies word2vec to tuples containing a sentence ID and a word from the sentence. It has been shown to be a strong baseline for learning multilingual embeddings. Inspired by recent work on concept-based embedding learning, we propose SC-ID, an extension to S-ID: given a sentence-aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs. We perform experiments on the Parallel Bible Corpus across 1000+ languages and show that SC-ID yields up to a 6% performance increase in a word translation task. In addition, we provide evidence that SC-ID is easily and widely applicable by reporting competitive results across 8 tasks on a EuroParl-based corpus.
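
To make the S-ID input format concrete, the following is a minimal sketch of how (sentence ID, word) tuples might be generated from a sentence-aligned corpus before they are passed to word2vec. It is an illustration under stated assumptions, not the authors' implementation: the function name `sid_pairs`, the toy corpus, whitespace tokenization, lowercasing, and the language-code prefix on each word are all hypothetical choices. The concept sampling step of SC-ID is not shown.

```python
from typing import Dict, List, Tuple


def sid_pairs(corpus: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Build (sentence ID, word) tuples from a sentence-aligned corpus.

    `corpus` maps a sentence ID to its translations, keyed by language code.
    All translations of one sentence share the same sentence ID.
    """
    pairs: List[Tuple[str, str]] = []
    for sent_id, translations in corpus.items():
        for lang, text in translations.items():
            # Assumption: whitespace tokenization and lowercasing.
            for token in text.split():
                # Assumption: prefix each word with its language code so that
                # identical strings in different languages stay distinct.
                pairs.append((sent_id, f"{lang}:{token.lower()}"))
    return pairs


# Hypothetical two-sentence, two-language toy corpus.
toy_corpus = {
    "s1": {"en": "the house", "de": "das Haus"},
    "s2": {"en": "the dog", "de": "der Hund"},
}

pairs = sid_pairs(toy_corpus)
for sent_id, word in pairs:
    print(sent_id, word)
```

Because words from every language are paired with the same sentence ID, training a skip-gram style model on such tuples places translations that co-occur with the same IDs close together in the embedding space.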


