Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

10/31/2018
by   Nina Poerner, et al.
0

Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on between 25 and 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves a CLWE-only baseline.

READ FULL TEXT

page 4

page 5

research
09/07/2018

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

We propose an unsupervised method to obtain cross-lingual embeddings wit...
research
01/20/2021

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Word alignment over parallel corpora has a wide variety of applications,...
research
12/28/2019

Robust Cross-lingual Embeddings from Parallel Sentences

Recent advances in cross-lingual word embeddings have primarily relied o...
research
05/16/2018

A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

Recent work has managed to learn cross-lingual word embeddings without p...
research
06/21/2019

Learning Bilingual Word Embeddings Using Lexical Definitions

Bilingual word embeddings, which representlexicons of different language...
research
06/30/2016

Learning Crosslingual Word Embeddings without Bilingual Corpora

Crosslingual word embeddings represent lexical items from different lang...
research
12/29/2017

Detecting Cross-Lingual Plagiarism Using Simulated Word Embeddings

Cross-lingual plagiarism (CLP) occurs when texts written in one language...

Please sign up or login with your details

Forgot password? Click here to reset