Learning Crosslingual Word Embeddings without Bilingual Corpora

06/30/2016
by Long Duong, et al.

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling the transfer of NLP tools across languages. However, previous approaches either required expensive bilingual resources, had difficulty incorporating monolingual data, or were unable to handle polysemy. We address these drawbacks with a method that exploits a high-coverage bilingual dictionary in an EM-style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models trained on large bilingual corpora, and competitive results on monolingual word similarity and cross-lingual document classification tasks.
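
The abstract's core idea can be made concrete with a small sketch: CBOW-style training over monolingual text, where an E-step picks a dictionary translation of each centre word to predict alongside the word itself, pulling both languages into one vector space. The code below is a minimal illustration of that idea, not the authors' implementation: the toy corpus, the toy English-French dictionary, the full-softmax objective (in place of the negative sampling a real implementation would use), the hard per-token E-step, and all hyperparameters are assumptions made for readability.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the monolingual corpus and the
# high-coverage bilingual dictionary the method assumes.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"],
          ["the", "cat", "ran"], ["the", "dog", "sat"]]
dictionary = {"the": ["le", "la"], "cat": ["chat"], "dog": ["chien"],
              "sat": ["assis"], "ran": ["couru"]}

vocab = sorted({w for s in corpus for w in s} |
               {t for ts in dictionary.values() for t in ts})
idx = {w: i for i, w in enumerate(vocab)}

dim, lr = 16, 0.05
V = rng.normal(scale=0.1, size=(len(vocab), dim))  # input/context vectors
U = rng.normal(scale=0.1, size=(len(vocab), dim))  # output/target vectors

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for epoch in range(200):
    for sent in corpus:
        for pos, word in enumerate(sent):
            context = [w for i, w in enumerate(sent) if i != pos]
            h = V[[idx[w] for w in context]].mean(axis=0)  # CBOW context

            # E-step: pick the translation whose output vector best fits
            # the current context; choosing per token is what allows
            # different senses of a word to select different translations.
            trans = max(dictionary[word], key=lambda t: h @ U[idx[t]])

            # M-step: one gradient step on a softmax objective that
            # predicts BOTH the word and its chosen translation from the
            # same context, tying the two languages together.
            p = softmax(U @ h)
            err = 2.0 * p
            err[idx[word]] -= 1.0
            err[idx[trans]] -= 1.0       # summed cross-entropy gradients
            grad_h = U.T @ err           # computed before updating U
            U -= lr * np.outer(err, h)
            for w in context:
                V[idx[w]] -= lr * grad_h / len(context)

# A word and its translation are predicted from the same contexts, so
# their output vectors should end up close:
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cos(cat, chat)  =", cos(U[idx["cat"]], U[idx["chat"]]))
print("cos(cat, chien) =", cos(U[idx["cat"]], U[idx["chien"]]))

The hard argmax in the E-step is the simplest possible sense-selection rule; it is the per-token selection, rather than a single fixed translation per type, that gives the method its ability to handle polysemy.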


Related research

05/01/2017
Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
Cross-lingual model transfer is a compelling and popular method for pred...

10/16/2019
Meemi: Finding the Middle Ground in Cross-lingual Word Embeddings
Word embeddings have become a standard resource in the toolset of any Na...

06/25/2017
Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
Word embeddings, which represent a word as a point in a vector space, ha...

01/17/2020
A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
This paper presents a new technique for creating monolingual and cross-l...

10/31/2018
Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective
Count-based word alignment methods, such as the IBM models or fast-align...

06/21/2019
Learning Bilingual Word Embeddings Using Lexical Definitions
Bilingual word embeddings, which represent lexicons of different language...
