Learning Bilingual Word Embeddings Using Lexical Definitions

06/21/2019
by   Weijia Shi, et al.
0

Bilingual word embeddings, which representlexicons of different languages in a shared em-bedding space, are essential for supporting se-mantic and knowledge transfers in a variety ofcross-lingual NLP tasks. Existing approachesto training bilingual word embeddings requireoften require pre-defined seed lexicons that areexpensive to obtain, or parallel sentences thatcomprise coarse and noisy alignment. In con-trast, we propose BilLex that leverages pub-licly available lexical definitions for bilingualword embedding learning. Without the needof predefined seed lexicons, BilLex comprisesa novel word pairing strategy to automati-cally identify and propagate the precise fine-grained word alignment from lexical defini-tions. We evaluate BilLex in word-level andsentence-level translation tasks, which seek tofind the cross-lingual counterparts of wordsand sentences respectively.BilLex signifi-cantly outperforms previous embedding meth-ods on both tasks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/07/2018

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

We propose an unsupervised method to obtain cross-lingual embeddings wit...
research
03/08/2019

Context-Aware Cross-Lingual Mapping

Cross-lingual word vectors are typically obtained by fitting an orthogon...
research
08/10/2018

Learning to Represent Bilingual Dictionaries

Bilingual word embeddings have been widely used to capture the similarit...
research
10/31/2018

Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective

Count-based word alignment methods, such as the IBM models or fast-align...
research
10/27/2020

Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora

We propose a new approach for learning contextualised cross-lingual word...
research
06/30/2016

Learning Crosslingual Word Embeddings without Bilingual Corpora

Crosslingual word embeddings represent lexical items from different lang...
research
08/06/2023

3D-EX : A Unified Dataset of Definitions and Dictionary Examples

Definitions are a fundamental building block in lexicography, linguistic...

Please sign up or login with your details

Forgot password? Click here to reset