Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces

08/19/2019
by Barun Patra, et al.

Recent work on bilingual lexicon induction (BLI) has frequently depended either on aligned bilingual lexicons or on distribution matching, often under the assumption that the two embedding spaces are isometric. We propose a technique to quantitatively estimate how well this isometry assumption holds between two embedding spaces, and we show empirically that it weakens as the languages in question become increasingly etymologically distant. We then propose Bilingual Lexicon Induction with Semi-Supervision (BLISS), a semi-supervised approach that relaxes the isometry assumption while leveraging both limited aligned bilingual lexicons and a larger set of unaligned word embeddings, together with a novel hubness filtering technique. Our method obtains state-of-the-art results on 15 of 18 language pairs on the MUSE dataset and does particularly well when the embedding spaces do not appear to be isometric. We also show that adding supervision stabilizes the learning procedure and is effective even with minimal supervision.
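
To make the isometry assumption concrete, the sketch below fits the best orthogonal map between two row-aligned embedding matrices (the classical orthogonal Procrustes solution) and reports the relative residual. This is only a crude illustrative proxy: the abstract does not specify how the paper estimates isometry, so this should not be read as the authors' technique. The function names `procrustes_map` and `relative_residual`, and the synthetic data, are assumptions introduced here for illustration.

```python
import numpy as np


def procrustes_map(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F.

    X and Y are (n, d) arrays whose rows are embeddings of aligned
    word pairs (hypothetical seed lexicon). Classical closed form:
    W = U @ Vt, where U, S, Vt is the SVD of X.T @ Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def relative_residual(X, Y):
    """Illustrative proxy only: how badly the best rotation fits.

    A value near 0 means the aligned pairs are related by an
    (approximately) orthogonal map; larger values suggest the
    isometry assumption is a poor fit.
    """
    W = procrustes_map(X, Y)
    return np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)


# Synthetic data standing in for real embeddings (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))

# Case 1: the target space is an exact rotation of the source space.
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
Y_iso = X @ Q

# Case 2: the target space is a generic (non-orthogonal) linear image.
A = rng.standard_normal((16, 16))
Y_noniso = X @ A

res_iso = relative_residual(X, Y_iso)
res_noniso = relative_residual(X, Y_noniso)
print(f"rotation: {res_iso:.2e}  generic linear map: {res_noniso:.2f}")
```

On real data, `X` and `Y` would hold the embeddings of the word pairs in a seed lexicon; a residual that grows with language distance would be consistent with the trend the abstract reports, though the proxy itself is not taken from the paper.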
