Aligning Vector-spaces with Noisy Supervised Lexicons

03/25/2019
by Noa Yehezkel Lubin, et al.

The problem of learning to translate between two vector spaces given a set of aligned points arises in several application areas of NLP. Current solutions assume that the lexicon which defines the alignment pairs is noise-free. We consider the case where the set of aligned points may contain some noise, in the form of incorrect lexicon pairs, and show that such noise arises in practice by analyzing the dictionaries edited during a cleaning process. We demonstrate that this noise substantially degrades the accuracy of the learned translation when current methods are used. We propose a model that accounts for noisy pairs: a generative model together with a compatible iterative EM algorithm. The algorithm jointly learns the noise level in the lexicon, identifies the noisy pairs, and learns the mapping between the spaces. We demonstrate the effectiveness of the proposed algorithm on two alignment problems: bilingual word embedding translation, and mapping between diachronic embedding spaces to recover the semantic shifts of words across time periods.
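The abstract does not spell out the generative model, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) the mapping is an orthogonal matrix Q, as in Procrustes-style alignment of embedding spaces; (2) a clean pair satisfies y ≈ Qx with isotropic Gaussian error of variance `sigma2`; and (3) the target of a noisy pair is drawn from a broad Gaussian fitted once to all target vectors. The E-step computes each pair's posterior probability of being clean; the M-step re-estimates the clean fraction `alpha` (so the noise level is 1 - `alpha`), refits Q by weighted Procrustes, and updates `sigma2`. The names `weighted_procrustes`, `log_gauss`, and `noisy_procrustes_em` are hypothetical.

```python
import numpy as np


def weighted_procrustes(X, Y, w):
    """Orthogonal Q minimizing sum_i w_i * ||Q x_i - y_i||^2 (rows of X, Y are x_i, y_i)."""
    M = (Y * w[:, None]).T @ X          # sum_i w_i y_i x_i^T, shape (d, d)
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                       # polar factor of M maximizes tr(Q^T M)


def log_gauss(sq_dist, var, d):
    """Log density of an isotropic Gaussian N(m, var*I) at a point with ||y - m||^2 = sq_dist."""
    return -0.5 * d * np.log(2 * np.pi * var) - sq_dist / (2 * var)


def noisy_procrustes_em(X, Y, n_iter=50):
    """EM for an ASSUMED two-component mixture (not necessarily the paper's exact model):
      clean pair: y_i ~ N(Q x_i, sigma2 * I)   with prior probability alpha
      noisy pair: y_i ~ N(mu_y, s2_y * I)      with prior probability 1 - alpha
    Returns Q, alpha, sigma2 and r, the posterior probability that each pair is clean.
    """
    n, d = X.shape
    # Fixed density for noisy pairs: one broad Gaussian fitted to all target vectors.
    mu_y = Y.mean(axis=0)
    s2_y = ((Y - mu_y) ** 2).mean()
    log_noise = log_gauss(((Y - mu_y) ** 2).sum(axis=1), s2_y, d)

    # Initialize with plain (unweighted) Procrustes over the whole lexicon.
    alpha = 0.5
    Q = weighted_procrustes(X, Y, np.ones(n))
    sigma2 = ((X @ Q.T - Y) ** 2).sum() / (n * d)

    for _ in range(n_iter):
        # E-step: posterior probability that each pair is clean.
        sq_res = ((X @ Q.T - Y) ** 2).sum(axis=1)
        logit = (np.log(alpha) + log_gauss(sq_res, sigma2, d)
                 - np.log1p(-alpha) - log_noise)
        r = np.exp(-np.logaddexp(0.0, -logit))   # numerically stable sigmoid(logit)

        # M-step: clean fraction, mapping, and clean-pair variance.
        alpha = float(np.clip(r.mean(), 1e-6, 1 - 1e-6))
        Q = weighted_procrustes(X, Y, r)
        sq_res = ((X @ Q.T - Y) ** 2).sum(axis=1)
        sigma2 = max((r * sq_res).sum() / (d * max(r.sum(), 1e-12)), 1e-12)
    return Q, alpha, sigma2, r


# Usage on synthetic data: a random orthogonal map with 30% of the lexicon pairs mismatched.
rng = np.random.default_rng(0)
n, d = 500, 20
Q_true, _ = np.linalg.qr(rng.standard_normal((d, d)))
X = rng.standard_normal((n, d))
Y = X @ Q_true.T + 0.01 * rng.standard_normal((n, d))
noisy = rng.choice(n, size=150, replace=False)
Y[noisy] = Y[np.roll(noisy, 1)]      # each noisy source word gets another word's target
Q_hat, alpha_hat, sigma2_hat, r = noisy_procrustes_em(X, Y)
print(round(alpha_hat, 3), np.linalg.norm(Q_hat - Q_true))
```

Weighting the Procrustes fit by the posteriors `r` lets suspected noisy pairs drop out of the mapping gradually rather than being discarded by a hard threshold up front; pairs that end with `r < 0.5` can then be reported as likely lexicon errors.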
