Embedding Word Similarity with Neural Machine Translation

12/19/2014
by Felix Hill, et al.

Neural language models learn word representations, or embeddings, that capture rich linguistic and conceptual information. Here we investigate the embeddings learned by neural machine translation models, a recently developed class of neural language model. We show that embeddings from translation models outperform those learned by monolingual models at tasks that require knowledge of both conceptual similarity and lexical-syntactic role. We further show that these effects hold when translating from English to French and from English to German, and argue that the desirable properties of translation embeddings should emerge largely independently of the source and target languages. Finally, we apply a new method for training neural translation models with very large vocabularies, and show that this vocabulary expansion algorithm results in minimal degradation of embedding quality. Our embedding spaces can be queried in an online demo and downloaded from our web page. Overall, our analyses indicate that translation-based embeddings should be used in applications that require concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.

