Word and Phrase Translation with word2vec

05/09/2017
by   Stefan Jansen, et al.
0

Word and phrase tables are key inputs to machine translations, but costly to produce. New unsupervised learning methods represent words and phrases in a high-dimensional vector space, and these monolingual embeddings have been shown to encode syntactic and semantic relationships between language elements. The information captured by these embeddings can be exploited for bilingual translation by learning a transformation matrix that allows to match relative positions across two monolingual vector spaces. This method aims to identify high-quality candidates for word and phrase translation more cost-effectively from unlabeled data. This paper expands the scope of previous attempts of bilingual translation to four languages (English, German, Spanish, and French). It shows how to process the source data, train a neural network to learn the high-dimensional embeddings for individual languages and expands the framework for testing their quality beyond the English language. Furthermore, it shows how to learn bilingual transformation matrices and obtain candidates for word and phrase translation, and assess their quality.

READ FULL TEXT

page 2

page 7

page 8

page 9

page 10

research
09/17/2013

Exploiting Similarities among Languages for Machine Translation

Dictionaries and phrase tables are the basis of modern statistical machi...
research
02/05/2015

Beyond Word-based Language Model in Statistical Machine Translation

Language model is one of the most important modules in statistical machi...
research
09/15/2015

Splitting Compounds by Semantic Analogy

Compounding is a highly productive word-formation process in some langua...
research
12/19/2014

Embedding Word Similarity with Neural Machine Translation

Neural language models learn word representations, or embeddings, that c...
research
08/05/2016

Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation

Out-of-vocabulary words account for a large proportion of errors in mach...
research
12/19/2014

Leveraging Monolingual Data for Crosslingual Compositional Word Representations

In this work, we present a novel neural network based architecture for i...
research
01/01/2021

Key Phrase Extraction Applause Prediction

With the increase in content availability over the internet it is very d...

Please sign up or login with your details

Forgot password? Click here to reset