An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

04/18/2017
by Cristina España-Bonet, et al.

End-to-end neural machine translation has overtaken statistical machine translation in terms of translation quality for some language pairs, especially those with large amounts of parallel data. Besides this palpable improvement, neural networks provide several new properties. A single system can be trained to translate between many languages at almost no additional cost other than training time. Furthermore, the internal representations learned by the network serve as a new semantic representation of words (or sentences) which, unlike standard word embeddings, are learned in an essentially bilingual or even multilingual context. In view of these properties, the contribution of the present work is two-fold. First, we systematically study the NMT context vectors, i.e., the output of the encoder, and their power as an interlingua representation of a sentence. We assess their quality and effectiveness by measuring similarities across translations, as well as semantically related and semantically unrelated sentence pairs. Second, as an extrinsic evaluation of the first point, we identify parallel sentences in comparable corpora, obtaining an F1=98.2% when using context vectors alone; used jointly with similarity measures, F1 reaches 98.9%.
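The parallel-sentence identification described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: it assumes sentence vectors have already been obtained (e.g., by mean-pooling the NMT encoder's context vectors) and matches each source sentence to its most cosine-similar target sentence, keeping only pairs above a similarity threshold. The function names and the threshold value are illustrative choices.

```python
import numpy as np

def cosine_sim_matrix(src, tgt):
    """Pairwise cosine similarity between two sets of sentence vectors."""
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return src_n @ tgt_n.T

def match_parallel(src, tgt, threshold=0.9):
    """Greedy matching: for each source vector, take the most similar
    target vector; keep the pair only if similarity clears the threshold."""
    sims = cosine_sim_matrix(src, tgt)
    pairs = []
    for i, row in enumerate(sims):
        j = int(np.argmax(row))
        if row[j] >= threshold:
            pairs.append((i, j, float(row[j])))
    return pairs

# Toy data: three "source" vectors and their lightly perturbed
# "translations", standing in for encoder-derived sentence vectors.
rng = np.random.default_rng(0)
src = rng.normal(size=(3, 8))
tgt = src + 0.05 * rng.normal(size=(3, 8))  # near-parallel counterparts
pairs = match_parallel(src, tgt)
print(pairs)
```

In a real comparable-corpus setting, the threshold (and any complementary surface-level similarity measures, as the paper combines) is what separates true parallel pairs from merely related sentences.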


Related research

- 09/30/2019: Regressing Word and Sentence Embeddings for Regularization of Neural Machine Translation
  In recent years, neural machine translation (NMT) has become the dominan...
- 03/29/2018: Identifying Semantic Divergences in Parallel Text without Annotations
  Recognizing that even correct translations are not always semantically e...
- 05/16/2018: Are BLEU and Meaning Representation in Opposition?
  One of possible ways of obtaining continuous-space sentence representati...
- 04/18/2020: SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
  Word alignments are useful for tasks like statistical and neural machine...
- 02/20/2020: Contextual Lensing of Universal Sentence Representations
  What makes a universal sentence encoder universal? The notion of a gener...
- 11/24/2021: Cultural and Geographical Influences on Image Translatability of Words across Languages
  Neural Machine Translation (NMT) models have been observed to produce po...
- 09/24/2015: Bilingual Distributed Word Representations from Document-Aligned Comparable Data
  We propose a new model for learning bilingual word representations from ...
