Multilingual Neural Machine Translation With Soft Decoupled Encoding

02/09/2019
by   Xinyi Wang, et al.

Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of a paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. SDE represents a word by its spelling, through a character encoding, and by its semantic meaning, through a latent embedding space shared by all languages. Experiments on a standard dataset of four low-resource languages show consistent improvements over strong multilingual NMT baselines, with gains of up to 2 BLEU on one of the tested languages, achieving a new state of the art on all four language pairs.


