Synthetic and Natural Noise Both Break Neural Machine Translation

11/06/2017
by Yonatan Belinkov, et al.

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
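
The abstract does not spell out how the synthetic noise is generated or what a structure-invariant representation looks like. As a minimal sketch under assumptions, the Python below shows two plausible character-level perturbations (swapping adjacent inner letters, scrambling the middle of a word) and one order-insensitive word representation (a bag of character counts). The function names `swap_adjacent`, `scramble_middle`, `add_noise`, and `bag_of_chars` are hypothetical and chosen for illustration; they are not taken from the paper's code.

```python
import random
from collections import Counter


def swap_adjacent(word, rng):
    """Swap two adjacent inner characters; first and last stay in place."""
    if len(word) < 4:
        return word  # too short to have two inner characters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def scramble_middle(word, rng):
    """Shuffle all inner characters; first and last stay in place."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def add_noise(sentence, noise_fn, prob=0.5, seed=0):
    """Apply noise_fn to each whitespace-separated token with probability prob."""
    rng = random.Random(seed)
    tokens = sentence.split()
    noisy = [noise_fn(t, rng) if rng.random() < prob else t for t in tokens]
    return " ".join(noisy)


def bag_of_chars(word):
    """Assumed stand-in for an order-insensitive word representation."""
    return Counter(word)


if __name__ == "__main__":
    text = "character based models falter on noisy input"
    print(add_noise(text, swap_adjacent, prob=1.0))
    print(add_noise(text, scramble_middle, prob=1.0))
    # A representation that ignores character order is unchanged by scrambling.
    rng = random.Random(1)
    print(bag_of_chars("translation") == bag_of_chars(scramble_middle("translation", rng)))
```

A perturbed copy of a training corpus produced this way could serve as input for the robust-training approach the abstract mentions, while the count-based representation illustrates why ignoring character order makes a model indifferent to scrambling. Both are simplifications for intuition only.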

research
10/07/2019

Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back Translation

Neural Machine Translation (NMT) models have been proved strong when tra...
research
05/10/2016

Coverage Embedding Models for Neural Machine Translation

In this paper, we enhance the attention-based neural machine translation...
research
04/17/2018

Improving Character-based Decoding Using Target-Side Morphological Information for Neural Machine Translation

Recently, neural machine translation (NMT) has emerged as a powerful alt...
research
10/24/2021

Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

This work explores the capacities of character-based Neural Machine Tran...
research
02/05/2019

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

We consider the problem of making machine translation more robust to cha...
research
04/14/2017

How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

This paper investigates the robustness of NLP against perturbed word for...
research
09/11/2020

Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation

Neural machine translation systems typically are trained on curated corp...
