Word Shape Matters: Robust Machine Translation with Visual Embedding

10/20/2020
by   Haohan Wang, et al.
6

Neural machine translation has achieved remarkable empirical performance over standard benchmark datasets, yet recent evidence suggests that the models can still fail easily dealing with substandard inputs such as misspelled words, To overcome this issue, we introduce a new encoding heuristic of the input symbols for character-level NLP models: it encodes the shape of each character through the images depicting the letters when printed. We name this new strategy visual embedding and it is expected to improve the robustness of NLP models because humans also process the corpus visually through printed letters, instead of machinery one-hot vectors. Empirically, our method improves models' robustness against substandard inputs, even in the test scenario where the models are tested with the noises that are beyond what is available during the training phase.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/21/2020

Neural Machine Translation without Embeddings

Many NLP models follow the embed-contextualize-predict paradigm, in whic...
research
07/03/2016

Context-Dependent Word Representation for Neural Machine Translation

We first observe a potential weakness of continuous vector representatio...
research
10/15/2018

Robust Neural Machine Translation with Joint Textual and Phonetic Embedding

Neural machine translation (NMT) is notoriously sensitive to noises, but...
research
02/28/2020

Robust Unsupervised Neural Machine Translation with Adversarial Training

Unsupervised neural machine translation (UNMT) has recently attracted gr...
research
10/20/2016

Neural Machine Translation with Characters and Hierarchical Encoding

Most existing Neural Machine Translation models use groups of characters...
research
04/14/2017

How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

This paper investigates the robustness of NLP against perturbed word for...
research
02/05/2019

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

We consider the problem of making machine translation more robust to cha...

Please sign up or login with your details

Forgot password? Click here to reset