Robust Neural Machine Translation with Joint Textual and Phonetic Embedding

10/15/2018
by Hairong Liu, et al.

Neural machine translation (NMT) is notoriously sensitive to noise, yet noise is almost inevitable in practice. One special kind of noise is homophone noise, where words are replaced by other words with the same (or similar) pronunciation. Homophone noise arises frequently in real-world scenarios upstream of translation, such as automatic speech recognition (ASR) or phonetic-based input systems. We propose to improve the robustness of NMT to homophone noise by 1) jointly embedding both the textual and phonetic information of source sentences, and 2) augmenting the training dataset with homophone noise. Interestingly, we found that to achieve the best translation quality, most (though not all) of the weight should be placed on the phonetic rather than the textual information, with the latter used only as auxiliary information. Experiments show that our method not only significantly improves the robustness of NMT to homophone noise, as expected, but also, surprisingly, improves translation quality on clean test sets.
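The two ingredients described in the abstract, a joint textual/phonetic embedding and homophone-noise augmentation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the linear interpolation with weight `beta`, the toy lexicon `PHONETIC`, its made-up phonetic keys, and the helper names `build_homophones`, `add_homophone_noise`, and `joint_embedding` are all hypothetical. A real system would obtain phonetic keys from a pronunciation dictionary or, for Chinese, a pinyin converter.

```python
import random
from typing import Dict, List

# Hypothetical toy lexicon: word -> phonetic key. The keys are made up for
# illustration; a real system would derive them from a pronunciation resource.
PHONETIC: Dict[str, str] = {
    "their": "DH-EH-R", "there": "DH-EH-R",
    "to": "T-UW", "two": "T-UW", "too": "T-UW",
    "go": "G-OW",
}


def build_homophones(phonetic: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each word to the other words that share its phonetic key."""
    groups: Dict[str, List[str]] = {}
    for word, key in phonetic.items():
        groups.setdefault(key, []).append(word)
    return {w: [v for v in groups[k] if v != w] for w, k in phonetic.items()}


def add_homophone_noise(tokens: List[str], homophones: Dict[str, List[str]],
                        p: float, rng: random.Random) -> List[str]:
    """Replace each token by a random homophone with probability p.

    Tokens with no known homophone (or unknown to the lexicon) are kept.
    """
    noisy = []
    for tok in tokens:
        alts = homophones.get(tok, [])
        if alts and rng.random() < p:
            noisy.append(rng.choice(alts))
        else:
            noisy.append(tok)
    return noisy


def joint_embedding(word: str, text_emb: Dict[str, List[float]],
                    phon_emb: Dict[str, List[float]],
                    phonetic: Dict[str, str], beta: float) -> List[float]:
    """Assumed joint embedding: (1 - beta) * textual + beta * phonetic.

    A beta close to 1 puts most of the weight on the phonetic vector,
    keeping the textual vector only as auxiliary information.
    """
    t = text_emb[word]
    s = phon_emb[phonetic[word]]
    return [(1.0 - beta) * a + beta * b for a, b in zip(t, s)]


if __name__ == "__main__":
    rng = random.Random(0)
    homs = build_homophones(PHONETIC)
    # Augmented copy of a training sentence; "house" has no homophone here.
    print(add_homophone_noise("go to their house".split(), homs, p=0.5, rng=rng))
```

With `beta` near 1, two homophones receive nearly identical vectors, so a homophone substitution in the input barely changes what the encoder sees, while the small textual share can still help disambiguate. Training on noisy copies produced by `add_homophone_noise` exposes the model to such substitutions directly.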

04/24/2019

Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors

Machine translation systems are conventionally trained on textual resour...
10/24/2018

Learning to Discriminate Noises for Incorporating External Information in Neural Machine Translation

Previous studies show that incorporating external information could impr...
02/05/2019

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

We consider the problem of making machine translation more robust to cha...
10/20/2020

Word Shape Matters: Robust Machine Translation with Visual Embedding

Neural machine translation has achieved remarkable empirical performance...
10/21/2020

Sentence Boundary Augmentation For Neural Machine Translation Robustness

Neural Machine Translation (NMT) models have demonstrated strong state o...
11/04/2020

PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Neural Machine Translation (NMT) has shown drastic improvement in its qu...
04/20/2021

Addressing the Vulnerability of NMT in Input Perturbations

Neural Machine Translation (NMT) has achieved significant breakthrough i...
