Why don't people use character-level machine translation?

10/15/2021
by Jindřich Libovický, et al.

We present a literature and empirical survey that critically assesses the state of the art in character-level modeling for machine translation (MT). Despite evidence in the literature that character-level systems are comparable with subword systems, they are virtually never used in competitive setups in WMT competitions. We empirically show that, even with recent modeling innovations in character-level natural language processing, character-level MT systems still struggle to match their subword-based counterparts in translation quality, training speed, and inference speed. Character-level MT systems show neither better domain robustness nor better morphological generalization, although both are often cited as motivations for them. On the other hand, they tend to be more robust to source-side noise, and their translation quality does not degrade with increasing beam size at decoding time.

03/02/2016

Character-based Neural Machine Translation

Neural Machine Translation (MT) has reached state-of-the-art results. Ho...
02/05/2019

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

We consider the problem of making machine translation more robust to cha...
04/09/2021

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Chinese character decomposition has been used as a feature to enhance Ma...
09/04/2019

Problems with automating translation of movie/TV show subtitles

We present 27 problems encountered in automating the translation of movi...
05/25/2022

Machine Translation Robustness to Natural Asemantic Variation

We introduce and formalize an under-studied linguistic phenomenon we cal...
04/29/2020

Towards Character-Level Transformer NMT by Finetuning Subword Systems

Applying the Transformer architecture on the character level usually req...
01/30/2021

Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

Recent studies in the field of Machine Translation (MT) and Natural Lang...