Towards Character-Level Transformer NMT by Finetuning Subword Systems

04/29/2020 ∙ by Jindřich Libovický, et al. ∙ 0

Applying the Transformer architecture on the character level usually requires very deep architectures that are difficult and slow to train. A few approaches have been proposed that partially overcome this problem by using explicit segmentation into tokens. We show that by initially training a subword model based on this segmentation and then finetuning it on characters, we can obtain a neural machine translation model that works at the character level without requiring segmentation. Without changing the vanilla 6-layer Transformer Base architecture, we train purely character-level models. Our character-level models better capture morphological phenomena and show much higher robustness towards source-side noise at the expense of somewhat worse overall translation quality. Our study is a significant step towards high-performance character-based models that are not extremely large.



There are no comments yet.


page 3

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.