
Fully Character-Level Neural Machine Translation without Explicit Segmentation

by Jason Lee et al.

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of the source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of multilingual character-level translation even surpasses that of models trained on that language pair alone, in terms of both BLEU score and human judgment.
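The core idea of the encoder described above, embedding raw characters, convolving over them to capture local regularities, then max-pooling with a stride to shorten the sequence, can be sketched as follows. This is a minimal illustration with hypothetical dimensions (`embed_dim`, `hidden`, `kernel`, `pool`) and random weights, not the paper's exact architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def char_conv_encoder(char_ids, vocab=300, embed_dim=8, hidden=16,
                      kernel=3, pool=5):
    """Sketch of a character-level encoder front end: embed characters,
    apply a 1-D convolution over time, then strided max-pooling so the
    downstream encoder sees a sequence `pool` times shorter."""
    E = rng.standard_normal((vocab, embed_dim))            # char embedding table
    W = rng.standard_normal((hidden, kernel * embed_dim))  # conv filters
    x = E[char_ids]                                        # (T, embed_dim)
    T = len(char_ids) - kernel + 1
    # convolution: dot each window of `kernel` characters with the filters
    conv = np.stack([W @ x[t:t + kernel].ravel() for t in range(T)])
    conv = np.maximum(conv, 0.0)                           # ReLU nonlinearity
    # strided max-pooling: keep the max activation in each segment of `pool`
    T_pooled = T // pool
    pooled = conv[:T_pooled * pool].reshape(T_pooled, pool, hidden).max(axis=1)
    return pooled                                          # (T // pool, hidden)

ids = rng.integers(0, 300, size=53)
out = char_conv_encoder(ids)
print(out.shape)  # (10, 16): 53 chars -> 51 conv outputs -> 10 pooled segments
```

The strided pooling is what makes training tractable: source sequences at the character level are several times longer than their subword counterparts, and pooling brings the effective length back into the same range.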


