
Chinese-Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information

by   Longtu Zhang, et al.

Unsupervised neural machine translation (UNMT) requires only monolingual data of similar language pairs during training and can produce bi-directional translation models with relatively good performance on alphabetic languages (Lample et al., 2018). However, no research has been done on logographic language pairs. This study focuses on Chinese-Japanese UNMT trained on data containing sub-character (ideograph or stroke) level information decomposed from character-level data. BLEU scores of character- and sub-character-level systems were compared, and the results showed that although UNMT is already effective on character-level data, sub-character-level data further improved performance, with the stroke-level system outperforming the ideograph-level system.
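The sub-character preprocessing described above can be sketched as a simple lookup-and-expand step. The snippet below is a minimal illustration, assuming a hypothetical ideograph-decomposition table; the actual decomposition resources used by the study (e.g. a full ideograph or stroke database) are not reproduced here.

```python
# Toy ideograph-level decomposition table (illustrative entries only).
IDEOGRAPH_TABLE = {
    "好": ["女", "子"],  # "good" decomposes into woman + child
    "明": ["日", "月"],  # "bright" decomposes into sun + moon
}

def decompose(sentence, table):
    """Replace each character with its sub-character sequence,
    falling back to the character itself when no entry exists."""
    units = []
    for ch in sentence:
        units.extend(table.get(ch, [ch]))
    return units

print(decompose("明好", IDEOGRAPH_TABLE))  # ['日', '月', '女', '子']
```

A stroke-level system would apply the same expansion with a finer-grained table mapping each character to its stroke sequence, producing longer but lower-vocabulary token streams.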




Neural Machine Translation of Logographic Languages Using Sub-character Level Information

Recent neural machine translation (NMT) systems have been greatly improv...

SubCharacter Chinese-English Neural Machine Translation with Wubi encoding

Neural machine translation (NMT) is one of the best methods for understa...

A Sub-Character Architecture for Korean Language Processing

We introduce a novel sub-character architecture that exploits a unique c...

Character-Level Translation with Self-attention

We explore the suitability of self-attention models for character-level ...

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Chinese character decomposition has been used as a feature to enhance Ma...

On Target Segmentation for Direct Speech Translation

Recent studies on direct speech translation show continuous improvements...

Neural Machine Translation without Embeddings

Many NLP models follow the embed-contextualize-predict paradigm, in whic...