Chinese-Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information

03/01/2019
by Longtu Zhang, et al.

Unsupervised neural machine translation (UNMT) requires only monolingual data of similar language pairs during training and can produce bi-directional translation models with relatively good performance on alphabetic languages (Lample et al., 2018). However, no research has been done on logographic language pairs. This study focuses on Chinese-Japanese UNMT trained on data containing sub-character (ideograph or stroke) level information, which is decomposed from character level data. BLEU scores of the character and sub-character level systems were compared, and the results showed that, although UNMT is already effective on character level data, sub-character level data could further enhance performance, with the stroke level system outperforming the ideograph level system.
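
To make the idea of decomposing character level data into sub-character units more concrete, here is a minimal Python sketch of ideograph level decomposition. The lookup table `TOY_IDEOGRAPHS` and the function `decompose` are hypothetical names introduced only for illustration; the three entries are a hand-written toy table, not the decomposition resource or procedure used in the paper, and stroke level decomposition would need a different (finer-grained) table.

```python
# Toy decomposition table (illustrative assumption, not the paper's data).
# Each character maps to the sequence of component ideographs it is built from.
TOY_IDEOGRAPHS = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "林": ["木", "木"],
}


def decompose(text, table=TOY_IDEOGRAPHS):
    """Replace each character with its components; keep unknown characters as-is."""
    tokens = []
    for ch in text:
        # Fall back to the character itself when the table has no entry for it.
        tokens.extend(table.get(ch, [ch]))
    return tokens


print(decompose("明林"))  # ['日', '月', '木', '木']
print(decompose("好人"))  # ['女', '子', '人']
```

In this sketch, characters missing from the table pass through unchanged, so the output sequence can mix decomposed and undecomposed units. A real pipeline would also need a way to recover the original characters from the sub-character sequence after translation, which is not shown here.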

09/07/2018

Neural Machine Translation of Logographic Languages Using Sub-character Level Information

Recent neural machine translation (NMT) systems have been greatly improv...
11/07/2019

SubCharacter Chinese-English Neural Machine Translation with Wubi encoding

Neural machine translation (NMT) is one of the best methods for understa...
07/20/2017

A Sub-Character Architecture for Korean Language Processing

We introduce a novel sub-character architecture that exploits a unique c...
04/30/2020

Character-Level Translation with Self-attention

We explore the suitability of self-attention models for character-level ...
04/09/2021

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Chinese character decomposition has been used as a feature to enhance Ma...
09/10/2020

On Target Segmentation for Direct Speech Translation

Recent studies on direct speech translation show continuous improvements...
08/21/2020

Neural Machine Translation without Embeddings

Many NLP models follow the embed-contextualize-predict paradigm, in whic...