Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

12/11/2019
by   Yu Wan, et al.

As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models with access only to monolingual data. Specifically, we leverage pivot-private embeddings, layer coordination, and parameter sharing to model the commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect 20 million monolingual sentences for each of Mandarin and Cantonese, which are the official language and the most widely used dialect of China, respectively. Experimental results reveal that our methods outperform rule-based simplified-to-traditional Chinese conversion and conventional unsupervised translation models by over 12 BLEU points.
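The pivot-private embedding idea can be illustrated with a minimal sketch: characters shared by the two dialects draw their vectors from a single pivot table (modeling commonality), while dialect-specific characters use per-dialect private tables (modeling diversity). All names, vocabularies, and sizes below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)

# Characters common to Mandarin and Cantonese share one pivot table.
pivot_vocab = {"我": 0, "好": 1}
# Dialect-specific characters get independent private tables.
private_vocab = {
    "mandarin": {"是": 0, "的": 1},
    "cantonese": {"係": 0, "嘅": 1},
}

pivot_table = rng.normal(size=(len(pivot_vocab), DIM))
private_tables = {
    lang: rng.normal(size=(len(vocab), DIM))
    for lang, vocab in private_vocab.items()
}

def embed(char: str, lang: str) -> np.ndarray:
    """Look up a character: shared pivot table first, else the dialect's private table."""
    if char in pivot_vocab:
        return pivot_table[pivot_vocab[char]]
    return private_tables[lang][private_vocab[lang][char]]
```

Under this scheme the shared character "我" receives the same vector regardless of dialect, whereas the Cantonese "係" and its Mandarin counterpart "是" are embedded independently, which is one simple way to encode lexical commonality and diversity at the same time.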


Related research

04/05/2020  Reference Language based Unsupervised Neural Machine Translation
  Exploiting common language as an auxiliary for better translation has a ...

05/30/2020  Data Augmentation for Learning Bilingual Word Embeddings with Unsupervised Machine Translation
  Unsupervised bilingual word embedding (BWE) methods learn a linear trans...

02/07/2020  A Multilingual View of Unsupervised Machine Translation
  We present a probabilistic framework for multilingual neural machine tra...

06/05/2019  Deep learning based unsupervised concept unification in the embedding space
  Humans are able to conceive physical reality by jointly learning differe...

06/01/2022  Exploring Diversity in Back Translation for Low-Resource Machine Translation
  Back translation is one of the most widely used methods for improving th...

10/17/2018  Sequence to Sequence Mixture Model for Diverse Machine Translation
  Sequence to sequence (SEQ2SEQ) models often lack diversity in their gene...

09/07/2021  Paraphrase Generation as Unsupervised Machine Translation
  In this paper, we propose a new paradigm for paraphrase generation by tr...
