Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation

03/04/2020
by Hongzheng Li, et al.

Back-translation (BT) has become a standard data augmentation technique in Neural Machine Translation (NMT) and has proven effective at improving translation quality, especially in low-resource scenarios. While most work on BT focuses on European languages, few studies examine languages from other regions of the world. In this paper, we investigate the impact of BT on Asian language translation for the extremely low-resource Chinese-Vietnamese language pair. We evaluate and compare the effects of different sizes of synthetic data on both NMT and Statistical Machine Translation (SMT) models for Chinese-to-Vietnamese and Vietnamese-to-Chinese translation, in character-based and word-based settings. Some conclusions from previous work are partially confirmed, and we also report additional findings that help in understanding BT further.
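To make the setup concrete, the following is a minimal sketch of how back-translated data can be mixed with a parallel corpus at different sizes. It is not the authors' implementation: the names `back_translate`, `build_training_set`, `to_characters`, and the `ratio` parameter are hypothetical, and `reverse_translate` stands in for any trained target-to-source model. The character split shown is also only an assumption about what a character-based setting might look like.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each monolingual target sentence with a synthetic source.

    `reverse_translate` is a hypothetical stand-in for a trained
    target-to-source model (NMT or SMT).
    """
    return [(reverse_translate(t), t) for t in mono_target]


def build_training_set(
    parallel: List[Pair],
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
    ratio: float,
) -> List[Pair]:
    """Combine real pairs with synthetic pairs.

    `ratio` is the assumed size of the synthetic data relative to the
    real parallel data (0.0 = no BT, 1.0 = equal amounts), capped by the
    amount of monolingual text available.
    """
    n_synth = min(len(mono_target), int(len(parallel) * ratio))
    synthetic = back_translate(mono_target[:n_synth], reverse_translate)
    return list(parallel) + synthetic


def to_characters(sentence: str) -> List[str]:
    """Assumed character-based segmentation: one token per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]
```

Calling `build_training_set` with several values of `ratio` gives training sets that differ only in the amount of synthetic data, which is the kind of comparison the abstract describes.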

research
08/18/2017

Neural machine translation for low-resource languages

Neural machine translation (NMT) approaches have improved the state of t...
research
08/12/2020

Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task

Supervised Chinese word segmentation has been widely approached as seque...
research
04/09/2022

Towards Better Chinese-centric Neural Machine Translation for Low-resource Languages

The last decade has witnessed enormous improvements in science and techn...
research
06/29/2021

Neural Machine Translation for Low-Resource Languages: A Survey

Neural Machine Translation (NMT) has seen a tremendous spurt of growth i...
research
10/05/2019

How Transformer Revitalizes Character-based Neural Machine Translation: An Investigation on Japanese-Vietnamese Translation Systems

While translating between Chinese-centric languages, many works have dis...
research
03/07/2021

Translating the Unseen? Yorùbá → English MT in Low-Resource, Morphologically-Unmarked Settings

Translating between languages where certain features are marked morpholo...
research
10/05/2022

Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation

Language modelling and machine translation tasks mostly use subword or c...
