Synthetic Source Language Augmentation for Colloquial Neural Machine Translation

12/30/2020
by Asrul Sani Ariesandy, et al.

Neural machine translation (NMT) is typically domain-dependent and style-dependent, and it requires large amounts of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data in this regard is a challenging hurdle in systematically improving the existing models. In this work, we develop a novel colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We apply synthetic style augmentation to the formal Indonesian source text and show that it improves the baseline Id-En models (in BLEU) on the new test data.

research
12/28/2021

A Preordered RNN Layer Boosts Neural Machine Translation in Low Resource Settings

Neural Machine Translation (NMT) models are strong enough to convey sema...
research
10/11/2020

SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

In this paper, we introduced our joint team SJTU-NICT's participation i...
research
10/07/2019

Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back Translation

Neural Machine Translation (NMT) models have been proved strong when tra...
research
11/07/2021

Developing neural machine translation models for Hungarian-English

I train models for the task of neural machine translation for English-Hu...
research
11/17/2022

Reducing Hallucinations in Neural Machine Translation with Feature Attribution

Neural conditional language generation models achieve the state-of-the-a...
research
06/18/2019

Adaptation of Machine Translation Models with Back-translated Data using Transductive Data Selection Methods

Data selection has proven its merit for improving Neural Machine Transla...
research
10/27/2019

Multitask Learning For Different Subword Segmentations In Neural Machine Translation

In Neural Machine Translation (NMT) the usage of subwords and characters...
