Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

04/08/2020
by Stig-Arne Grönroos et al.

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoders, and subword sampling.
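To illustrate the subword sampling idea mentioned in the abstract, here is a minimal sketch of one well-known subword regularization technique, BPE-dropout (not necessarily the exact method used in the paper): each applicable BPE merge is skipped with some probability, so the same word receives different segmentations across training epochs, increasing effective vocabulary coverage. The function name and the toy merge table are illustrative assumptions.

```python
import random

def bpe_dropout_segment(word, merges, dropout=0.1, rng=None):
    """Segment a word into subwords with BPE-dropout-style sampling.

    merges: ordered list of (left, right) merge pairs, highest priority first
            (a toy stand-in for a learned BPE merge table).
    Each applicable merge is skipped with probability `dropout`, so the
    same word can yield different segmentations on different calls.
    """
    rng = rng or random.Random()
    symbols = list(word)  # start from individual characters
    for left, right in merges:
        i, merged = 0, []
        while i < len(symbols):
            if (i + 1 < len(symbols)
                    and symbols[i] == left
                    and symbols[i + 1] == right
                    and rng.random() >= dropout):
                merged.append(left + right)  # apply the merge
                i += 2
            else:
                merged.append(symbols[i])    # keep symbol (or merge dropped)
                i += 1
        symbols = merged
    return symbols

# With dropout=0 the segmentation is the deterministic BPE result:
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_dropout_segment("lower", merges, dropout=0.0))  # → ['low', 'er']
# With dropout>0, merges are occasionally skipped, producing varied splits:
print(bpe_dropout_segment("lower", merges, dropout=0.5, rng=random.Random(0)))
```

In practice such sampling is provided by toolkits (e.g. SentencePiece's `enable_sampling` option) rather than hand-rolled, but the mechanism is the same: stochastic segmentation exposes the model to many subword decompositions of each word.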


