Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

09/13/2022
by Nathaniel R. Robinson, et al.

Multilingual transfer techniques often improve low-resource machine translation (MT), yet many of these techniques are applied without considering the characteristics of the data. We show, in the context of Haitian-to-English translation, that transfer effectiveness is correlated with the amount of training data and with the relationships between the knowledge-sharing languages. Our experiments suggest that, for some languages, once the amount of authentic data passes a threshold, back-translation augmentation methods become counterproductive, and cross-lingual transfer from a sufficiently related language is preferable. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation yields statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU-point advantage.
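To make the idea of a rule-based orthographic transformation concrete, here is a minimal sketch in Python. It assumes the transformation can be expressed as an ordered list of regular-expression rewrites applied to lowercased French words. The rules and the function names `respell` and `respell_sentence` are hypothetical toy examples chosen for illustration; they are not the authors' French-Haitian engine, which according to the abstract also handles syntax.

```python
import re

# Hypothetical, illustrative French -> Haitian-style respelling rules.
# These are NOT the authors' engine; they only show the mechanism.
# Order matters: multi-letter patterns are rewritten before single letters,
# and "c" before a/o/u is handled before "oi" so "coin" -> "kwan".
RULES: list[tuple[str, str]] = [
    (r"eau", "o"),        # bateau -> bato
    (r"qu", "k"),         # quoi -> koi
    (r"c(?=[aou])", "k"),  # hard c before a/o/u -> k
    (r"oi", "wa"),        # koi -> kwa, trois -> trwas
    (r"ç", "s"),          # ça -> sa
]


def respell(word: str) -> str:
    """Apply each rewrite rule, in order, to a lowercased word."""
    out = word.lower()
    for pattern, replacement in RULES:
        out = re.sub(pattern, replacement, out)
    return out


def respell_sentence(sentence: str) -> str:
    """Respell a whitespace-tokenized sentence word by word."""
    return " ".join(respell(w) for w in sentence.split())


if __name__ == "__main__":
    print(respell_sentence("Quoi ça bateau"))  # prints "kwa sa bato"
```

Keeping the rules as an ordered list makes interactions explicit: moving the `oi` rule ahead of the hard-`c` rule would turn "coin" into "cwan" instead of "kwan", which is the kind of ordering decision a real rule-based engine has to manage.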


