Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers

11/10/2020
by Ife Adebara, et al.

We investigate different approaches to translating between similar languages under low-resource conditions, as part of our contribution to the WMT 2020 Similar Languages Translation Shared Task. We submitted Transformer-based bilingual and multilingual systems for all language pairs, in both directions. We also leverage back-translation for one of the language pairs, gaining more than 3 BLEU points. We interpret our results in light of the degree of mutual intelligibility (based on Jaccard similarity) between each pair, and find a positive correlation between mutual intelligibility and model performance. Our Spanish-Catalan model performs best of all five language pairs. Except for Hindi-Marathi, our bilingual models outperform the multilingual models on all pairs.
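The abstract scores mutual intelligibility with Jaccard similarity, |A ∩ B| / |A ∪ B|, but does not say which units the sets contain. The sketch below assumes, purely for illustration, that each language is represented by the set of lower-cased, whitespace-separated word types in its text; the helper names `vocab` and `jaccard` are hypothetical and are not the authors' code.

```python
from collections.abc import Iterable


def vocab(sentences: Iterable[str]) -> set[str]:
    """Collect lower-cased word types split on whitespace.

    Assumption: the paper's actual tokenization and unit (word vs. subword
    vs. character n-gram) may differ.
    """
    return {tok.lower() for s in sentences for tok in s.split()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy one-sentence "corpora", not data from the paper.
    es = vocab(["La casa es grande"])  # Spanish
    ca = vocab(["La casa és gran"])    # Catalan
    print(jaccard(es, ca))
```

In the toy example the two vocabularies share the types `la` and `casa` out of six distinct types in total, giving a similarity of 1/3. Computing one such score per language pair is what would let it be compared against each pair's BLEU score.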

Related research

CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task (09/20/2021)
This paper describes Charles University submission for Multilingual Low-...

MATra: A Multilingual Attentive Transliteration System for Indian Scripts (08/23/2022)
Transliteration is a task in the domain of NLP where the output word is ...

Improving Similar Language Translation With Transfer Learning (08/07/2021)
We investigate transfer learning based on pre-trained neural machine tra...

Are Mutually Intelligible Languages Easier to Translate? (01/31/2022)
Two languages are considered mutually intelligible if their native speak...

False-Friend Detection and Entity Matching via Unsupervised Transliteration (11/21/2016)
Transliterations play an important role in multilingual entity reference...

Improving Multilingual Semantic Textual Similarity with Shared Sentence Encoder for Low-resource Languages (10/20/2018)
Measuring the semantic similarity between two sentences (or Semantic Tex...

Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Machine Translation (04/12/2021)
We translate a closed text that is known in advance into a severely low ...