On the Difficulty of Translating Free-Order Case-Marking Languages

07/13/2021
by Arianna Bisazza, et al.

Identifying the factors that make certain languages harder to model than others is essential to achieving language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate whether this class of languages is also more difficult to translate for state-of-the-art Neural Machine Translation (NMT) models. Using a variety of synthetic languages and a newly introduced translation challenge set, we find that word order flexibility in the source language leads to only a very small loss of NMT quality, even though the core verb arguments become impossible to disambiguate in sentences without semantic cues. Adding case marking does solve this latter issue. However, in medium- and low-resource settings, the overall NMT quality of fixed-order languages remains unmatched.
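
The disambiguation argument above can be made concrete with a toy example. The sketch below is not the authors' synthetic-language generator. The role labels, the suffixes `-nom` and `-acc`, and all function names are hypothetical and invented for illustration. It linearizes a transitive clause either in a fixed SVO order or in all six constituent orders, optionally attaches case suffixes to subject and object, and checks whether swapping subject and object can yield the same surface string. It deliberately ignores semantic cues: both nouns are equally plausible agents.

```python
import itertools

# Hypothetical case suffixes, invented for illustration (not from the paper).
CASE = {"subj": "-nom", "obj": "-acc"}


def realize(subj, verb, obj, order, case_marking):
    """Linearize one transitive clause in the given order, e.g. "SVO" or "OVS"."""
    roles = {"S": ("subj", subj), "V": ("verb", verb), "O": ("obj", obj)}
    words = []
    for slot in order:
        role, word = roles[slot]
        # Only the core arguments receive a case suffix; the verb stays bare.
        if case_marking and role in CASE:
            word += CASE[role]
        words.append(word)
    return " ".join(words)


def surface_forms(subj, verb, obj, free_order, case_marking):
    """All strings the toy language can produce for one clause."""
    if free_order:
        orders = ["".join(p) for p in itertools.permutations("SVO")]
    else:
        orders = ["SVO"]
    return {realize(subj, verb, obj, o, case_marking) for o in orders}


def is_ambiguous(a, b, verb, free_order, case_marking):
    """True if 'a VERB b' and 'b VERB a' share at least one surface string."""
    forms_ab = surface_forms(a, verb, b, free_order, case_marking)
    forms_ba = surface_forms(b, verb, a, free_order, case_marking)
    return bool(forms_ab & forms_ba)
```

For instance, `is_ambiguous("dog", "sees", "cat", free_order=True, case_marking=False)` returns `True`, because the SVO reading of "dog sees cat" coincides with the OVS reading of "cat sees dog". With `case_marking=True`, or with `free_order=False`, it returns `False`: either the suffixes or the fixed positions identify which noun is the subject.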
