On the Integration of LinguisticFeatures into Statistical and Neural Machine Translation

03/31/2020
by   Eva Vanmassenhove, et al.
0

New machine translations (MT) technologies are emerging rapidly and with them, bold claims of achieving human parity such as: (i) the results produced approach "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018) have seen the light of day (Laubli et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By looking at MT output and linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse and provide a solution to some of them. Our work focuses on addressing three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations and integrate additional features into the existing pipelines. We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

08/30/2018

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

We reassess a recent study (Hassan et al., 2018) that claimed that machi...
04/03/2020

A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

The quality of machine translation has increased remarkably over the pas...
01/30/2021

Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

Recent studies in the field of Machine Translation (MT) and Natural Lang...
06/24/2019

Translationese in Machine Translation Evaluation

The term translationese has been used to describe the presence of unusua...
05/27/2020

MT-Adapted Datasheets for Datasets: Template and Repository

In this report we are taking the standardized model proposed by Gebru et...
11/30/2020

Machine Translation of Novels in the Age of Transformer

In this chapter we build a machine translation (MT) system tailored to t...
08/10/2015

Removing Biases from Trainable MT Metrics by Using Self-Training

Most trainable machine translation (MT) metrics train their weights on h...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.