Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

01/30/2021
by Eva Vanmassenhove, et al.

Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that 'algorithmic bias', i.e., the exacerbation of frequently observed patterns combined with the loss of less frequent ones, not only amplifies societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness, at the lexical and morphological levels, of translations produced by different data-driven MT paradigms: phrase-based statistical MT (PB-SMT) and neural MT (NMT). Our experiments show a loss of lexical and morphological richness in the translations produced by all investigated MT paradigms for two language pairs (EN↔FR and EN↔ES).
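The abstract does not name the richness metrics it uses. The following is a minimal sketch of how a loss of lexical and morphological richness could be quantified. It assumes two illustrative measures. The first is the type-token ratio (TTR) for lexical richness. The second is the average Shannon entropy of the surface forms observed per lemma for morphological richness. These are not necessarily the exact metrics of the full paper. The function names, the toy sentences and the hand-lemmatized (lemma, form) input are hypothetical.

```python
import math
from collections import Counter, defaultdict


def type_token_ratio(tokens):
    """Lexical richness: distinct word types / total tokens.
    Note: TTR is sensitive to text length, so compare texts of similar size."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_form_entropy(lemma_form_pairs):
    """Morphological richness (illustrative assumption): for each lemma,
    the Shannon entropy in bits of its surface-form distribution,
    averaged over all lemmas. A lemma always realized by one form scores 0."""
    forms_by_lemma = defaultdict(Counter)
    for lemma, form in lemma_form_pairs:
        forms_by_lemma[lemma][form] += 1
    if not forms_by_lemma:
        return 0.0
    entropies = []
    for counts in forms_by_lemma.values():
        total = sum(counts.values())
        # sum of p * log2(1/p) over the forms of this lemma
        entropies.append(sum((c / total) * math.log2(total / c) for c in counts.values()))
    return sum(entropies) / len(entropies)


# Toy example: invented sentences, not data from the paper.
reference = "le chat noir dort et les chats noirs jouent".split()
mt_output = "le chat noir dort et le chat noir joue".split()

# Hypothetical hand-lemmatized (lemma, surface form) pairs for content words.
reference_pairs = [("chat", "chat"), ("chat", "chats"), ("noir", "noir"),
                   ("noir", "noirs"), ("dormir", "dort"), ("jouer", "jouent")]
mt_pairs = [("chat", "chat"), ("chat", "chat"), ("noir", "noir"),
            ("noir", "noir"), ("dormir", "dort"), ("jouer", "joue")]

print("TTR:", type_token_ratio(reference), type_token_ratio(mt_output))
print("Form entropy:", mean_form_entropy(reference_pairs), mean_form_entropy(mt_pairs))
```

In this toy case the "MT" side reuses the singular forms of chat and noir instead of producing chats and noirs. Both of its scores therefore fall below the reference. This is the kind of impoverishment the hypothesis describes, shown here in miniature. Real evaluations would use lemmatizers and length-robust variants of these measures.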
