CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

05/09/2022
by Maria Nădejde, et al.

The machine translation (MT) task is typically formulated as that of returning a single translation for an input segment. However, in many cases, multiple different translations are valid, and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers. Specific problems arise when dealing with honorifics, particularly when translating from English into languages with formality markers. For example, the sentence "Are you sure?" can be translated into German as "Sind Sie sich sicher?" (formal register) or "Bist du dir sicher?" (informal). Using the wrong tone, or an inconsistent one, may be perceived as inappropriate or jarring by users in certain cultures and demographics. This work addresses the problem of learning to control target language attributes, in this case formality, from a small amount of labeled contrastive data. We introduce an annotated dataset (CoCoA-MT) and an associated evaluation metric for training and evaluating formality-controlled MT models for six diverse target languages. We show that we can train formality-controlled models by fine-tuning on labeled contrastive data, achieving high accuracy (82% in-domain and 73% out-of-domain) while maintaining overall quality.
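To make the idea of contrastive formality data concrete, the following is a minimal sketch of how one labeled contrastive example and a simple marker-based accuracy check might look. It is not the CoCoA-MT data format or the paper's evaluation metric: the class `ContrastivePair`, the functions `predicted_register` and `formality_accuracy`, and the chosen marker words are all hypothetical, and word-level marker matching is a deliberately naive assumption (lowercase "sie", for instance, can also mean "she" or "they").

```python
import re
from dataclasses import dataclass


@dataclass
class ContrastivePair:
    """One source segment with a formal and an informal reference (hypothetical layout)."""
    source: str
    formal: str
    informal: str
    formal_markers: tuple    # words assumed to signal the formal register
    informal_markers: tuple  # words assumed to signal the informal register


def predicted_register(hypothesis, pair):
    """Return 'formal', 'informal', or None when the markers are absent or mixed."""
    tokens = set(re.findall(r"\w+", hypothesis.lower()))
    has_formal = any(m.lower() in tokens for m in pair.formal_markers)
    has_informal = any(m.lower() in tokens for m in pair.informal_markers)
    if has_formal and not has_informal:
        return "formal"
    if has_informal and not has_formal:
        return "informal"
    return None


def formality_accuracy(hypotheses, pairs, requested):
    """Share of classifiable hypotheses whose register matches the requested one."""
    labels = [predicted_register(h, p) for h, p in zip(hypotheses, pairs)]
    scored = [label for label in labels if label is not None]
    if not scored:
        return 0.0
    return sum(label == requested for label in scored) / len(scored)


# The German example from the abstract, with illustrative marker words.
pair = ContrastivePair(
    source="Are you sure?",
    formal="Sind Sie sich sicher?",
    informal="Bist du dir sicher?",
    formal_markers=("sie",),
    informal_markers=("du", "dir"),
)
```

In this sketch, a hypothesis that contains markers of both registers, or of neither, is left out of the score rather than counted as an error; that is one possible design choice among several.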


research
11/02/2022

MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation

As generic machine translation (MT) quality has improved, the need for t...
research
09/13/2023

Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding

Hallucinations and off-target translation remain unsolved problems in ma...
research
05/26/2023

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Neural machine translation (NMT) systems exhibit limited robustness in h...
research
01/30/2023

Adaptive Machine Translation with Large Language Models

Consistency is a key requirement of high-quality translation. It is espe...
research
05/25/2023

What about em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopron...
research
05/23/2018

Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System

This paper investigates the use of Machine Translation (MT) to bootstrap...
research
09/15/2021

On the Limits of Minimal Pairs in Contrastive Evaluation

Minimal sentence pairs are frequently used to analyze the behavior of la...
