Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

05/12/2022
by Sebastian T. Vincent, et al.

This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best-hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora. Using English as a pivot language, we propagated formality annotations to the languages treated as zero-shot in the task. We further improved formality control with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 in the constrained setting and .995 in the unconstrained setting. In the zero-shot setting for English-to-Russian and English-to-Italian, we scored an average accuracy of .590 in the constrained setting and .659 in the unconstrained setting.
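As a rough illustration of what hypothesis re-ranking for formality can look like, the sketch below re-orders an n-best list by adding a weighted formality term to each hypothesis's model score. It is not the submission's actual method: the `Hypothesis` class, the tiny German marker lexicons, the `formality_score` heuristic, and the `weight` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    log_prob: float  # translation model score; higher is better


# Hypothetical marker lexicons for German, for illustration only.
# Lowercasing conflates formal "Sie" with "sie" (she/they); a real
# system would need a more careful formality detector.
FORMAL_MARKERS = {"sie", "ihnen"}
INFORMAL_MARKERS = {"du", "dich", "dir", "dein", "deine"}


def formality_score(text: str, target: str) -> int:
    """Count markers matching the target formality minus those opposing it."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    formal = sum(t in FORMAL_MARKERS for t in tokens)
    informal = sum(t in INFORMAL_MARKERS for t in tokens)
    return formal - informal if target == "formal" else informal - formal


def rerank(hyps: list[Hypothesis], target: str, weight: float = 1.0) -> list[Hypothesis]:
    """Sort hypotheses by model score plus a weighted formality term."""
    return sorted(
        hyps,
        key=lambda h: h.log_prob + weight * formality_score(h.text, target),
        reverse=True,
    )


n_best = [
    Hypothesis("Wie geht es dir?", -1.0),
    Hypothesis("Wie geht es Ihnen?", -1.2),
]
best_formal = rerank(n_best, "formal")[0]
best_informal = rerank(n_best, "informal")[0]
```

With `weight=0.0` the ranking falls back to the model score alone, so the weight controls how strongly the formality preference can override the translation model's own ordering.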


