Addressing Exposure Bias With Document Minimum Risk Training: Cambridge at the WMT20 Biomedical Translation Task

10/11/2020
by Danielle Saunders, et al.

The 2020 WMT Biomedical translation task evaluated translations of Medline abstracts. This is a small-domain translation task: relevant training data is limited and has a very distinct style and vocabulary. Models trained on such data are susceptible to exposure bias effects, particularly when training sentence pairs are imperfect translations of each other. This can result in poor behaviour during inference if the model learns to neglect the source sentence. The UNICAM entry addresses this problem during fine-tuning using a robust variant of Minimum Risk Training (MRT). We contrast this approach with data filtering to remove 'problem' training examples. Under MRT fine-tuning we obtain good results for both directions of English-German and English-Spanish biomedical translation. In particular, we achieve the best English-to-Spanish translation result and the second-best Spanish-to-English result, despite using only single models with no ensembling.
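
As a rough illustration of the kind of objective MRT minimises, the sketch below computes the expected cost over a set of sampled translations for a single source sentence, following the standard sentence-level formulation. It is not the document-level variant the abstract refers to, whose details are not given here. The function name `expected_risk`, the sharpness parameter `alpha`, and the example costs are assumptions made for illustration only.

```python
import math


def expected_risk(log_probs, costs, alpha=1.0):
    """Expected cost of sampled translations under a renormalised distribution.

    log_probs: model log-probabilities log P(y_i | x) of each sampled hypothesis.
    costs:     cost of each hypothesis against the reference, e.g. 1 - BLEU
               (the choice of cost is an assumption, not taken from the abstract).
    alpha:     hypothetical sharpness parameter; weights are proportional
               to P(y_i | x) ** alpha.
    """
    if len(log_probs) != len(costs) or not log_probs:
        raise ValueError("need one cost per sampled hypothesis")

    # Scale in log space and subtract the maximum for numerical stability.
    scaled = [alpha * lp for lp in log_probs]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    normaliser = sum(weights)

    # Risk is the cost of each sample weighted by its renormalised probability.
    return sum(w / normaliser * c for w, c in zip(weights, costs))


# Two sampled hypotheses with equal model probability: risk is the mean cost.
risk = expected_risk([-1.0, -1.0], [0.2, 0.6])
```

In training, this quantity would be minimised with respect to the model parameters that produce `log_probs`, so that probability mass shifts toward low-cost samples rather than only toward the (possibly imperfect) reference.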


research
05/18/2020

NEJM-enzh: A Parallel Corpus for English-Chinese Translation in the Biomedical Domain

Machine translation requires large amounts of parallel text. While such ...
research
06/10/2019

The University of Helsinki submissions to the WMT19 news translation task

In this paper, we present the University of Helsinki submissions to the ...
research
11/28/2022

BJTU-WeChat's Systems for the WMT22 Chat Translation Task

This paper introduces the joint submission of the Beijing Jiaotong Unive...
research
10/08/2020

Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

The quality of machine translation systems has dramatically improved ove...
research
07/15/2019

Facebook FAIR's WMT19 News Translation Task Submission

This paper describes Facebook FAIR's submission to the WMT19 shared news...
research
06/13/2019

UCAM Biomedical translation at WMT19: Transfer learning multi-domain ensembles

The 2019 WMT Biomedical translation task involved translating Medline ab...
