Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

03/05/2020
by Mitchell A. Gordon, et al.

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.


Related research

03/25/2021 · Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation
Domain Adaptation is widely used in practical applications of neural mac...

04/15/2020 · Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation
Lack of specialized data makes building a multi-domain neural machine tr...

10/23/2020 · Rapid Domain Adaptation for Machine Translation with Monolingual Data
One challenge of machine translation is how to quickly adapt to unseen d...

05/03/2022 · OmniKnight: Multilingual Neural Machine Translation with Language-Specific Self-Distillation
Although all-in-one-model multilingual neural machine translation (MNMT)...

12/15/2021 · Improving both domain robustness and domain adaptability in machine translation
We address two problems of domain adaptation in neural machine translati...

12/16/2019 · Iterative Dual Domain Adaptation for Neural Machine Translation
Previous studies on the domain adaptation for neural machine translation...

09/14/2018 · Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
To better understand the effectiveness of continued training, we analyze...
