Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation

04/15/2020
by Idriss Mghabbar, et al.

Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although emerging literature on low-resource languages is starting to show promising results, most state-of-the-art models have been trained on millions of sentences. Today, the majority of multi-domain adaptation techniques rely on complex and sophisticated architectures that are not suited to real-world applications. So far, no scalable method performs better than the simple yet effective mixed-finetuning, i.e., finetuning a generic model on a mix of all specialized data and generic data. In this paper, we propose a new training pipeline in which knowledge distillation and multiple specialized teachers allow us to efficiently finetune a model without adding any cost at inference time. Our experiments demonstrate that this training pipeline improves multi-domain translation over finetuning in configurations with 2, 3, and 4 domains by up to 2 BLEU points.
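As a rough illustration of the kind of objective such a distillation-based finetuning pipeline could use, the minimal sketch below combines the usual cross-entropy on reference translations with a distillation term against a domain-specialized teacher. It assumes PyTorch and that each training batch is routed to the teacher of its domain; the function name and the alpha and temperature hyper-parameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): multi-teacher knowledge distillation
# for finetuning a generic NMT student, assuming each batch comes from one
# domain and is paired with that domain's specialized teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      pad_id=0, alpha=0.5, temperature=2.0):
    """Cross-entropy on references plus KL to the domain teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids: (batch, seq_len) reference token ids
    alpha, temperature: illustrative values, not from the paper.
    """
    vocab = student_logits.size(-1)

    # Standard translation loss against the reference tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1), ignore_index=pad_id)

    # Soft targets from the specialized teacher of this batch's domain.
    t = temperature
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)

    return alpha * ce + (1.0 - alpha) * kd
```

Because the teachers are only consulted during training, the student keeps the generic model's architecture and inference cost, which is the property the abstract emphasizes.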


Related research

03/05/2020
Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
We explore best practices for training small, memory efficient machine t...

08/17/2019
Language Graph Distillation for Low-Resource Machine Translation
Neural machine translation on low-resource language is challenging due t...

02/19/2021
Multi-Domain Adaptation in Neural Machine Translation Through Multidimensional Tagging
Many modern Neural Machine Translation (NMT) systems are trained on nonh...

10/26/2022
Robust Domain Adaptation for Pre-trained Multilingual Neural Machine Translation Models
Recent literature has demonstrated the potential of multilingual Neural ...

09/14/2018
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
To better understand the effectiveness of continued training, we analyze...

04/19/2023
An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models
Knowledge distillation (KD) is a well-known method for compressing neura...
