Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

12/04/2019
by Rongxiang Weng, et al.

Pre-training and fine-tuning have achieved great success in the natural language processing field. The standard paradigm has two steps: first, pre-train a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tune the pre-trained model on labeled data from downstream tasks. However, in neural machine translation (NMT), the training objective of the bilingual task differs substantially from that of the monolingual pre-trained model. Because of this gap, fine-tuning alone cannot fully exploit the prior language knowledge in NMT. In this paper, we propose the APT framework for acquiring knowledge from pre-trained models for NMT. The approach consists of two modules: 1) a dynamic fusion mechanism that adapts general knowledge into task-specific features and fuses them into the NMT network, and 2) a knowledge distillation paradigm that lets the NMT model keep learning language knowledge throughout training. Together, these modules integrate suitable knowledge from pre-trained models to improve NMT. Experimental results on the WMT English-to-German, German-to-English, and Chinese-to-English machine translation tasks show that our model outperforms strong baselines and fine-tuning counterparts.
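The two modules described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is an assumed illustration, not the authors' released code: the class and function names (DynamicFusion, distillation_loss), the gating form, and the temperature-scaled KL loss are hypothetical choices that show one common way to realize a gated feature fusion and a distillation objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFusion(nn.Module):
    """Hypothetical gated fusion: mix adapted pre-trained features (e.g. from BERT)
    into the NMT encoder states, with a per-position learned gate."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.adapter = nn.Linear(hidden_size, hidden_size)   # adapt general features to the task
        self.gate = nn.Linear(2 * hidden_size, hidden_size)  # decide how much to fuse

    def forward(self, nmt_states, pretrained_states):
        adapted = torch.tanh(self.adapter(pretrained_states))
        g = torch.sigmoid(self.gate(torch.cat([nmt_states, adapted], dim=-1)))
        return g * adapted + (1.0 - g) * nmt_states

def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """Hypothetical distillation term: the NMT model (student) keeps learning from the
    pre-trained model's (teacher's) output distribution during NMT training."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

In such a sketch, the distillation term would be added to the usual cross-entropy translation loss with a weighting coefficient, so the model balances fitting the parallel data against retaining the pre-trained model's language knowledge.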


