Reciprocal Supervised Learning Improves Neural Machine Translation

12/05/2020
by Minkai Xu, et al.

Despite its recent success on image classification, self-training has achieved only limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the use of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits the individual models to generate pseudo parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that differently parameterized models have different inductive biases, so better predictions can be made by jointly exploiting the agreement among them. Unlike previous knowledge distillation methods, which rely on a much stronger teacher, RSL is capable of boosting the accuracy of one model by introducing other comparable or even weaker models. RSL can also be viewed as a more efficient alternative to ensembling. Extensive experiments demonstrate that RSL outperforms strong baselines on several benchmarks by significant margins.
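The abstract describes RSL as two steps: every model first labels the same source-side data to produce pseudo parallel pairs, and each model is then retrained on the pooled corpus. The following is a minimal Python sketch of that loop, assuming hypothetical NMTModel interfaces (the translate/train methods and model names are illustrative placeholders, not the paper's released implementation).

```python
from typing import List, Tuple


class NMTModel:
    """Hypothetical stand-in for a trainable NMT system (not the paper's code)."""

    def __init__(self, name: str):
        self.name = name

    def translate(self, sources: List[str]) -> List[str]:
        # Placeholder: a real system would run beam search / decoding here.
        return [f"<{self.name} translation of: {s}>" for s in sources]

    def train(self, corpus: List[Tuple[str, str]]) -> None:
        # Placeholder: a real system would run gradient updates here.
        print(f"{self.name}: training on {len(corpus)} sentence pairs")


def reciprocal_supervised_learning(
    models: List[NMTModel],
    real_corpus: List[Tuple[str, str]],
    monolingual_sources: List[str],
) -> None:
    """One RSL round: all models label the source data, the pseudo pairs are
    pooled, and each model is trained on real plus pooled synthetic data."""
    # Step 1: each model generates pseudo parallel data from the same sources.
    pseudo_corpus: List[Tuple[str, str]] = []
    for model in models:
        hypotheses = model.translate(monolingual_sources)
        pseudo_corpus.extend(zip(monolingual_sources, hypotheses))

    # Step 2: cooperatively train every model on the combined corpus, so each
    # model is also supervised by the other models' predictions.
    combined = real_corpus + pseudo_corpus
    for model in models:
        model.train(combined)


if __name__ == "__main__":
    models = [NMTModel("transformer-base"), NMTModel("transformer-big")]
    real = [("guten morgen", "good morning")]
    mono = ["wie geht es dir", "danke schön"]
    reciprocal_supervised_learning(models, real, mono)
```

Pooling hypotheses from all models before retraining is what distinguishes this scheme from plain self-training, where each model would only ever see its own predictions.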


Related research

05/19/2023
Pseudo-Label Training and Model Inertia in Neural Machine Translation
Like many other machine learning applications, neural machine translatio...

02/06/2017
Ensemble Distillation for Neural Machine Translation
Knowledge distillation describes a method for training a student network...

05/27/2021
Selective Knowledge Distillation for Neural Machine Translation
Neural Machine Translation (NMT) models achieve state-of-the-art perform...

04/02/2017
Building a Neural Machine Translation System Using Only Synthetic Parallel Data
Recent works have shown that synthetic parallel data automatically gener...

12/31/2020
Exploring Monolingual Data for Neural Machine Translation with Knowledge Distillation
We explore two types of monolingual data that can be included in knowled...

05/29/2018
Distilling Knowledge for Search-based Structured Prediction
Many natural language processing tasks can be modeled into structured pr...

08/17/2023
Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
Text recognition methods are gaining rapid development. Some advanced te...
