Revisiting Self-Training for Neural Sequence Generation

09/30/2019
by   Junxian He, et al.

Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's own predictions (i.e., pseudo-parallel data). While self-training has been extensively studied on classification problems, it is still unclear how it works on complex sequence generation tasks (e.g., machine translation), due to the compositionality of the target space. In this work, we first empirically show that self-training can noticeably improve the supervised baseline on neural sequence generation tasks. Through careful examination of the performance gains, we find that perturbation of the hidden states (i.e., dropout) is critical for self-training to benefit from the pseudo-parallel data: it acts as a regularizer and forces the model to yield close predictions for similar unlabeled inputs. This effect helps the model correct some of its incorrect predictions on unlabeled data. To further encourage this mechanism, we propose injecting noise into the input space, resulting in a "noisy" version of self-training. Empirical studies on standard machine translation and text summarization benchmarks show that noisy self-training effectively utilizes unlabeled data and improves the supervised baseline by a large margin.
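To make the procedure concrete, here is a minimal sketch of the noisy self-training loop described in the abstract. The `model` object with `.fit()` and `.translate()` methods is a hypothetical interface, and `perturb` is an illustrative input-noise function (random word dropping plus local shuffling, in the spirit of the synthetic noise the paper applies to source sentences); it is not the authors' released implementation.

```python
import random

def perturb(tokens, drop_prob=0.1, shuffle_window=3):
    """Illustrative input noise: randomly drop tokens, then locally shuffle
    the remainder so each token moves at most a few positions."""
    kept = [t for t in tokens if random.random() > drop_prob]
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

def noisy_self_training(model, labeled, unlabeled, rounds=3):
    """Skeleton of noisy self-training.
    `model.fit(pairs)` and `model.translate(src)` are assumed, hypothetical APIs."""
    model.fit(labeled)  # 1. train on the real parallel data
    for _ in range(rounds):
        # 2. pseudo-label the *clean* unlabeled inputs, but pair the
        #    pseudo-targets with *noised* inputs for training
        pseudo = [(perturb(src), model.translate(src)) for src in unlabeled]
        # 3. retrain on pseudo-parallel data, then fine-tune on real data
        model.fit(pseudo)
        model.fit(labeled)
    return model
```

In plain self-training, step 2 would pair the unperturbed input with its pseudo-target; the noisy variant adds the input perturbation so the model is pushed to make consistent predictions for nearby inputs.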


