Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation

06/11/2021
by Xuanli He et al.

Semi-Supervised Learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data. Knowledge distillation (KD) has enabled compressing deep networks and ensembles, achieving the best results when knowledge is distilled on fresh task-specific unlabeled examples. However, task-specific unlabeled data can be challenging to find. We present a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data, helping advance SSL and KD on different tasks. To obtain strong task-specific generative models, we adopt generic generative models pretrained on open-domain data and fine-tune them on inputs from specific tasks. We then use existing classifiers to annotate the generated unlabeled examples with soft pseudo labels, which are used for additional training. When self-training is combined with samples generated from GPT-2-large fine-tuned on the inputs of each GLUE task, we outperform a strong RoBERTa-large baseline on the GLUE benchmark. Moreover, KD on GPT-2 samples yields a new state of the art for 6-layer transformers on the GLUE leaderboard. Finally, self-training with GAL offers significant gains on image classification on CIFAR-10 and on four tabular tasks from the UCI repository.
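To make the pipeline concrete, below is a minimal sketch of one generate, annotate, and learn step in Python using the Hugging Face transformers library. The checkpoints ("gpt2-large", "roberta-large", "distilroberta-base"), the two-class setup, the sampling settings, and the single optimisation step are placeholder assumptions for illustration, not the authors' released code; in GAL the GPT-2 model would first be fine-tuned on the task's inputs and the teacher would already be fine-tuned on the task's labeled data.

import torch
import torch.nn.functional as F
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generate: sample synthetic in-domain inputs from a language model.
# "gpt2-large" stands in for a GPT-2 checkpoint fine-tuned on the task's
# inputs, as GAL prescribes; here it is a plain pretrained model.
gen_tok = AutoTokenizer.from_pretrained("gpt2-large")
generator = AutoModelForCausalLM.from_pretrained("gpt2-large").to(device).eval()

bos = torch.tensor([[gen_tok.bos_token_id]], device=device)
with torch.no_grad():
    samples = generator.generate(
        bos,
        do_sample=True,
        top_p=0.95,
        max_length=64,
        num_return_sequences=8,
        pad_token_id=gen_tok.eos_token_id,
    )
texts = [gen_tok.decode(s, skip_special_tokens=True).strip() for s in samples]

# Annotate: label the synthetic inputs with an existing classifier, keeping
# the full distribution over classes as soft pseudo labels.
# "roberta-large" is a placeholder for a teacher already fine-tuned on the task.
teacher_tok = AutoTokenizer.from_pretrained("roberta-large")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=2
).to(device).eval()
teacher_batch = teacher_tok(
    texts, padding=True, truncation=True, return_tensors="pt"
).to(device)
with torch.no_grad():
    soft_labels = F.softmax(teacher(**teacher_batch).logits, dim=-1)

# Learn: train a student on the pseudo-labeled synthetic data. A smaller
# model gives knowledge distillation; reusing the teacher's architecture
# gives self-training.
student_tok = AutoTokenizer.from_pretrained("distilroberta-base")
student = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2
).to(device).train()
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

student_batch = student_tok(
    texts, padding=True, truncation=True, return_tensors="pt"
).to(device)
student_logits = student(**student_batch).logits
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1), soft_labels, reduction="batchmean"
)
optimizer.zero_grad()
loss.backward()
optimizer.step()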

Related research

10/05/2020 - Self-training Improves Pre-training for Natural Language Understanding
Unsupervised pre-training has led to much recent progress in natural lan...

06/17/2020 - Big Self-Supervised Models are Strong Semi-Supervised Learners
One paradigm for learning from few labeled examples while making best us...

10/27/2020 - To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
Leveraging large amounts of unlabeled data using Transformer-like archit...

04/07/2021 - Streaming Self-Training via Domain-Agnostic Unlabeled Images
We present streaming self-training (SST) that aims to democratize the pr...

01/03/2022 - Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
We perform knowledge distillation (KD) benchmark from task-specific BERT...

03/29/2021 - Industry Scale Semi-Supervised Learning for Natural Language Understanding
This paper presents a production Semi-Supervised Learning (SSL) pipeline...

11/30/2017 - Hybrid VAE: Improving Deep Generative Models using Partial Observations
Deep neural network models trained on large labeled datasets are the sta...
