DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

05/20/2023
by Weifeng Jiang, et al.

Many text mining models are constructed by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, maintaining performance with a lightweight model and limited labeled samples remains a significant challenge. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM using knowledge distillation. Our key insight is to share complementary knowledge among the distilled student cohort to promote their SSL effectiveness. DisCo employs a novel co-training technique to optimize multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6 times smaller and 4.8 times faster in inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similarly sized models that are elaborately tuned for distinct tasks.
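The abstract only describes the mechanism at a high level. Below is a minimal PyTorch sketch of the co-training idea it outlines: two distilled students exchange predictions on differently augmented views of an unlabeled batch while also fitting the small labeled set. The class and function names (TinyStudent, augment_a, augment_b), the KL-based consistency loss, and the loss weighting are illustrative assumptions, not the paper's actual implementation.

# Sketch of cross-student co-training on labeled + unlabeled batches.
# Everything below (module names, loss form, weighting) is an assumption
# made for illustration; the paper's exact objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyStudent(nn.Module):
    """Stand-in for a small student distilled from a large PLM."""
    def __init__(self, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(768, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))


def co_training_step(student_a, student_b, opt_a, opt_b,
                     labeled_x, labels, unlabeled_x,
                     augment_a, augment_b, consistency_weight: float = 1.0):
    """One SSL update: supervised loss on the labeled batch plus a
    cross-student consistency loss on two augmented unlabeled views."""
    # Supervised losses on the (small) labeled batch.
    sup_a = F.cross_entropy(student_a(labeled_x), labels)
    sup_b = F.cross_entropy(student_b(labeled_x), labels)

    # Each student sees a different "data view" of the unlabeled batch.
    view_a, view_b = augment_a(unlabeled_x), augment_b(unlabeled_x)

    # Knowledge sharing: each student matches the other's detached prediction.
    with torch.no_grad():
        target_from_b = F.softmax(student_b(view_b), dim=-1)
        target_from_a = F.softmax(student_a(view_a), dim=-1)
    cons_a = F.kl_div(F.log_softmax(student_a(view_a), dim=-1),
                      target_from_b, reduction="batchmean")
    cons_b = F.kl_div(F.log_softmax(student_b(view_b), dim=-1),
                      target_from_a, reduction="batchmean")

    loss_a = sup_a + consistency_weight * cons_a
    loss_b = sup_b + consistency_weight * cons_b

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()


if __name__ == "__main__":
    # Toy run with random features standing in for PLM sentence embeddings.
    a, b = TinyStudent(), TinyStudent()
    opt_a = torch.optim.Adam(a.parameters(), lr=1e-3)
    opt_b = torch.optim.Adam(b.parameters(), lr=1e-3)
    noise = lambda x: x + 0.01 * torch.randn_like(x)  # toy "data view" augmentation
    labeled_x, labels = torch.randn(8, 768), torch.randint(0, 2, (8,))
    unlabeled_x = torch.randn(32, 768)
    print(co_training_step(a, b, opt_a, opt_b, labeled_x, labels,
                           unlabeled_x, noise, noise))

In the paper's framing, the two students would additionally differ by "model view" (being distilled from the PLM with different strategies); here that difference is only suggested by instantiating two separate TinyStudent models.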

