Variational Pretraining for Semi-supervised Text Classification

06/05/2019
by   Suchin Gururangan, et al.

We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited. We pretrain a unigram document model as a variational autoencoder on in-domain, unlabeled data and use its internal states as features in a downstream classifier. Empirically, we show the relative strength of VAMPIRE against computationally expensive contextual embeddings and other popular semi-supervised baselines under low resource settings. We also find that fine-tuning to in-domain data is crucial to achieving decent performance from contextual embeddings when working with limited supervision. We accompany this paper with code to pretrain and use VAMPIRE embeddings in downstream tasks.
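
The abstract's core idea, a unigram (bag-of-words) document model trained as a variational autoencoder whose internal states become classifier features, can be illustrated with a small sketch. This is not the released VAMPIRE code: the class `UnigramVAE`, the single-layer encoder and decoder, the standard Gaussian prior, and all dimensions are assumptions chosen for illustration, and the training loop is omitted. It only shows how a document's word counts map to a latent vector and how the pretraining objective (negative ELBO) is computed.

```python
import math
import random


def bow_vector(tokens, vocab):
    """Unigram (bag-of-words) counts for one document; word order is discarded."""
    counts = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            counts[vocab[tok]] += 1.0
    return counts


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class UnigramVAE:
    """Hypothetical minimal bag-of-words VAE (assumed architecture, untrained)."""

    def __init__(self, vocab_size, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.w_mu = init_matrix(latent_dim, vocab_size, self.rng)
        self.b_mu = [0.0] * latent_dim
        self.w_logvar = init_matrix(latent_dim, vocab_size, self.rng)
        self.b_logvar = [0.0] * latent_dim
        self.w_dec = init_matrix(vocab_size, latent_dim, self.rng)
        self.b_dec = [0.0] * vocab_size

    def encode(self, bow):
        """Map word counts to the mean and log-variance of q(z | document)."""
        mu = linear(bow, self.w_mu, self.b_mu)
        logvar = linear(bow, self.w_logvar, self.b_logvar)
        return mu, logvar

    def negative_elbo(self, bow):
        """Reconstruction loss plus KL term: the quantity minimized in pretraining."""
        mu, logvar = self.encode(bow)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        word_probs = softmax(linear(z, self.w_dec, self.b_dec))
        # Negative log-likelihood of the observed word counts
        recon = -sum(c * math.log(p) for c, p in zip(bow, word_probs))
        # KL(q(z | doc) || N(0, I)) in closed form for a diagonal Gaussian
        kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                       for m, lv in zip(mu, logvar))
        return recon + kl

    def features(self, bow):
        """Internal state (here the posterior mean) to feed a downstream classifier."""
        mu, _ = self.encode(bow)
        return mu
```

In use, such a model would first be fit on unlabeled in-domain documents by minimizing `negative_elbo`, after which `features(bow_vector(tokens, vocab))` would supply the input to a classifier trained on the small labeled set.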

