DeepAI AI Chat
Log In Sign Up

Semi-Supervised Text Classification via Self-Pretraining

by   Payam Karisani, et al.

We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, it can potentially update its belief about previously labeled documents, and can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initialize the second classifier, to be further trained by the set of labeled documents. The algorithm proceeds to the next iteration and the classifiers' roles are reversed. To improve the flow of information across the iterations and also to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across the iterations, utilizes a two-stage training model, uses an efficient learning rate schedule, and employs a pseudo-label transformation heuristic. We have evaluated our model in three publicly available social media datasets. Our experiments show that Self-Pretraining outperforms the existing state-of-the-art semi-supervised classifiers across multiple settings. Our code is available at


page 1

page 2

page 3

page 4


Neural Semi-supervised Learning for Text Classification Under Large-Scale Pretraining

The goal of semi-supervised learning is to utilize the unlabeled, in-dom...

Learning with Partial Labels from Semi-supervised Perspective

Partial Label (PL) learning refers to the task of learning from the part...

Large-Scale Self- and Semi-Supervised Learning for Speech Translation

In this paper, we improve speech translation (ST) through effectively le...

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

Self-training is a standard approach to semi-supervised learning where t...

SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

Self-training methods have been explored in recent years and have exhibi...

Revisiting Pretraining for Semi-Supervised Learning in the Low-Label Regime

Semi-supervised learning (SSL) addresses the lack of labeled data by exp...

Evaluating Protein Transfer Learning with TAPE

Protein modeling is an increasingly popular area of machine learning res...