KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation

07/04/2023
by   Weijie Xu, et al.
0

In text classification tasks, fine tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods possess the advantage of analyzing documents to extract meaningful patterns of words without the need of pretraining. To leverage topic modeling's unsupervised insights extraction on text classification tasks, we develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings, few labeled documents and is efficient to train, making it ideal under resource constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness and efficiency and achieves similar performance compare to state of the art weakly supervised text classification methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2022

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks

Pre-trained language models (LMs) obtain state-of-the-art performance wh...
research
09/08/2020

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

In this paper, we study bidirectional LSTM network for the task of text ...
research
09/22/2021

Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron

We explore Boccaccio's Decameron to see how digital humanities tools can...
research
07/03/2023

vONTSS: vMF based semi-supervised neural topic modeling with optimal transport

Recently, Neural Topic Models (NTM), inspired by variational autoencoder...
research
06/05/2019

Variational Pretraining for Semi-supervised Text Classification

We introduce VAMPIRE, a lightweight pretraining framework for effective ...
research
07/06/2023

S2vNTM: Semi-supervised vMF Neural Topic Modeling

Language model based methods are powerful techniques for text classifica...
research
09/30/2021

Semi-Supervised Text Classification via Self-Pretraining

We present a neural semi-supervised learning model termed Self-Pretraini...

Please sign up or login with your details

Forgot password? Click here to reset