BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification

10/27/2022
by   Ziwen Liu, et al.
0

Multi-label Text Classification (MLTC) is the task of categorizing documents into one or more topics. Considering the large volumes of data and varying domains of such tasks, fully supervised learning requires manually fully annotated datasets which is costly and time-consuming. In this paper, we propose BERT-Flow-VAE (BFV), a Weakly-Supervised Multi-Label Text Classification (WSMLTC) model that reduces the need for full supervision. This new model (1) produces BERT sentence embeddings and calibrates them using a flow model, (2) generates an initial topic-document matrix by averaging results of a seeded sparse topic model and a textual entailment model which only require surface name of topics and 4-6 seed words per topic, and (3) adopts a VAE framework to reconstruct the embeddings under the guidance of the topic-document matrix. Finally, (4) it uses the means produced by the encoder model in the VAE architecture as predictions for MLTC. Experimental results on 6 multi-label datasets show that BFV can substantially outperform other baseline WSMLTC models in key metrics and achieve approximately 84 of a fully-supervised model.

READ FULL TEXT

page 9

page 17

page 18

research
12/04/2017

Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

We propose a Label Propagation based algorithm for weakly supervised tex...
research
11/05/2017

Multi-label Dataless Text Classification with Topic Modeling

Manually labeling documents is tedious and expensive, but it is essentia...
research
04/04/2023

MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

Text classification typically requires a substantial amount of human-ann...
research
11/20/2021

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Dataless text classification, i.e., a new paradigm of weakly supervised ...
research
04/08/2022

Bag-of-Words vs. Sequence vs. Graph vs. Hierarchy for Single- and Multi-Label Text Classification

Graph neural networks have triggered a resurgence of graph-based text cl...
research
02/28/2022

Semi-supervised Nonnegative Matrix Factorization for Document Classification

We propose new semi-supervised nonnegative matrix factorization (SSNMF) ...
research
05/22/2023

A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

Etremely Weakly Supervised Text Classification (XWS-TC) refers to text c...

Please sign up or login with your details

Forgot password? Click here to reset