MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities

04/04/2023
by   Priyanka Kargupta, et al.

Text classification typically requires a substantial amount of human-annotated data to serve as supervision, which is costly to obtain in dynamic emerging domains. Certain methods seek to address this problem by relying solely on the surface text of class names as extremely weak supervision. However, existing methods fail to account for single-class documents that discuss multiple topics. Both topic diversity and vague sentences may introduce noise into the document's underlying representation and consequently reduce the precision of the predicted class. Furthermore, current work treats each text granularity (documents, sentences, or words) independently, which limits the degree of coarse- or fine-grained context that can be jointly extracted from all three to identify significant subtext for classification. To address these problems, we propose MEGClass, an extremely weakly-supervised text classification method that exploits Mutually-Enhancing Text Granularities. Specifically, MEGClass constructs class-oriented sentence and class representations based on keywords, then performs a sentence-level confidence-weighted label ensemble to estimate a document's initial class distribution. This distribution serves as the target for a multi-head attention network trained with a class-weighted contrastive loss. The network learns contextualized sentence representations and weights to form document representations that reflect each document's original document- and sentence-level topic diversity. Retaining this heterogeneity allows MEGClass to select the most class-indicative documents to serve as iterative feedback for enhancing the class representations. Finally, these top documents are used to fine-tune a pre-trained text classifier. As demonstrated through extensive experiments on six benchmark datasets, MEGClass outperforms other weakly and extremely weakly supervised methods.
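
The sentence-level confidence-weighted label ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how similarity or confidence is computed, so the cosine similarity, the softmax temperature, the top-1 minus top-2 confidence gap, and the function name `initial_class_distribution` are all assumptions chosen for illustration. Sentence and class representations are taken as given NumPy arrays.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def initial_class_distribution(sent_vecs, class_vecs, temperature=0.1):
    """Estimate a document's class distribution from its sentences.

    sent_vecs:  (num_sentences, dim) sentence representations.
    class_vecs: (num_classes, dim) class representations, num_classes >= 2.
    All scoring choices below are illustrative assumptions.
    """
    # Assumption: cosine similarity between each sentence and each class.
    s = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    c = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    sims = s @ c.T

    # Per-sentence class distribution.
    sent_probs = softmax(sims / temperature, axis=1)

    # Assumption: confidence is the gap between the two most likely classes,
    # so a vague sentence (near-uniform distribution) gets a weight near zero.
    top2 = np.sort(sent_probs, axis=1)[:, -2:]
    confidence = top2[:, 1] - top2[:, 0]

    total = confidence.sum()
    if total > 0:
        weights = confidence / total
    else:
        # Every sentence is equally uninformative: fall back to a plain average.
        weights = np.full(len(confidence), 1.0 / len(confidence))

    # Confidence-weighted ensemble of the sentence-level distributions.
    return weights @ sent_probs

# Toy example: two classes in a 2-D space, three sentences.
class_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
sent_vecs = np.array([
    [1.0, 0.1],   # clearly about class 0
    [0.9, 0.0],   # clearly about class 0
    [0.5, 0.5],   # vague: equally close to both classes
])
doc_dist = initial_class_distribution(sent_vecs, class_vecs)
```

In this sketch the vague third sentence receives zero weight, so it does not dilute the document-level estimate. That is the intuition the abstract gives for weighting sentences by confidence.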


