A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

05/22/2023
by Zihan Wang, et al.

Extremely Weakly Supervised Text Classification (XWS-TC) refers to text classification based on minimal high-level human guidance, such as a few label-indicative seed words or classification instructions. There are two mainstream approaches to XWS-TC, which have never been rigorously compared: (1) training classifiers on pseudo-labels generated by (softly) matching seed words (SEED) and (2) prompting (and calibrating) language models using classification instructions (and raw texts) to decode label words (PROMPT). This paper presents the first XWS-TC benchmark to compare the two approaches on fair grounds, where the datasets, supervision, and hyperparameter choices are standardized across methods. Our benchmarking results suggest that (1) both SEED and PROMPT approaches are competitive and there is no clear winner; (2) SEED is empirically more tolerant than PROMPT to changes in human guidance (e.g., seed words, classification instructions, and label words); (3) SEED is empirically more selective than PROMPT with respect to the pre-trained language model; (4) recent SEED and PROMPT methods have close connections, and a clustering post-processing step based on raw in-domain texts is a strong performance booster for both. We hope this benchmark serves as a guideline for selecting XWS-TC methods in different scenarios and stimulates interest in developing guidance- and model-robust XWS-TC methods. We release the repo at https://github.com/ZihanWangKi/x-TC.
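To make the two kinds of guidance concrete, here is a minimal toy sketch in Python. It is not the benchmark's code or any method evaluated in the paper. The seed words, the label words, the prompt template, and the helper names `pseudo_label` and `build_prompt` are all hypothetical, chosen only for illustration. The SEED side is reduced to hard seed-word counting, where real methods match softly and then train a classifier on the pseudo-labels. The PROMPT side only builds the instruction text and leaves out the language model call and calibration.

```python
import re
from collections import Counter

# Hypothetical seed words per class (illustrative only, not from the benchmark).
SEED_WORDS = {
    "sports": {"game", "team", "score"},
    "business": {"market", "stock", "profit"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pseudo_label(text, seed_words=SEED_WORDS):
    """SEED-style toy: pick the class whose seed words occur most often.

    Returns None when no seed word matches or when the top two classes tie,
    so that ambiguous documents are left unlabeled.
    """
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in seeds)
        for label, seeds in seed_words.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    if best_score == 0:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_score:
        return None
    return best_label


def build_prompt(text, label_words):
    """PROMPT-style toy: wrap the raw text in a classification instruction.

    A language model would then be asked to decode one of the label words;
    that step is intentionally left out of this sketch.
    """
    options = ", ".join(label_words)
    return f"Text: {text}\nWhich topic is this text about? Options: {options}.\nAnswer:"
```

For example, `pseudo_label("The team won the game")` assigns the hypothetical `sports` class, while a document containing no seed words is left unlabeled. `build_prompt(text, ["sports", "business"])` produces the string that a prompting method would pass to a language model.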


research
10/13/2022

LIME: Weakly-Supervised Text Classification Without Seeds

In weakly-supervised text classification, only label names act as source...
research
05/24/2023

Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

Recent advances in weakly supervised text classification mostly focus on...
research
04/30/2020

User-Guided Aspect Classification for Domain-Specific Texts

Aspect classification, identifying aspects of text segments, facilitates...
research
04/20/2021

Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation

Weakly-supervised text classification aims to induce text classifiers fr...
research
05/24/2022

WeDef: Weakly Supervised Backdoor Defense for Text Classification

Existing backdoor defense methods are only effective for limited trigger...
research
10/27/2022

BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification

Multi-label Text Classification (MLTC) is the task of categorizing docum...
research
10/24/2020

X-Class: Text Classification with Extremely Weak Supervision

In this paper, we explore to conduct text classification with extremely ...
