Weakly-supervised Text Classification Based on Keyword Graph

10/06/2021
by Lu Zhang, et al.

Weakly-supervised text classification has received much attention in recent years because it can alleviate the heavy burden of annotating massive data. Among weakly-supervised approaches, keyword-driven methods are the mainstream: they exploit user-provided keywords to generate pseudo-labels for unlabeled texts. However, existing methods treat keywords independently and thus ignore the correlation among them, which should be useful if properly exploited. In this paper, we propose a novel framework called ClassKG that explores keyword-keyword correlation on a keyword graph with a graph neural network (GNN). Our framework is an iterative process. In each iteration, we first construct a keyword graph, so that the task of assigning pseudo-labels is transformed into annotating keyword subgraphs. To improve the annotation quality, we introduce a self-supervised task to pretrain a subgraph annotator, and then fine-tune it. With the pseudo-labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts. Finally, we re-extract keywords from the classified texts. Extensive experiments on both long-text and short-text datasets show that our method substantially outperforms existing ones.
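
As a rough illustration of the keyword-graph step described in the abstract, the Python sketch below builds a graph over user-provided keywords and extracts the keyword subgraph for each text. It is not the ClassKG implementation. It assumes that edges are document-level co-occurrence counts, which the abstract does not specify, and it swaps the pretrained GNN subgraph annotator for a plain majority vote over keyword classes. All function names, keywords, and example texts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Naive whitespace tokenizer, used only for this illustration.
    return text.lower().split()


def build_keyword_graph(docs, keywords):
    """Count, for each keyword pair, how many documents contain both.

    Assumption: edges are document-level co-occurrence counts.
    """
    vocab = set(keywords)
    edges = Counter()
    for doc in docs:
        present = sorted(vocab & set(tokenize(doc)))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def doc_subgraph(doc, keywords, edges):
    """Return the keywords found in one document and the edges among them."""
    nodes = sorted(set(keywords) & set(tokenize(doc)))
    sub_edges = {p: edges[p] for p in combinations(nodes, 2) if p in edges}
    return nodes, sub_edges


def vote_label(nodes, keyword_to_class):
    """Stand-in annotator: majority vote over keyword classes.

    The framework in the abstract uses a GNN here; this vote ignores edges.
    Returns None when the document contains no keyword.
    """
    votes = Counter(keyword_to_class[k] for k in nodes)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical seed keywords and unlabeled texts.
keyword_to_class = {
    "goal": "sports",
    "team": "sports",
    "stock": "business",
    "market": "business",
}
docs = [
    "the team scored a late goal",
    "stock market fell as the team regrouped",
    "the market rallied",
]

edges = build_keyword_graph(docs, keyword_to_class)
labels = []
for doc in docs:
    nodes, _ = doc_subgraph(doc, keyword_to_class, edges)
    labels.append(vote_label(nodes, keyword_to_class))
```

In the full iterative process, pseudo-labels like these would be used to train a text classifier, and keywords would then be re-extracted from the classified texts before the next iteration. Those steps are omitted here.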

research
12/11/2022

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using o...
research
05/25/2022

LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Weakly supervised text classification methods typically train a deep neu...
research
11/23/2022

Embedding Compression for Text Classification Using Dictionary Screening

In this paper, we propose a dictionary screening method for embedding co...
research
10/16/2019

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

GitHub has become an important platform for code sharing and scientific ...
research
10/10/2019

Learning Only from Relevant Keywords and Unlabeled Documents

We consider a document classification problem where document labels are ...
research
04/19/2023

Controlling keywords and their positions in text generation

One of the challenges in text generation is to control generation as int...
research
04/26/2020

CrowdTSC: Crowd-based Neural Networks for Text Sentiment Classification

Sentiment classification is a fundamental task in content analysis. Alth...
