Weakly-Supervised Neural Text Classification

09/02/2018
by   Yu Meng, et al.
0

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/29/2018

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents ...
research
05/25/2022

LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Weakly supervised text classification methods typically train a deep neu...
research
11/07/2021

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

We study the problem of weakly supervised text classification, which aim...
research
05/21/2023

WOT-Class: Weakly Supervised Open-world Text Classification

State-of-the-art weakly supervised text classification methods, while si...
research
06/10/2021

DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents

We present a novel deep neural model for text detection in document imag...
research
06/05/2023

CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels

Utilizing language models (LMs) without internal access is becoming an a...
research
01/24/2019

General Supervision via Probabilistic Transformations

Different types of training data have led to numerous schemes for superv...

Please sign up or login with your details

Forgot password? Click here to reset