Text Classification Using Label Names Only: A Language Model Self-Training Approach

10/14/2020
by Yu Meng, et al.
Microsoft
University of Illinois at Urbana-Champaign
Georgia Institute of Technology

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans, by contrast, can classify documents without seeing any labeled examples, relying only on a small set of words that describe the categories. In this paper, we explore the potential of using only the label name of each class to train classification models on unlabeled data, without any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification, without using any labeled documents, learning instead from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
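
To make step (3) more concrete, here is a minimal sketch of one way a self-training target can be computed from a model's current predictions. The abstract does not specify the objective, so the sharpening rule below (square each probability, divide by the class's soft frequency, renormalize) is an assumption chosen for illustration, and the function name `sharpen_targets` is hypothetical. In a full pipeline the model would be trained toward these targets on unlabeled documents and the targets recomputed periodically.

```python
def sharpen_targets(probs):
    """Build sharper soft labels from current predictions (illustrative only).

    probs: list of per-document class probabilities; each row sums to 1.
    Returns a list of the same shape whose rows also sum to 1.
    """
    n_classes = len(probs[0])
    # Soft frequency of each class over all unlabeled documents.
    freq = [sum(row[j] for row in probs) for j in range(n_classes)]

    targets = []
    for row in probs:
        # Squaring favors confident predictions; dividing by the class
        # frequency keeps large classes from dominating the targets.
        weighted = [(p * p / f) if f > 0 else 0.0 for p, f in zip(row, freq)]
        total = sum(weighted)
        targets.append([w / total for w in weighted])
    return targets


# Hypothetical predictions for two unlabeled documents over two classes.
predictions = [[0.6, 0.4], [0.3, 0.7]]
targets = sharpen_targets(predictions)
```

Each target row leans further toward the class the model already prefers, which is what lets the classifier generalize beyond the documents that contain category-indicative words.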

02/05/2022

Improving Probabilistic Models in Text Classification via Active Learning

When using text data, social scientists often classify documents in orde...
11/07/2021

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

We study the problem of weakly supervised text classification, which aim...
01/23/2012

A probabilistic methodology for multilabel classification

Multilabel classification is a relatively recent subfield of machine lea...
11/24/2021

Out-of-Category Document Identification Using Target-Category Names as Weak Supervision

Identifying outlier documents, whose content is different from the major...
10/21/2022

Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals

For text classification tasks, finetuned language models perform remarka...
06/03/2021

Exploring Distantly-Labeled Rationales in Neural Network Models

Recent studies strive to incorporate various human rationales into neura...
04/05/2018

Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop

Most of the literature around text classification treats it as a supervi...