Unsupervised Label Refinement Improves Dataless Text Classification

12/08/2020
by Zewei Chu, et al.

Dataless text classification can assign documents to previously unseen labels by scoring any document paired with a label description. While promising, it relies crucially on accurate descriptions of the label set for each downstream task. This reliance makes dataless classifiers highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice. In this paper, we ask the following question: how can we improve dataless text classification using the inputs of the downstream task dataset? Our primary solution is a clustering-based approach. Given a dataless classifier, our approach refines its set of predictions using k-means clustering. We demonstrate the broad applicability of our approach by improving the performance of two widely used classifier architectures, one that encodes text-category pairs with two independent encoders and one that uses a single joint encoder. Experiments show that our approach consistently improves dataless classification across different datasets and makes the classifier more robust to the choice of label descriptions.
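The refinement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each document already has a fixed embedding vector, that the dataless classifier's initial predictions are used to seed one centroid per label, and that plain Lloyd-style k-means with squared Euclidean distance is then run. The abstract does not specify any of these details, and the names `refine_labels`, `doc_vectors`, and `initial_labels` are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_labels(
    doc_vectors: Sequence[Vector],
    initial_labels: Sequence[int],
    num_labels: int,
    max_iters: int = 20,
) -> List[int]:
    """Refine label predictions with k-means (illustrative sketch only).

    Assumption: clusters are seeded from the dataless classifier's
    predictions, so cluster k keeps the meaning of label k throughout.
    """
    labels = list(initial_labels)
    centroids: List[Optional[Vector]] = [None] * num_labels
    for _ in range(max_iters):
        # Update step: each label's centroid is the mean of its documents.
        # A label with no documents keeps its previous centroid (or none).
        for k in range(num_labels):
            members = [v for v, y in zip(doc_vectors, labels) if y == k]
            if members:
                centroids[k] = _mean(members)
        # Assignment step: move each document to its nearest centroid.
        active = [k for k in range(num_labels) if centroids[k] is not None]
        new_labels = [
            min(active, key=lambda k: _sq_dist(v, centroids[k]))
            for v in doc_vectors
        ]
        if new_labels == labels:  # converged: no document changed label
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings": three documents near (0, 0), three near (10, 10).
    docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    # Hypothetical dataless predictions with two mistakes (indices 2 and 5).
    noisy = [0, 0, 1, 1, 1, 0]
    print(refine_labels(docs, noisy, num_labels=2))
```

On this toy input the two mislabeled documents are pulled back into the cluster of their neighbors. Seeding the centroids from the classifier's predictions, rather than at random, is the design choice that lets each cluster be read directly as a label without a separate cluster-to-label matching step.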


06/16/2018

Joint Input-Label Embedding for Neural Text Classification

Neural text classification methods typically treat output classes as cat...
04/06/2020

Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification

Text classification has become increasingly challenging due to the conti...
02/08/2020

Description Based Text Classification with Reinforcement Learning

The task of text classification is usually divided into two stages: tex...
01/27/2021

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Large datasets in NLP suffer from noisy labels, due to erroneous automat...
09/20/2015

Early text classification: a Naive solution

Text classification is a widely studied problem, and it can be considere...
04/20/2022

Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers

Zero-shot text classifiers based on label descriptions embed an input te...
10/07/2020

Multi-label classification of promotions in digital leaflets using textual and visual information

Product descriptions in e-commerce platforms contain detailed and valuab...