Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

by Wenpeng Yin, et al.
University of Pennsylvania

Zero-shot text classification (0Shot-TC) is a challenging NLU problem that has received little attention from the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0Shot-TC, and all of them focus on topical categorization, which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the inconsistent experimental setups in the literature prevent uniform comparison, which obscures progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup (label-partially-unseen), in which a model is trained on some labels of a dataset and tested on all labels, to include a more challenging yet realistic evaluation, label-fully-unseen 0Shot-TC (Chang et al., 2008), which aims at classifying text snippets without seeing any task-specific training data. iii) We unify the 0Shot-TC of diverse aspects within a textual entailment formulation and study it in this way. Code & Data:
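
The entailment formulation in contribution iii) can be illustrated with a minimal sketch: the input text acts as the premise, each candidate label is turned into a hypothesis sentence, and the label whose hypothesis is scored as most entailed is returned. Everything named here is an assumption for illustration, not the authors' code: the template wordings in `TEMPLATES`, the helpers `build_pairs` and `classify`, and especially `toy_entailment_score`, a word-overlap stand-in that would be replaced by a real entailment model in practice.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hypothesis templates, one per aspect (illustrative wording only).
TEMPLATES: Dict[str, str] = {
    "topic": "This text is about {label}.",
    "emotion": "This text expresses {label}.",
    "situation": "The people there need {label}.",
}


def build_pairs(text: str, labels: List[str], aspect: str) -> List[Tuple[str, str, str]]:
    """Return (label, premise, hypothesis) triples, one per candidate label."""
    template = TEMPLATES[aspect]
    return [(label, text, template.format(label=label)) for label in labels]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT a real entailment model): the fraction of
    hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


def classify(
    text: str,
    labels: List[str],
    aspect: str,
    scorer: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Pick the label whose hypothesis receives the highest score."""
    pairs = build_pairs(text, labels, aspect)
    best_label, _, _ = max(pairs, key=lambda triple: scorer(triple[1], triple[2]))
    return best_label


if __name__ == "__main__":
    print(classify("The team won the sports final.", ["sports", "politics"], "topic"))
```

Because the label set is passed in at prediction time and no label-specific training happens in this sketch, it corresponds to the label-fully-unseen setting; swapping `scorer` for a trained entailment model leaves the rest of the code unchanged.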



Label Agnostic Pre-training for Zero-shot Text Classification

Conventional approaches to text classification typically assume the exis...

Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

Text classification of unseen classes is a challenging Natural Language ...

Few-Shot Learning with Siamese Networks and Label Tuning

We study the problem of building text classifiers with little or no trai...

Zero-shot Entailment of Leaderboards for Empirical AI Research

We present a large-scale empirical investigation of the zero-shot learni...

Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning

Exploiting label hierarchies has become a promising approach to tackling...

Near-Zero-Shot Suggestion Mining with a Little Help from WordNet

In this work, we explore the constructive side of online reviews: advice...