
Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

08/31/2019
by Wenpeng Yin, et al.
University of Pennsylvania

Zero-shot text classification (0Shot-TC) is a challenging NLU problem that has received little attention from the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0Shot-TC, and all of them focus on topical categorization, which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the inconsistent experimental setups in the literature allow no uniform comparison, which obscures progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) We provide datasets that facilitate studying 0Shot-TC across conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup, label-partially-unseen (given a dataset, train on some labels, test on all labels), to include a more challenging yet realistic evaluation, label-fully-unseen 0Shot-TC (Chang et al., 2008), which aims to classify text snippets without seeing any task-specific training data. iii) We unify 0Shot-TC across diverse aspects within a textual entailment formulation and study it in that form. Code & Data: https://github.com/yinwenpeng/BenchmarkingZeroShot
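
The entailment formulation in contribution iii) can be illustrated with a minimal sketch: the text is treated as a premise, each candidate label is turned into a hypothesis sentence, and the label whose hypothesis scores highest is returned. The template wordings, the function names, and the word-overlap scorer below are all assumptions made for illustration. In particular, the scorer is only a toy stand-in for a trained entailment model, and nothing here is taken from the authors' released code.

```python
from typing import Callable, Dict, List

# Hypothetical hypothesis templates, one per aspect (illustrative wording only).
TEMPLATES: Dict[str, str] = {
    "topic": "this text is about {label}",
    "emotion": "this text expresses {label}",
    "situation": "the people there need {label}",
}


def build_hypotheses(aspect: str, labels: List[str]) -> Dict[str, str]:
    """Turn each label into a natural-language hypothesis for the given aspect."""
    template = TEMPLATES[aspect]
    return {label: template.format(label=label) for label in labels}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an entailment model: the fraction of hypothesis
    words that also occur in the premise. A real system would call a
    trained textual entailment model here instead."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    overlap = sum(word in premise_words for word in hypothesis_words)
    return overlap / len(hypothesis_words)


def classify(
    text: str,
    aspect: str,
    labels: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Return the label whose hypothesis receives the highest score."""
    hypotheses = build_hypotheses(aspect, labels)
    return max(labels, key=lambda label: score_fn(text, hypotheses[label]))


prediction = classify(
    "the election results dominated politics this week",
    aspect="topic",
    labels=["sports", "politics"],
)
print(prediction)
```

Because the scorer is passed in as `score_fn`, the same label-to-hypothesis conversion works for any aspect, and no label needs to be seen during training of the classifier itself, which is what makes the setup compatible with the label-fully-unseen evaluation.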

