Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

08/31/2019
by Wenpeng Yin, et al.

Zero-shot text classification (0Shot-TC) is a challenging NLU problem that has received little attention from the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0Shot-TC, and all of them focus on topical categorization which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the inconsistent experimental setups in the literature allow no uniform comparison, which obscures progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) We provide datasets that facilitate studying 0Shot-TC across conceptually different and diverse aspects: the "topic" aspect includes "sports" and "politics" as labels; the "emotion" aspect includes "joy" and "anger"; the "situation" aspect includes "medical assistance" and "water shortage". ii) We extend the existing evaluation setup, label-partially-unseen (given a dataset, train on some labels and test on all labels), with a more challenging yet realistic setup, label-fully-unseen 0Shot-TC (Chang et al., 2008), which aims at classifying text snippets without seeing any task-specific training data. iii) We unify 0Shot-TC across these diverse aspects within a textual entailment formulation and study it in that form. Code & Data: https://github.com/yinwenpeng/BenchmarkingZeroShot
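To make the entailment formulation concrete, the following is a minimal sketch rather than the authors' implementation. It treats the input text as the premise, turns each candidate label into a hypothesis sentence, and picks the label whose hypothesis scores highest. The template wordings, the function names, and the word-overlap scorer are all assumptions made for illustration; in practice the scorer would be replaced by a trained entailment model.

```python
from typing import Callable, Dict, List

# Hypothetical hypothesis templates, one per aspect. The wording is an
# assumption for illustration and is not taken from the paper.
TEMPLATES: Dict[str, str] = {
    "topic": "This text is about {label}.",
    "emotion": "This text expresses {label}.",
    "situation": "The people there need {label}.",
}


def build_hypotheses(aspect: str, labels: List[str]) -> Dict[str, str]:
    """Turn each candidate label into a natural-language hypothesis."""
    template = TEMPLATES[aspect]
    return {label: template.format(label=label) for label in labels}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer: fraction of hypothesis words found in the premise.

    This is NOT a real entailment model; it only makes the sketch runnable.
    """
    premise_words = set(premise.lower().split())
    hypothesis_words = [w.strip(".").lower() for w in hypothesis.split()]
    hits = sum(1 for w in hypothesis_words if w in premise_words)
    return hits / len(hypothesis_words)


def classify(
    text: str,
    aspect: str,
    labels: List[str],
    scorer: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Return the label whose hypothesis the scorer rates as most entailed."""
    hypotheses = build_hypotheses(aspect, labels)
    return max(labels, key=lambda label: scorer(text, hypotheses[label]))


if __name__ == "__main__":
    print(classify("The sports team won the match after a late goal",
                   "topic", ["sports", "politics"]))
```

Because the labels enter only through the hypothesis text, the same `classify` call works for labels never seen during training, which is what the label-fully-unseen setup requires. Passing a different `scorer` is the only change needed to swap in an actual entailment model.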

research
05/25/2023

Label Agnostic Pre-training for Zero-shot Text Classification

Conventional approaches to text classification typically assume the exis...
research
11/29/2022

Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

Text classification of unseen classes is a challenging Natural Language ...
research
03/28/2022

Few-Shot Learning with Siamese Networks and Label Tuning

We study the problem of building text classifiers with little or no trai...
research
03/29/2023

Zero-shot Entailment of Leaderboards for Empirical AI Research

We present a large-scale empirical investigation of the zero-shot learni...
research
06/29/2023

Towards Open-Domain Topic Classification

We introduce an open-domain topic classification system that accepts use...
research
11/25/2021

Near-Zero-Shot Suggestion Mining with a Little Help from WordNet

In this work, we explore the constructive side of online reviews: advice...
research
05/04/2022

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

In this paper, we ask the research question of whether all the datasets ...