Long-tailed Extreme Multi-label Text Classification with Generated Pseudo Label Descriptions

04/02/2022
by Ruohong Zhang, et al.

Extreme Multi-label Text Classification (XMTC) is a persistent challenge in machine learning research and applications, owing to the sheer size of the label space and the severe data scarcity associated with the long tail of rare labels in highly skewed distributions. This paper addresses tail label prediction with a novel approach that combines two strengths: the effectiveness of a trained bag-of-words (BoW) classifier in generating informative label descriptions under severe data scarcity, and the power of neural embedding-based retrieval models in mapping input documents (as queries) to relevant label descriptions. The proposed approach achieves state-of-the-art performance on XMTC benchmark datasets and significantly outperforms the best previous methods on tail label prediction. We also provide a theoretical analysis relating the BoW and neural models with respect to a performance lower bound.
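The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the toy corpus, the stopword list, the Rocchio-style linear weights standing in for the trained BoW classifier, and the TF-IDF `encode` function standing in for the neural embedding model. It shows only the data flow: derive a pseudo description for each label from the highest-weighted tokens of a BoW model, then rank labels by the similarity between an encoded document and the encoded descriptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (document text, set of labels).
DOCS = [
    ("the striker scored a late goal in the football match", {"sports"}),
    ("the football team won the league match", {"sports"}),
    ("the central bank raised interest rates again", {"finance"}),
    ("stock markets fell after the bank reported losses", {"finance"}),
    ("the football club reported record profits to the stock market",
     {"sports", "finance"}),
]

STOPWORDS = {"the", "a", "in", "to", "after"}  # tiny illustrative list


def tokenize(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]


def fit_idf(texts):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(texts)
    df = Counter(tok for text in texts for tok in set(tokenize(text)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def encode(tokens, idf):
    """Placeholder encoder: L2-normalized TF-IDF vector as a sparse dict.
    A neural embedding model would take this role in the described approach."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def build_descriptions(docs, k=3):
    """Stage 1: per label, keep the k tokens with the largest positive weight.
    Weights are mean(positive docs) - mean(negative docs), a simple linear
    stand-in for a trained BoW classifier."""
    idf = fit_idf([text for text, _ in docs])
    vectors = [encode(tokenize(text), idf) for text, _ in docs]
    labels = sorted({label for _, ls in docs for label in ls})
    descriptions = {}
    for label in labels:
        pos = [v for v, (_, ls) in zip(vectors, docs) if label in ls]
        neg = [v for v, (_, ls) in zip(vectors, docs) if label not in ls]
        weights = Counter()
        for v in pos:
            for tok, w in v.items():
                weights[tok] += w / len(pos)
        for v in neg:
            for tok, w in v.items():
                weights[tok] -= w / len(neg)
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        descriptions[label] = [tok for tok, w in ranked[:k] if w > 0]
    return descriptions, idf


def rank_labels(query, descriptions, idf):
    """Stage 2: treat the document as a query and rank labels by the cosine
    similarity between its vector and each pseudo description's vector."""
    q = encode(tokenize(query), idf)
    scores = {}
    for label, desc in descriptions.items():
        d = encode(desc, idf)
        scores[label] = sum(w * d.get(t, 0.0) for t, w in q.items())
    return sorted(scores, key=lambda label: (-scores[label], label))


if __name__ == "__main__":
    descriptions, idf = build_descriptions(DOCS)
    print(descriptions)
    print(rank_labels("football match tonight", descriptions, idf))
```

In this sketch a label seen in only a few documents still receives a usable keyword description, which is the property the abstract attributes to the BoW classifier; the matching step is where a learned dense encoder would replace the placeholder.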

research
01/24/2021

Does Head Label Help for Long-Tailed Multi-Label Text Classification

Multi-label text classification (MLTC) aims to annotate documents with t...
research
12/10/2020

GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMTC) aims to tag a text instan...
research
11/19/2022

Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text Classification

Multi-label text classification (MLTC) is one of the key tasks in natura...
research
05/22/2023

Retrieval-augmented Multi-label Text Classification

Multi-label text classification (MLC) is a challenging task in settings ...
research
06/15/2019

Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks

We report our ongoing work about a new deep architecture working in tand...
research
05/18/2020

Interaction Matching for Long-Tail Multi-Label Classification

We present an elegant and effective approach for addressing limitations ...
research
12/08/2020

Unsupervised Label Refinement Improves Dataless Text Classification

Dataless text classification is capable of classifying documents into pr...
