On Data Augmentation for Extreme Multi-label Classification

by Danqing Zhang, et al.

In this paper, we focus on data augmentation for the extreme multi-label classification (XMC) problem. One of the most challenging issues in XMC is the long-tail label distribution, under which even strong models suffer from insufficient supervision. To mitigate this label bias, we propose a simple and effective augmentation framework and a new state-of-the-art classifier. Our augmentation framework uses the pre-trained GPT-2 model to generate label-invariant perturbations of the input texts, which augment the existing training data. As a result, it yields substantial improvements over baseline models. Our contributions are twofold: (1) we introduce a new state-of-the-art classifier that uses label attention with RoBERTa, and we combine it with our augmentation framework for further improvement; (2) we present a broad study of how effective different augmentation methods are in the XMC task.
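The core idea of label-invariant augmentation can be illustrated with a minimal sketch. Each perturbed copy of an input text inherits the original label set unchanged. Several things here are assumptions rather than the paper's method. First, `word_dropout` is a hypothetical toy perturbation standing in for GPT-2 generation. Second, the choice to augment only examples carrying a "tail" label, together with the helper `augment_tail` and its parameters `tail_threshold` and `copies`, is illustrative only; the abstract does not specify how examples are selected.

```python
import random
from collections import Counter
from typing import Callable, List, Set, Tuple

# (input text, set of gold labels)
Example = Tuple[str, Set[str]]


def word_dropout(text: str, rng: random.Random, p: float = 0.2) -> str:
    """Hypothetical toy perturbation: randomly drop words.

    Stand-in only; the paper generates perturbations with GPT-2.
    """
    words = text.split()
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty string; fall back to the original text.
    return " ".join(kept) if kept else text


def augment_tail(
    data: List[Example],
    perturb: Callable[[str, random.Random], str],
    tail_threshold: int = 2,
    copies: int = 2,
    seed: int = 0,
) -> List[Example]:
    """Append label-invariant perturbations for examples with a tail label.

    Assumption (hypothetical selection rule): an example is "tail" if any of
    its labels occurs in at most `tail_threshold` training examples.
    Each perturbed text keeps the original label set (label invariance).
    """
    rng = random.Random(seed)
    # Count how many training examples carry each label.
    freq = Counter(label for _, labels in data for label in labels)
    augmented = list(data)
    # Iterate over the original data only, so new copies are not re-augmented.
    for text, labels in data:
        if any(freq[label] <= tail_threshold for label in labels):
            for _ in range(copies):
                augmented.append((perturb(text, rng), set(labels)))
    return augmented
```

For example, in a toy dataset where `"france"` appears once and `"travel"` appears three times, only the example tagged `"france"` is perturbed. Its copies keep both labels. In a real pipeline, `perturb` would wrap a text generator conditioned on the input.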







Fine-Grained AutoAugmentation for Multi-Label Classification

Data augmentation is a commonly used approach to improving the generaliz...

HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity

This paper describes our system designed for SemEval-2022 Task 8: Multil...

Deep Subspace analysing for Semi-Supervised multi-label classification of Diabetic Foot Ulcer

Diabetes is a global raising pandemic. Diabetes patients are at risk of ...

A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts

Paraphrase detection is an important task in text analytics with numerou...

Nuisance-Label Supervision: Robustness Improvement by Free Labels

In this paper, we present a Nuisance-label Supervision (NLS) module, whi...

Automated Data Augmentations for Graph Classification

Data augmentations are effective in improving the invariance of learning...

Reprint: a randomized extrapolation based on principal components for data augmentation

Data scarcity and data imbalance have attracted a lot of attention in ma...