On Data Augmentation for Extreme Multi-label Classification

09/22/2020
by Danqing Zhang, et al.

In this paper, we focus on data augmentation for the extreme multi-label classification (XMC) problem. One of the most challenging issues in XMC is the long-tailed label distribution, under which even strong models suffer from insufficient supervision. To mitigate this label bias, we propose a simple and effective augmentation framework and a new state-of-the-art classifier. Our augmentation framework takes advantage of the pre-trained GPT-2 model to generate label-invariant perturbations of the input texts, which are used to augment the existing training data. As a result, it yields substantial improvements over baseline models. Our contributions are twofold: (1) we introduce a new state-of-the-art classifier that uses label attention with RoBERTa, and combine it with our augmentation framework for further improvement; (2) we present a broad study of how effective different augmentation methods are on the XMC task.
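
To make "label-invariant perturbation" concrete, the Python sketch below shows a generic augmentation loop in which every perturbed copy of a text keeps its source text's label set unchanged. It is not the authors' code. The GPT-2 generator is replaced by a hypothetical stand-in, random word deletion (an EDA-style perturbation swapped in only so the example runs without a model), and the names `augment`, `random_deletion`, and `copies` are illustrative assumptions.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A training example: input text plus its set of label IDs.
Example = Tuple[str, FrozenSet[int]]


def random_deletion(text: str, p: float, rng: random.Random) -> str:
    """Hypothetical stand-in perturbation (random word deletion), NOT GPT-2."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() > p]
    # Avoid returning an empty string: fall back to one random word.
    return " ".join(kept) if kept else rng.choice(words)


def augment(
    data: Sequence[Example],
    perturb: Callable[[str], str],
    copies: int = 1,
) -> List[Example]:
    """Return the original examples plus `copies` perturbed variants of each.

    Every variant reuses its source example's label set unchanged, which is
    the label-invariance assumption behind this kind of augmentation.
    """
    out: List[Example] = list(data)
    for text, labels in data:
        for _ in range(copies):
            out.append((perturb(text), labels))
    return out


# Usage: two perturbed copies of a single multi-label example.
rng = random.Random(0)
train = [("pretrained language models generate fluent text", frozenset({3, 17}))]
aug = augment(train, lambda t: random_deletion(t, 0.3, rng), copies=2)
for text, labels in aug:
    print(sorted(labels), text)
```

In the framework described above, `perturb` would wrap a GPT-2 generation call instead; keeping it a plain callable makes it easy to compare different augmentation methods under the same loop.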

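Label attention can be sketched as follows, assuming the common formulation in which each label owns a query vector that attends over the token representations, yielding a label-specific document vector that is scored with an independent sigmoid. This is a generic illustration, not the paper's exact architecture: the RoBERTa token outputs are replaced by small hand-written vectors, and `label_attention`, `label_queries`, `label_weights`, and `biases` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def label_attention(
    tokens: Matrix,
    label_queries: Matrix,
    label_weights: Matrix,
    biases: Vector,
) -> Vector:
    """Return one independent probability per label.

    tokens:        T x d token representations (toy stand-ins for encoder outputs)
    label_queries: L x d, one attention query vector per label
    label_weights: L x d, one output scoring vector per label
    biases:        L output biases
    """
    dim = len(tokens[0])
    probs: Vector = []
    for query, weight, bias in zip(label_queries, label_weights, biases):
        # Attention distribution over the T tokens, specific to this label.
        alpha = softmax([dot(query, h) for h in tokens])
        # Label-specific document vector: attention-weighted sum of token vectors.
        context = [sum(a * h[j] for a, h in zip(alpha, tokens)) for j in range(dim)]
        # Sigmoid, not softmax over labels: each label is a separate yes/no decision.
        probs.append(sigmoid(dot(weight, context) + bias))
    return probs


# Usage: 3 tokens with d = 2, and 2 labels that attend to different dimensions.
tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, -5.0]
print(label_attention(tokens, queries, weights, biases))
```

Using one sigmoid per label rather than a softmax across labels lets a document receive any number of labels, which is what multi-label classification requires.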