Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution

09/10/2021
by Yi Huang, et al.

Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when the class distribution is long-tailed. Resampling and re-weighting are common approaches to the class imbalance problem; however, they are not effective when label dependency is present in addition to class imbalance, because they result in oversampling of common labels. Here, we introduce the application of balancing loss functions to multi-label text classification. We perform experiments on a general-domain dataset with 90 labels (Reuters-21578) and a domain-specific dataset from PubMed with 18,211 labels. We find that a distribution-balanced loss function, which inherently addresses both the class imbalance and label linkage problems, outperforms commonly used loss functions. Distribution balancing methods have been used successfully in the image recognition field. Here, we show their effectiveness in natural language processing. Source code is available at https://github.com/blessu/BalancedLossNLP.
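The claim that resampling oversamples common labels when labels co-occur can be illustrated with a small sketch. The toy corpus, the label names `common` and `rare`, and the helper functions below are hypothetical and are not taken from the paper or its repository; they only show that duplicating documents to boost a rare label also duplicates every label that co-occurs with it.

```python
from collections import Counter

# Hypothetical toy corpus: each document is represented by its set of labels.
docs = [
    {"common"},
    {"common"},
    {"common"},
    {"common", "rare"},  # the rare label co-occurs with the common one
]

def label_counts(documents):
    """Count how many documents carry each label."""
    counts = Counter()
    for labels in documents:
        counts.update(labels)
    return counts

def oversample(documents, target_label, copies):
    """Naive resampling: add `copies` duplicates of every document
    that carries `target_label`."""
    extra = [labels for labels in documents if target_label in labels] * copies
    return documents + extra

before = label_counts(docs)
after = label_counts(oversample(docs, "rare", copies=3))

print("before:", dict(before))
print("after: ", dict(after))
```

In this sketch the rare label gains three occurrences, but so does the common label, because the only document carrying the rare label also carries the common one. The gap between the two counts is unchanged, which is the kind of side effect the abstract attributes to resampling under label dependency.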


research
07/19/2020

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

We present a new loss function called Distribution-Balanced Loss for the...
research
11/28/2022

Class Adaptive Network Calibration

Recent studies have revealed that, beyond conventional accuracy, calibra...
research
05/08/2023

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

Long-tailed multi-label visual recognition (LTML) task is a highly chall...
research
06/24/2021

Modeling Diagnostic Label Correlation for Automatic ICD Coding

Given the clinical notes written in electronic health records (EHRs), it...
research
10/29/2021

Generalized Data Weighting via Class-level Gradient Manipulation

Label noise and class imbalance are two major issues coexisting in real-...
research
11/19/2022

Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text Classification

Multi-label text classification (MLTC) is one of the key tasks in natura...
research
10/21/2021

Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Large-scale multi-label classification datasets are commonly, and perhap...
