Dealing with Difficult Minority Labels in Imbalanced Multilabel Data Sets

02/14/2018
by Francisco Charte, et al.

Multilabel classification is an emerging data mining task with a broad range of real-world applications. Learning from imbalanced multilabel data has been studied intensively in recent years, and several resampling methods have been proposed in the literature. The unequal label distribution in most multilabel datasets, with disparate imbalance levels, can be a handicap when training classifiers. This characteristic also challenges many existing preprocessing algorithms. Furthermore, the concurrence of imbalanced labels can make certain labels harder to learn. These are what we call difficult labels. In this work, the problem of difficult labels is analyzed in depth, its influence on multilabel classifiers is studied, and a novel way to solve it is proposed. Specific metrics to assess this trait in multilabel datasets, called SCUMBLE (Score of ConcUrrence among iMBalanced LabEls) and SCUMBLELbl, are presented along with REMEDIAL (REsampling MultilabEl datasets by Decoupling highly ImbAlanced Labels), a new algorithm aimed at relaxing label concurrence. How to deal with this problem using the R mldr package is also outlined.

