Synthetic Oversampling of Multi-Label Data based on Local Label Distribution

05/02/2019
by Bin Liu, et al.

Class imbalance is an inherent characteristic of multi-label data and affects the prediction accuracy of most multi-label learning methods. One efficient strategy for dealing with this problem is to apply resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare subconcepts and overlapping of classes, which can be analysed by looking at the local characteristics of the minority examples rather than at the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on local label distribution to generate more diverse and better labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach across a variety of evaluation measures, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
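
The idea of oversampling guided by local label distribution can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes binary label matrices, Euclidean nearest neighbours, a per-instance weight equal to the fraction of neighbours that disagree on each minority label, SMOTE-style feature interpolation, and labels copied from the seed instance. The function names `knn_indices`, `local_imbalance`, and `oversample` and the parameters `k` and `n_new` are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    # Pairwise squared Euclidean distances; a point is never its own neighbour.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def local_imbalance(X, Y, k=5):
    nn = knn_indices(X, k)
    # Fraction of the k neighbours whose value for label j differs from the instance's own.
    diff = (Y[nn] != Y[:, None, :]).mean(axis=1)            # shape (n, q)
    # Assumption: the minority value of label j is whichever of 0/1 is rarer.
    minority_value = (Y.mean(axis=0) < 0.5).astype(Y.dtype)
    is_minority = Y == minority_value
    # Keep the disagreement score only where the instance holds the minority value.
    return diff * is_minority, nn

def oversample(X, Y, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    C, nn = local_imbalance(X, Y, k)
    w = C.sum(axis=1)
    # Fall back to uniform sampling if no minority instance has a disagreeing neighbour.
    p = w / w.sum() if w.sum() > 0 else np.full(len(X), 1.0 / len(X))
    seeds = rng.choice(len(X), size=n_new, p=p)
    # Pick one random neighbour per seed and interpolate between the two feature vectors.
    refs = nn[seeds, rng.integers(0, k, size=n_new)]
    t = rng.random((n_new, 1))
    X_new = X[seeds] + t * (X[refs] - X[seeds])
    # Simplifying assumption: the synthetic instance inherits the seed's labels.
    Y_new = Y[seeds].copy()
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])
```

Weighting seed selection by local disagreement, rather than by global label frequency, concentrates synthetic instances around minority examples that sit in overlapping or sparsely populated regions, which is the kind of local difficulty the abstract points to.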


