A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS)

10/11/2020
by Anima Majumder, et al.

This paper addresses the problem of handling imbalanced data in a multi-label classification problem. It proposes two novel methods that primarily exploit the geometric relationships between feature vectors. The first is an undersampling algorithm that uses the angle between feature vectors to select the more informative samples while rejecting the less informative ones. A suitable criterion is proposed to define the informativeness of a given sample. The second is an oversampling algorithm that uses a generative algorithm to create new synthetic data that respects all class boundaries. This is achieved by finding the no man's land between classes based on the Euclidean distance between feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem based on a mixture of Gaussians. The superiority of the proposed algorithms is established through comparison with other state-of-the-art methods, including SMOTE and ADASYN, over ten different publicly available datasets exhibiting high-to-extreme data imbalance. The two methods are combined into a single data-processing framework labeled “GICaPS” to highlight the role of geometry-based information (GI) sampling and Class-Prioritized Synthesis (CaPS) in dealing with the multi-class data imbalance problem, thereby making a novel contribution to this field.
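The abstract does not spell out the informativeness criterion or the synthesis rule, so the following is only a minimal sketch of the two geometric ideas under stated assumptions, not the GICaPS algorithms themselves. It assumes that a sample is "less informative" when its direction is nearly collinear with an already kept sample, and that a synthetic point "respects class boundaries" when it lies closer to its own class than to any other class. The function names, the `min_angle` threshold, and the interpolation step are all hypothetical.

```python
import math
import random


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_undersample(samples, min_angle):
    """Greedy filter (assumed criterion, not the paper's): keep a sample
    only if its angle to every already kept sample is at least min_angle."""
    kept = []
    for x in samples:
        if all(angle_between(x, k) >= min_angle for k in kept):
            kept.append(x)
    return kept


def boundary_aware_oversample(minority, others, n_new, seed=0, max_tries=1000):
    """Interpolate between random pairs of minority samples and accept a
    candidate only if it is closer (Euclidean) to the minority class than
    to any sample of the other classes (assumed acceptance rule)."""
    rng = random.Random(seed)
    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        a, b = rng.sample(minority, 2)
        t = rng.random()
        cand = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        d_own = min(math.dist(cand, m) for m in minority)
        d_other = min(math.dist(cand, o) for o in others)
        if d_own < d_other:  # reject points that drift toward another class
            synthetic.append(cand)
    return synthetic


# Usage with toy 2-D data (illustrative values only).
majority = [[1, 0], [2, 0], [1, 0.01], [0, 1], [0, 3], [1, 1]]
reduced = angle_undersample(majority, math.radians(10))
# reduced == [[1, 0], [0, 1], [1, 1]]

minority = [[0, 0], [10, 0]]
new_points = boundary_aware_oversample(minority, [[5, 0]], n_new=5)
```

In the toy usage, the other-class sample at (5, 0) sits between the two minority samples, so interpolated candidates near the middle of the segment are rejected and only those near either end are accepted. A real implementation would need the paper's actual criteria in place of these two assumed rules.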

