Enhancement Encoding: A New Imbalanced Classification Approach via Encoding the Labels

08/23/2022
by Jia-Chen Zhao, et al.

Class imbalance, also called a long-tailed distribution, is a common problem in machine-learning classification tasks. When it occurs, the minority data are overwhelmed by the majority, which poses a considerable challenge for data science. To address the class imbalance problem, researchers have proposed many methods: some balance the data set (SMOTE), others refine the loss function (Focal Loss), and some have observed that the value of labels influences class-imbalanced learning (Yang and Xu. Rethinking the value of labels for improving class-imbalanced learning. In NeurIPS 2020), but no one has yet changed the way the labels of the data are encoded. Currently, the most prevalent label-encoding technique is one-hot encoding, owing to its good performance in the general case. However, it is not a good choice for imbalanced data, because the classifier will treat majority and minority samples equally. In this paper, we propose the enhancement encoding technique, which is specially designed for imbalanced classification. Enhancement encoding combines re-weighting and cost-sensitiveness, and can therefore reflect the difference between hard and easy (or minority and majority) classes. To reduce the number of validation samples and the computation cost, we also replace the confusion matrix with a novel soft-confusion matrix, which works better with a small validation set. In the experiments, we evaluate enhancement encoding with three different types of loss. The results show that enhancement encoding is very effective at improving the performance of a network trained on imbalanced data. In particular, the performance on minority classes is much better.
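
The abstract does not define the enhancement encoding or the soft-confusion matrix, so the following is only a rough sketch of the two ingredients it names, not the authors' method. It assumes a generic inverse-frequency re-weighting rule and assumes that a "soft" confusion matrix averages predicted probabilities per true class instead of counting hard argmax decisions. The function names `inverse_frequency_weights` and `soft_confusion_matrix` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Generic re-weighting (an assumption, not the paper's encoding):
    weight_k = n / (num_classes * count_k), so rarer classes weigh more."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[k]) if counts[k] else 0.0
        for k in range(num_classes)
    ]


def soft_confusion_matrix(labels, probs, num_classes):
    """Assumed definition: row i holds the mean predicted probability
    vector over all samples whose true class is i."""
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for y, p in zip(labels, probs):
        counts[y] += 1
        for k in range(num_classes):
            sums[y][k] += p[k]
    return [
        [s / counts[i] if counts[i] else 0.0 for s in row]
        for i, row in enumerate(sums)
    ]


# Toy validation set: three majority samples (class 0), one minority (class 1).
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]

weights = inverse_frequency_weights(labels, num_classes=2)
soft_cm = soft_confusion_matrix(labels, probs, num_classes=2)
```

In this toy case a hard confusion matrix would record the single minority sample as entirely misclassified, whereas the averaged probabilities keep the partial mass the model assigned to the correct class. That may be one reason a soft variant is less noisy when only a few validation samples are available, though the paper's own definition may differ.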


research
10/05/2020

Class-Wise Difficulty-Balanced Loss for Solving Class-Imbalance

Class-imbalance is one of the major challenges in real world datasets, w...
research
05/26/2023

Score-balanced Loss for Multi-aspect Pronunciation Assessment

With rapid technological growth, automatic pronunciation assessment has ...
research
11/25/2019

A Self-Adaptive Synthetic Over-Sampling Technique for Imbalanced Classification

Traditionally, in supervised machine learning, (a significant) part of t...
research
07/09/2023

ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification

Skin image datasets often suffer from imbalanced data distribution, exac...
research
06/23/2020

Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels

In the classification of a class imbalance dataset, the performance meas...
research
05/23/2022

Throwing Away Data Improves Worst-Class Error in Imbalanced Classification

Class imbalances pervade classification problems, yet their treatment di...
research
02/14/2018

Tackling Multilabel Imbalance through Label Decoupling and Data Resampling Hybridization

The learning from imbalanced data is a deeply studied problem in standar...
