Understanding CNN Fragility When Learning With Imbalanced Data

10/17/2022
by Damien Dablain, et al.

Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes, and their decisions are difficult to interpret. These problems are related: the mechanism by which CNNs generalize to minority classes, which requires improvement, is hidden in a black box. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although a CNN embeds the pattern knowledge learned from a training set in its model parameters, the effect of this knowledge is contained in feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model, and their global, class-level properties (e.g., frequency, magnitude, and identity) can be analyzed. We find that important information about a neural network's ability to generalize to minority classes resides in the class top-K CE and FE. We show that a CNN learns a limited number of class top-K CE per category, and that their number and magnitudes vary depending on whether the same class is balanced or imbalanced. This calls into question whether a CNN has learned intrinsic class features, or merely frequently occurring ones that happen to exist in the sampled class distribution. We also hypothesize that latent class diversity is as important as the number of class examples, which has important implications for re-sampling and cost-sensitive methods: these methods generally focus on rebalancing model weights, class counts, and margins, rather than diversifying class latent features through augmentation. We also demonstrate that a CNN has difficulty generalizing to test data if the magnitudes of its top-K latent features do not match those of the training set. We use three popular image datasets and two cost-sensitive algorithms commonly employed in imbalanced learning for our experiments.
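The analysis described above hinges on pulling FE and CE out of a trained model and inspecting each class's top-K components. Below is a minimal PyTorch sketch of that idea, not the authors' released code: it assumes a ResNet-18 classifier, takes FE to be the penultimate (globally pooled) feature vector, and approximates CE as the elementwise product of FE with the classifier's weight row for a class, i.e., the per-feature contribution to that class's logit. The model, batch, and top-K value are placeholders.

```python
import torch
from torchvision import models

# Hypothetical setup: a trained ResNet-18 with 10 classes; `x` and `y`
# below are a dummy batch of images and integer labels.
model = models.resnet18(num_classes=10)
model.eval()

features = {}

def hook(_module, _inp, out):
    # Penultimate activations after global average pooling: (B, 512).
    features["fe"] = torch.flatten(out, 1)

model.avgpool.register_forward_hook(hook)

@torch.no_grad()
def class_topk_ce(x, y, k=5):
    """Top-K classification-embedding (CE) components per example,
    taken with respect to each example's true class.

    CE here is the elementwise product of the feature embedding (FE)
    with the class's classifier weight row, so summing a row of `ce`
    (plus the bias) recovers that class's logit. This is a simplified
    reading of the paper's definition, used only for illustration."""
    model(x)                          # forward pass populates features["fe"]
    fe = features["fe"]               # (B, D) feature embeddings
    w = model.fc.weight[y]            # (B, D) one weight row per true label
    ce = fe * w                       # per-feature contributions to the logit
    mags, idx = ce.topk(k, dim=1)     # top-K CE magnitudes and identities
    return idx, mags

# Usage: tally how often each latent feature lands in a class's top-K,
# a rough proxy for the frequency/identity statistics discussed above.
x = torch.randn(8, 3, 32, 32)         # dummy CIFAR-sized batch
y = torch.randint(0, 10, (8,))
idx, mags = class_topk_ce(x, y, k=5)
counts = torch.bincount(idx.flatten(), minlength=512)
```

Grouping `idx` and `mags` by label over a whole test set, and comparing the resulting per-class histograms between balanced and imbalanced training runs, mirrors the kind of top-K frequency and magnitude comparison the abstract describes.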


