A Characterization of the Combined Effects of Overlap and Imbalance on the SVM Classifier

09/16/2011
by Misha Denil, et al.

In this paper we demonstrate that two common problems in machine learning, imbalanced and overlapping data distributions, do not have independent effects on the performance of SVM classifiers. This result is notable since it shows that a model of either of these factors must account for the presence of the other. Our study of the relationship between these problems has led to the discovery of a previously unreported form of "covert" overfitting which is resilient to commonly used empirical regularization techniques. We demonstrate the existence of this covert phenomenon through several methods based on the parametric regularization of trained SVMs. Our findings in this area suggest a possible approach to quantifying overlap in real-world data sets.

research
06/16/2022

Local overlap reduction procedure for dynamic ensemble selection

Class imbalance is a characteristic known for making learning more chall...
research
11/04/2019

A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification

Biomedical data are widely accepted in developing prediction models for ...
research
01/15/2020

On Model Evaluation under Non-constant Class Imbalance

Many real-world classification problems are significantly class-imbalanc...
research
08/26/2020

Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis

Indices quantifying the performance of classifiers under class-imbalance...
research
07/29/2021

On the combined effect of class imbalance and concept complexity in deep learning

Structural concept complexity, class overlap, and data scarcity are some...
research
06/21/2018

Robust and Efficient Boosting Method using the Conditional Risk

Well-known for its simplicity and effectiveness in classification, AdaBo...
research
04/07/2020

CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification

In this paper we propose two novel data-level algorithms for handling da...