Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization

04/26/2020
by Yichen Zhu, et al.

Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped, leading to poor generalization. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation and prove estimation and feature selection consistency for SVR-Tree. In real data applications, SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance.
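
To make the surface-to-volume idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the decision set is a single axis-aligned box, and the names `surface_to_volume_ratio`, `penalized_score`, and the weight `lam` are hypothetical; the additive form "risk plus penalty" is likewise an assumption made only for illustration. It shows that, of two boxes with the same volume, the thin sliver has the larger ratio and so would be penalized more.

```python
from math import prod


def box_volume(sides):
    """Volume of an axis-aligned box given its side lengths."""
    return prod(sides)


def box_surface(sides):
    """Surface of an axis-aligned box (perimeter in 2D, area in 3D).

    Each dimension j contributes two opposite faces whose size is the
    product of all the other side lengths.
    """
    total = 0.0
    for j in range(len(sides)):
        face = prod(s for i, s in enumerate(sides) if i != j)
        total += 2.0 * face
    return total


def surface_to_volume_ratio(sides):
    """Surface divided by volume; large for thin or irregular shapes."""
    return box_surface(sides) / box_volume(sides)


def penalized_score(risk, sides, lam):
    """Hypothetical objective: a risk term plus lam times the ratio.

    The additive form and the weight lam are assumptions for illustration.
    """
    return risk + lam * surface_to_volume_ratio(sides)


# Two candidate decision sets with (approximately) equal volume.
square = (0.2, 0.2)
sliver = (0.8, 0.05)

if __name__ == "__main__":
    for name, sides in [("square", square), ("sliver", sliver)]:
        print(name, box_volume(sides), surface_to_volume_ratio(sides))
```

With equal risk, any positive `lam` in this sketch ranks the square ahead of the sliver, which is the kind of preference for regular decision sets that the abstract describes.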


research
04/14/2015

HHCART: An Oblique Decision Tree

Decision trees are a popular technique in statistical data classificatio...
research
01/07/2012

Feature Selection via Regularized Trees

We propose a tree regularization framework, which enables many tree mode...
research
09/01/2021

An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification

Real-world datasets often present different degrees of imbalanced (i.e.,...
research
09/08/2019

Training Effective Ensemble on Imbalanced Data by Self-paced Harmonizing Classification Hardness

Many real-world applications reveal difficulties in learning classifiers...
research
03/22/2021

Feature Selection for Imbalanced Data with Deep Sparse Autoencoders Ensemble

Class imbalance is a common issue in many domain applications of learnin...
research
06/12/2020

Generalizing Gain Penalization for Feature Selection in Tree-based Models

We develop a new approach for feature selection via gain penalization in...
