Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification

11/05/2021
by Arezou Mojiri, et al.

In binary classification, imbalance refers to situations in which one class is heavily under-represented. It arises either from the data collection process or because one class is genuinely rare in the population. Imbalanced classification is common in applications in biology, medicine, engineering, and the social sciences. In this manuscript, we theoretically study, for the first time, the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, owing to data scarcity in one class (the minority class) and the high dimensionality of the feature space, LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new hard-thresholding rule, constructed with a divide-and-conquer technique, that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of LDA in imbalanced cases. We evaluate the finite-sample performance of the different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or achieves comparable performance using a much smaller subset of selected features, while being computationally more efficient.
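
To make the idea of a hard-thresholding rule concrete, here is a minimal Python sketch of a generic thresholded independence (diagonal) linear rule. It is not the divide-and-conquer construction proposed in the paper, whose details are not given in the abstract. The function names `fit_hard_threshold_rule` and `predict`, the threshold parameter `tau`, the use of a pooled per-feature variance, and the midpoint cutoff are all assumptions made for illustration.

```python
import numpy as np


def fit_hard_threshold_rule(X0, X1, tau):
    """Fit a diagonal linear rule that keeps only strongly separating features.

    X0, X1 : arrays of shape (n0, p) and (n1, p), one per class.
    tau    : hypothetical threshold on the standardized mean difference.
    """
    n0, n1 = X0.shape[0], X1.shape[0]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled per-feature variance (assumption: equal variances across classes).
    ss = ((X0 - mu0) ** 2).sum(axis=0) + ((X1 - mu1) ** 2).sum(axis=0)
    var = ss / (n0 + n1 - 2)

    delta = mu1 - mu0
    # Hard thresholding: a feature is kept only if its standardized
    # mean difference exceeds tau in absolute value; others get weight 0.
    keep = np.abs(delta) / np.sqrt(var) > tau
    w = np.where(keep, delta / var, 0.0)

    # Midpoint cutoff, with no adjustment for class sizes (assumption).
    mid = (mu0 + mu1) / 2.0
    return w, mid, keep


def predict(X, w, mid):
    """Return 1 for the class of X1 and 0 for the class of X0."""
    return ((X - mid) @ w > 0).astype(int)


if __name__ == "__main__":
    # Synthetic imbalanced example: 200 majority vs 20 minority samples,
    # 500 features, of which only the first 5 carry signal.
    rng = np.random.default_rng(0)
    X0 = rng.normal(size=(200, 500))
    X1 = rng.normal(size=(20, 500))
    X1[:, :5] += 3.0

    w, mid, keep = fit_hard_threshold_rule(X0, X1, tau=1.5)
    print("features kept:", int(keep.sum()))
    print("minority recall (training):", predict(X1, w, mid).mean())
```

Setting the weights of weakly separating features to exactly zero keeps noise from the many uninformative coordinates out of the discriminant score, which is the general motivation for thresholding in high dimensions. How to choose the threshold under imbalance is the question the paper addresses, and it is not settled by this sketch.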

research
07/05/2021

Statistical Theory for Imbalanced Binary Classification

Within the vast body of statistical theory developed for binary classifi...
research
02/19/2018

Weighted Linear Discriminant Analysis based on Class Saliency Information

In this paper, we propose a new variant of Linear Discriminant Analysis ...
research
02/03/2016

Discriminative Sparse Neighbor Approximation for Imbalanced Learning

Data imbalance is common in many vision tasks where one or more classes ...
research
05/09/2020

A Compressive Classification Framework for High-Dimensional Data

We propose a compressive classification framework for settings where the...
research
09/17/2015

Sparse Fisher's Linear Discriminant Analysis for Partially Labeled Data

Classification is an important tool with many useful applications. Among...
research
12/02/2019

Matrix sketching for supervised classification with imbalanced classes

Matrix sketching is a recently developed data compression technique. An ...
research
07/12/2017

Influence of Resampling on Accuracy of Imbalanced Classification

In many real-world binary classification tasks (e.g. detection of certai...
