Flexible High-dimensional Classification Machines and Their Asymptotic Properties

10/11/2013
by   Xingye Qiao, et al.
0

Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), under two contexts: the high-dimensional, low-sample size data and the imbalanced data. A unified family of classification machines, the FLexible Assortment MachinE (FLAME) is proposed, within which DWD and SVM are special cases. The FLAME family helps to identify the similarities and differences between SVM and DWD. It is well known that many classifiers overfit the data in the high-dimensional setting; and others are sensitive to the imbalanced data, that is, the class with a larger sample size overly influences the classifier and pushes the decision boundary towards the minority class. SVM is resistant to the imbalanced data issue, but it overfits high-dimensional data sets by showing the undesired data-piling phenomena. The DWD method was proposed to improve SVM in the high-dimensional setting, but its decision boundary is sensitive to the imbalanced ratio of sample sizes. Our FLAME family helps to understand an intrinsic connection between SVM and DWD, and improves both methods by providing a better trade-off between sensitivity to the imbalanced data and overfitting the high-dimensional data. Several asymptotic properties of the FLAME classifiers are studied. Simulations and real data applications are investigated to illustrate the usefulness of the FLAME classifiers.

READ FULL TEXT
research
10/11/2013

Distance-weighted Support Vector Machine

A novel linear classification method that possesses the merits of both t...
research
01/23/2019

Large dimensional analysis of general margin based classification methods

Margin-based classifiers have been popular in both machine learning and ...
research
06/24/2023

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Classification of high-dimensional low sample size (HDLSS) data poses a ...
research
01/05/2019

Population-Guided Large Margin Classifier for High-Dimension Low -Sample-Size Problems

Various applications in different fields, such as gene expression analys...
research
04/16/2019

Discriminative Regression Machine: A Classifier for High-Dimensional Data or Imbalanced Data

We introduce a discriminative regression approach to supervised classifi...
research
07/16/2020

Large scale analysis of generalization error in learning using margin based classification methods

Large-margin classifiers are popular methods for classification. We deri...
research
08/21/2020

Defending Distributed Classifiers Against Data Poisoning Attacks

Support Vector Machines (SVMs) are vulnerable to targeted training data ...

Please sign up or login with your details

Forgot password? Click here to reset