Population structure-learned classifier for high-dimension low-sample-size class-imbalanced problem

09/10/2020
by Liran Shen, et al.

Classification of high-dimension low-sample-size (HDLSS) data is a challenging problem, and class-imbalanced data are common in most application fields. We term this setting imbalanced HDLSS (IHDLSS). Recent theoretical results reveal that the classification criterion and tolerance similarity are crucial to HDLSS, and these results emphasize maximizing within-class variance subject to class separability. Based on this idea, we propose a novel linear binary classifier, termed the Population Structure-learned Classifier (PSC). PSC obtains better generalization performance on IHDLSS in two ways. It maximizes the sum of the inter-class and intra-class scatter matrices subject to class separability, and it assigns different intercept values to the majority and minority classes. The salient features of the proposed approach are: (1) it works well on IHDLSS; (2) the inverse of a high-dimensional matrix can be computed in a low-dimensional space; (3) it self-adaptively determines the intercept term for each class; and (4) it has the same computational complexity as the SVM. Evaluations are conducted on one simulated data set and eight real-world IHDLSS benchmark data sets from gene analysis. Experimental results demonstrate that PSC is superior to state-of-the-art methods on IHDLSS.
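The abstract names two linear-algebra ingredients without spelling them out. The first is a sum of inter-class and intra-class scatter matrices, and the second is inverting a high-dimensional matrix in a low-dimensional space. The Python sketch below illustrates both under stated assumptions. It is not the PSC algorithm, and it omits the class-specific intercepts entirely. It assumes the inter-class scatter weights each class mean by its class size, in which case the sum equals the total scatter matrix. For the inversion, it uses the standard push-through (Woodbury-type) identity (λI_d + XᵀX)⁻¹Xᵀ = Xᵀ(λI_n + XXᵀ)⁻¹. This is one common way to work in the n-dimensional sample space when n ≪ d, and whether PSC uses this exact identity is an assumption. The function names `scatter_matrices` and `solve_high_dim` and the ridge parameter `lam` are hypothetical.

```python
import numpy as np


def scatter_matrices(X, y):
    """Hypothetical helper: within-class (S_w), between-class (S_b), total (S_t) scatter.

    X has shape (n_samples, n_features); y holds class labels.
    Assumption: S_b weights each class by its size, so S_w + S_b == S_t.
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        centered = Xc - mc                 # deviations from the class mean
        S_w += centered.T @ centered
        diff = (mc - mean)[:, None]        # class mean minus global mean, as a column
        S_b += len(Xc) * (diff @ diff.T)
    Xt = X - mean
    S_t = Xt.T @ Xt                        # total scatter around the global mean
    return S_w, S_b, S_t


def solve_high_dim(X, v, lam):
    """Hypothetical helper: compute (lam*I_d + X^T X)^{-1} X^T v with an n x n solve.

    Uses the push-through identity
        (lam*I_d + X^T X)^{-1} X^T = X^T (lam*I_n + X X^T)^{-1},
    so only an n x n linear system is solved when n << d.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(lam * np.eye(n) + X @ X.T, v)
    return X.T @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_major, n_minor, d = 30, 6, 500       # imbalanced classes, n << d
    X = np.vstack([rng.normal(0.0, 1.0, (n_major, d)),
                   rng.normal(0.5, 1.0, (n_minor, d))])
    y = np.array([0] * n_major + [1] * n_minor)

    S_w, S_b, S_t = scatter_matrices(X, y)
    print(np.allclose(S_w + S_b, S_t))     # True

    v = rng.normal(size=X.shape[0])
    w_low = solve_high_dim(X, v, lam=1.0)
    w_direct = np.linalg.solve(np.eye(d) + X.T @ X, X.T @ v)
    print(np.allclose(w_low, w_direct))    # True
```

The sample-space form costs roughly O(n²d + n³), compared with O(nd² + d³) for forming and solving the d × d system directly. This is why working in the low-dimensional space is attractive in HDLSS settings.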
