Determination of class-specific variables in nonparametric multiple-class classification

05/07/2022
by   Wan-Ping Nicole Chen, et al.
0

As technology advanced, collecting data via automatic collection devices become popular, thus we commonly face data sets with lengthy variables, especially when these data sets are collected without specific research goals beforehand. It has been pointed out in the literature that the difficulty of high-dimensional classification problems is intrinsically caused by too many noise variables useless for reducing classification error, which offer less benefits for decision-making, and increase complexity, and confusion in model-interpretation. A good variable selection strategy is therefore a must for using such kinds of data well; especially when we expect to use their results for the succeeding applications/studies, where the model-interpretation ability is essential. hus, the conventional classification measures, such as accuracy, sensitivity, precision, cannot be the only performance tasks. In this paper, we propose a probability-based nonparametric multiple-class classification method, and integrate it with the ability of identifying high impact variables for individual class such that we can have more information about its classification rule and the character of each class as well. The proposed method can have its prediction power approximately equal to that of the Bayes rule, and still retains the ability of "model-interpretation." We report the asymptotic properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations. We also separately discuss the variable identification, and training sample size determination, and summarize those procedures as algorithms such that users can easily implement them with different computing languages.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/04/2019

Fast Multi-Class Probabilistic Classifier by Sparse Non-parametric Density Estimation

The model interpretation is essential in many application scenarios and ...
research
07/07/2021

ENNS: Variable Selection, Regression, Classification and Deep Neural Network for High-Dimensional Data

High-dimensional, low sample-size (HDLSS) data problems have been a topi...
research
01/29/2019

Active learning for binary classification with variable selection

Modern computing and communication technologies can make data collection...
research
11/22/2017

Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses

We consider the problem of sparse variable selection on high dimension h...
research
12/22/2018

Distributed sequential method for analyzing massive data

To analyse a very large data set containing lengthy variables, we adopt ...
research
12/17/2021

Disentangled representations: towards interpretation of sex determination from hip bone

By highlighting the regions of the input image that contribute the most ...
research
03/02/2019

Sequential estimation for GEE with adaptive variables and subject selection

Modeling correlated or highly stratified multiple-response data becomes ...

Please sign up or login with your details

Forgot password? Click here to reset