Sparse Fisher's Linear Discriminant Analysis for Partially Labeled Data

09/17/2015
by Qiyi Lu, et al.

Classification is an important tool with many useful applications. Among the many classification methods, Fisher's Linear Discriminant Analysis (LDA) is a traditional model-based approach that makes use of the covariance information. However, in the high-dimensional, low-sample-size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods designed to deal with high-dimensional data, they may not fully use the covariance information as LDA does. Hence, in some situations, it is still desirable to use a model-based method such as LDA for classification. This article exploits the potential of LDA in more complicated data settings. In many real applications, it is costly to manually place labels on observations; hence, often only a small portion of the observations is labeled, while a large number are left without a label. It is a great challenge to obtain good classification performance through the labeled data alone, especially when the dimension is greater than the size of the labeled data. To overcome this issue, we propose a semi-supervised sparse LDA classifier that takes advantage of the seemingly useless unlabeled data. The unlabeled data provide additional information that helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve sparsity; meanwhile, we employ the difference-convex algorithm to handle the non-convex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help to clarify when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.
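
The invertibility problem mentioned in the abstract can be seen in a small numerical sketch. The code below is not taken from the paper; it assumes NumPy, synthetic Gaussian data, and a hypothetical helper `pooled_covariance`, and only illustrates that the pooled within-class sample covariance is rank-deficient when the dimension p exceeds the number of labeled observations n, so classical LDA (which needs its inverse) cannot be applied directly.

```python
import numpy as np


def pooled_covariance(X, y):
    """Pooled within-class sample covariance (hypothetical helper for illustration)."""
    classes = np.unique(y)
    n, p = X.shape
    S = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)  # remove the class mean
        S += centered.T @ centered       # accumulate within-class scatter
    return S / (n - len(classes))


# Synthetic high-dimensional, low-sample-size data: n = 20 labeled points, p = 50 features.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = np.repeat([0, 1], n // 2)
X[y == 1, :5] += 1.0  # assumed mean shift in the first five coordinates only

S = pooled_covariance(X, y)
rank = np.linalg.matrix_rank(S)

# With two classes the rank is at most n - 2, which is below p here,
# so S is singular and has no ordinary inverse.
print("rank of pooled covariance:", rank, "dimension:", p)
```

In this setup only five coordinates carry class information, which is the kind of situation where a sparse discriminant direction is a natural target.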

research
09/21/2015

Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data

Classification and clustering are both important topics in statistical l...
research
07/08/2019

Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

Semi-supervised learning (SSL) uses unlabeled data for training and has ...
research
05/16/2016

Classification of Big Data with Application to Imaging Genetics

Big data applications, such as medical imaging and genetics, typically g...
research
05/03/2020

High Dimensional Classification for Spatially Dependent Data with Application to Neuroimaging

Discriminating patients with Alzheimer's disease (AD) from healthy subje...
research
01/21/2013

Supervised Classification Using Sparse Fisher's LDA

It is well known that in a supervised classification setting when the nu...
research
11/05/2021

Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification

In binary classification, imbalance refers to situations in which one cl...
research
10/09/2022

A Locally Adaptive Shrinkage Approach to False Selection Rate Control in High-Dimensional Classification

The uncertainty quantification and error control of classifiers are cruc...
