Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data

09/21/2015
by Qiyi Lu, et al.

Classification and clustering are both important topics in statistical learning. A natural question is whether predefined classes really differ from one another, or whether clusters really exist. Specifically, we may want to know whether the two classes defined by class labels (when they are provided), or the two clusters tagged by a clustering algorithm (when class labels are not provided), come from the same underlying distribution. Both questions are challenging for high-dimensional, low-sample size data, although there has been recent progress on each. However, when it is costly to label observations manually, often only a small portion of the class labels is available. In this article, we propose a significance analysis approach for this type of data, namely partially labeled data. Our method uses the whole data set and tests the class difference as if all the labels were observed. Compared to a testing method that ignores the label information, our method provides greater power while maintaining the size, as illustrated by a comprehensive simulation study. Theoretical properties of the proposed method are studied with emphasis on the high-dimensional, low-sample size setting. Our simulated examples help show when and how the information extracted from the labeled data can be effective. A real data example further illustrates the usefulness of the proposed method.
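
To make the testing question concrete, the following is a minimal sketch of a generic permutation test for a difference between two fully labeled classes when the dimension is much larger than the sample size. It is not the method proposed in the article, which also uses the unlabeled observations; it only illustrates what testing whether two classes share an underlying distribution can look like in practice. The test statistic (squared distance between class means), the function names, and the simulated Gaussian data are all assumptions made for illustration.

```python
import random

def mean_diff_stat(x, y):
    """Squared Euclidean distance between the two class mean vectors."""
    d = len(x[0])
    mean_x = [sum(row[j] for row in x) / len(x) for j in range(d)]
    mean_y = [sum(row[j] for row in y) / len(y) for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for H0: both classes share one distribution."""
    rng = random.Random(seed)
    observed = mean_diff_stat(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Reassign class membership at random, keeping the class sizes fixed.
        rng.shuffle(pooled)
        if mean_diff_stat(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)

def simulate(n, d, shift, rng):
    """Hypothetical data: n observations in d dimensions, N(shift, 1) entries."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]

# Illustrative setting: 10 observations per class, 200 dimensions.
rng = random.Random(1)
x = simulate(10, 200, 0.0, rng)
y_null = simulate(10, 200, 0.0, rng)  # same distribution as x
y_alt = simulate(10, 200, 1.0, rng)   # mean shifted in every coordinate

p_null = permutation_pvalue(x, y_null)
p_alt = permutation_pvalue(x, y_alt)
print("p-value, same distribution:", p_null)
print("p-value, shifted mean:", p_alt)
```

In this sketch, the size of the test corresponds to how often `p_null` falls below the chosen significance level over repeated simulations, and the power corresponds to how often `p_alt` does.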


