Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm

by Xin Tong, et al.

In contrast to the classical binary classification paradigm, which minimizes the overall classification error, the Neyman-Pearson (NP) paradigm seeks a classifier that minimizes the type II error while keeping the type I error below a user-specified level, thereby addressing asymmetric priorities between the two error types. In this work, we present NP-sLDA, a new binary NP classifier that explicitly accounts for feature dependence in high-dimensional NP settings. This method adapts the popular sparse linear discriminant analysis (sLDA, Mai et al. (2012)) to the NP paradigm and borrows its threshold determination method from the umbrella algorithm in Tong et al. (2017). On the theoretical front, we formulate a new conditional margin assumption and a new conditional detection condition to accommodate unbounded feature support, and show that NP-sLDA satisfies the NP oracle inequalities, which are the natural NP-paradigm counterparts of the oracle inequalities in classical classification. Numerical results show that NP-sLDA is a valuable addition to existing NP classifiers. We also suggest a general data-adaptive sample splitting scheme that, in many scenarios, improves classification performance over the default half-half split of class 0 data used in Tong et al. (2017); this new splitting scheme has been incorporated into a new version of the R package nproc.
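The threshold step borrowed from the umbrella algorithm can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the nproc implementation: it assumes the scores of a left-out class 0 sample have already been computed by some fitted scoring function (in NP-sLDA this would be the sparse LDA projection, whose fitting is not shown) and that larger scores point toward class 1. The names `binom_pmf`, `violation_bound`, `umbrella_order`, `np_threshold`, and `predict` are hypothetical. With n left-out class 0 scores sorted in increasing order, taking the k-th smallest as the threshold gives a classifier whose probability of having type I error above alpha is at most the sum over j = k, ..., n of C(n, j) (1 - alpha)^j alpha^(n - j); the sketch picks the smallest k for which this bound is at most a tolerance delta.

```python
# Illustrative sketch of umbrella-style NP threshold selection.
# All function names are hypothetical; this is not the nproc API.
import math
import random
from typing import Sequence


def binom_pmf(n: int, j: int, p: float) -> float:
    """P(Bin(n, p) = j), computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log1p(-p))
    return math.exp(log_pmf)


def violation_bound(n: int, k: int, alpha: float) -> float:
    """Upper bound on P(type I error > alpha) when the threshold is the
    k-th smallest (1-indexed) of n left-out class 0 scores."""
    return sum(binom_pmf(n, j, 1 - alpha) for j in range(k, n + 1))


def umbrella_order(n: int, alpha: float, delta: float) -> int:
    """Smallest order k with violation_bound(n, k, alpha) <= delta.

    The tail sum grows as k decreases, so scan k = n, n-1, ... and stop
    as soon as the accumulated tail exceeds delta.
    """
    if n < 1:
        raise ValueError("need at least one class 0 score")
    tail = 0.0
    for k in range(n, 0, -1):
        tail += binom_pmf(n, k, 1 - alpha)
        if tail > delta:
            if k == n:
                raise ValueError(
                    f"{n} class 0 scores are too few for alpha={alpha}, delta={delta}")
            return k + 1
    return 1


def np_threshold(class0_scores: Sequence[float],
                 alpha: float = 0.05, delta: float = 0.05) -> float:
    """Threshold = k-th order statistic of the left-out class 0 scores."""
    ordered = sorted(class0_scores)
    k = umbrella_order(len(ordered), alpha, delta)
    return ordered[k - 1]


def predict(score: float, threshold: float) -> int:
    """Assign class 1 only when the score exceeds the threshold."""
    return 1 if score > threshold else 0


if __name__ == "__main__":
    # Hypothetical toy data standing in for projected class 0 scores.
    rng = random.Random(0)
    class0 = [rng.gauss(0.0, 1.0) for _ in range(200)]
    t = np_threshold(class0, alpha=0.05, delta=0.05)
    fresh = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    type1 = sum(predict(s, t) for s in fresh) / len(fresh)
    print(f"threshold={t:.3f}, empirical type I error={type1:.4f}")
```

For example, with alpha = delta = 0.05 the bound at k = n is 0.95^n, which first falls below 0.05 at n = 59, so with fewer than 59 left-out class 0 scores the sketch raises an error instead of returning a threshold.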

Neyman-Pearson Classification under High-Dimensional Settings

Most existing binary classification methods target on the optimization o...

Neyman-Pearson Multi-class Classification via Cost-sensitive Learning

Most existing classification methods aim to minimize the overall misclas...

Neyman-Pearson classification, convexity and stochastic constraints

Motivated by problems of anomaly detection, this paper implements the Ne...

LACBoost and FisherBoost: Optimally Building Cascade Classifiers

Object detection is one of the key tasks in computer vision. The cascade...

Intentional control of type I error over unconscious data distortion: a Neyman-Pearson classification approach

The rise of social media enables millions of citizens to generate inform...

High Dimensional Discrete Integration by Hashing and Optimization

Recently Ermon et al. (2013) pioneered an ingenious way to practically c...