Predictive Value Generalization Bounds

by Keshav Vemuri, et al.

In this paper, we study a bi-criterion framework for assessing scoring functions in the context of binary classification. The positive and negative predictive values (ppv and npv, respectively) are conditional probabilities of the true label matching a classifier's predicted label. The usual classification error rate is a linear combination of these probabilities, and therefore, concentration inequalities for the error rate do not yield confidence intervals for the two separate predictive values. We study generalization properties of scoring functions with respect to predictive values by deriving new distribution-free large deviation and uniform convergence bounds. The latter bound is stated in terms of a measure of function class complexity that we call the order coefficient; we relate this combinatorial quantity to the VC-subgraph dimension.
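The abstract's key observation — that the classification error rate is a linear combination of the two predictive values — can be checked numerically. The sketch below (all names and data are illustrative, not taken from the paper) computes empirical ppv and npv as conditional frequencies and verifies that the error rate equals the prediction-rate-weighted mixture of 1 − ppv and 1 − npv. Note the caveat motivating the paper: each conditional frequency has a random denominator (the number of positive or negative predictions), so standard concentration bounds on the error rate do not directly yield confidence intervals for ppv or npv separately.

```python
import math

def predictive_values(y_true, y_pred):
    """Empirical predictive values:
    ppv = P(Y = 1 | Yhat = 1), npv = P(Y = 0 | Yhat = 0)."""
    pos = [yt for yt, yp in zip(y_true, y_pred) if yp == 1]
    neg = [yt for yt, yp in zip(y_true, y_pred) if yp == 0]
    # Denominators len(pos), len(neg) are themselves random quantities.
    ppv = sum(yt == 1 for yt in pos) / len(pos) if pos else float("nan")
    npv = sum(yt == 0 for yt in neg) / len(neg) if neg else float("nan")
    return ppv, npv

def error_rate(y_true, y_pred):
    return sum(yt != yp for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Toy labels and predictions (illustrative only).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

ppv, npv = predictive_values(y_true, y_pred)
err = error_rate(y_true, y_pred)

# Error rate decomposes as a mixture of (1 - ppv) and (1 - npv),
# weighted by the empirical rates of positive and negative predictions:
p_pos = sum(y_pred) / len(y_pred)
assert math.isclose(err, p_pos * (1 - ppv) + (1 - p_pos) * (1 - npv))
```

Because the mixture weight `p_pos` depends on the scoring function, a tight bound on `err` constrains only this weighted combination, not each predictive value on its own — which is why the paper derives separate large deviation and uniform convergence bounds for ppv and npv.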




Related papers

Supersparse Linear Integer Models for Predictive Scoring Systems

We introduce Supersparse Linear Integer Models (SLIM) as a tool to creat...

Distribution-free binary classification: prediction sets, confidence intervals and calibration

We study three notions of uncertainty quantification—calibration, confid...

Variance-adaptive confidence sequences by betting

This paper derives confidence intervals (CI) and time-uniform confidence...

Algorithms and Complexity for Functions on General Domains

Error bounds and complexity bounds in numerical analysis and information...

Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems

The accuracy of binary classification systems is defined as the proporti...

Rademacher Generalization Bounds for Classifier Chains

In this paper, we propose a new framework to study the generalization pr...

Feature Selection via Probabilistic Outputs

This paper investigates two feature-scoring criteria that make use of es...