Online Classification with Complex Metrics

10/23/2016
by Bowei Yan, et al.

We present a framework and analysis of consistent binary classification for complex and non-decomposable performance metrics such as the F-measure and the Jaccard measure. The proposed framework is general: it applies to both batch and online learning, and to both linear and non-linear models. Our work follows recent results showing that the Bayes optimal classifier for many complex metrics is given by thresholding the conditional probability of the positive class. This manuscript extends that thresholding characterization by showing that the utility is strictly locally quasi-concave with respect to the threshold for a wide range of models and performance metrics. This, in turn, motivates simple normalized gradient ascent updates for threshold estimation. We present a finite-sample regret analysis for the resulting procedure. In particular, the risk in the batch case converges to the Bayes risk at the same rate as that of the underlying conditional probability estimation, and the risk of the proposed online algorithm converges at a rate that depends on the conditional probability estimation risk. For instance, in the special case where the conditional probability model is logistic regression, our procedure achieves O(1/√n) sample complexity for both batch and online training. Empirical evaluation shows that the proposed algorithms outperform alternatives in practice, with comparable or better prediction performance and reduced run time across various metrics and datasets.
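The thresholding idea can be illustrated with a small sketch. The code below is not the authors' algorithm; it is a minimal illustration that assumes the conditional probability estimates are already available, uses the F1 score as the metric, and replaces the gradient with the sign of a finite difference (in one dimension, a normalized gradient reduces to its sign). The function names `f1_at_threshold` and `tune_threshold`, the step-size schedule, and the finite-difference width are all hypothetical choices made for illustration.

```python
def f1_at_threshold(probs, labels, theta):
    """Empirical F1 of the rule 'predict 1 when the estimated probability exceeds theta'."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p > theta
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif (not pred) and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 0.0


def tune_threshold(probs, labels, theta0=0.5, steps=200, step0=0.2, width=0.05):
    """Sign-based (normalized) ascent on the threshold; an illustrative sketch only.

    The empirical F1 is piecewise constant in theta, so the ascent direction is
    taken from a central finite difference of half-width `width` (an assumption
    of this sketch, not something stated in the abstract).
    """
    theta = theta0
    best_theta = theta
    best_f1 = f1_at_threshold(probs, labels, theta)
    for t in range(1, steps + 1):
        hi = f1_at_threshold(probs, labels, min(theta + width, 1.0))
        lo = f1_at_threshold(probs, labels, max(theta - width, 0.0))
        direction = (hi > lo) - (hi < lo)  # +1, 0, or -1
        # Decaying step size, clipped so the threshold stays in [0, 1].
        theta = min(1.0, max(0.0, theta + step0 / (t ** 0.5) * direction))
        f1 = f1_at_threshold(probs, labels, theta)
        if f1 > best_f1:
            best_theta, best_f1 = theta, f1
    return best_theta, best_f1
```

Calling `tune_threshold(probs, labels)` with a list of estimated probabilities and a list of 0/1 labels returns the best threshold visited and its empirical F1. Because the search starts at `theta0`, the returned F1 is never worse than the F1 at the starting threshold.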


research
05/07/2015

Optimal Decision-Theoretic Classification Using Non-Decomposable Performance Metrics

We provide a general theoretical analysis of expected out-of-sample util...
research
06/02/2018

Binary Classification with Karmic, Threshold-Quasi-Concave Metrics

Complex performance measures, beyond the popular measure of accuracy, ar...
research
08/15/2023

High-Probability Risk Bounds via Sequential Predictors

Online learning methods yield sequential regret bounds under minimal ass...
research
06/12/2019

Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

We study the problem of fair binary classification using the notion of E...
research
04/22/2016

Approximation Vector Machines for Large-scale Online Learning

One of the most challenging problems in kernel online learning is to bou...
research
12/23/2019

An improper estimator with optimal excess risk in misspecified density estimation and logistic regression

We introduce a procedure for predictive conditional density estimation u...
research
12/20/2014

Competing with the Empirical Risk Minimizer in a Single Pass

In many estimation problems, e.g. linear and logistic regression, we wis...
