Online AUC Optimization for Sparse High-Dimensional Datasets

09/23/2020
by Baojian Zhou, et al.

The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification in many application domains where high-dimensional sparse data is abundant. In such settings, each d-dimensional sample has only k non-zero features with k ≪ d, and data arrives sequentially as a stream. Current online AUC optimization algorithms have a high per-iteration cost of 𝒪(d) and generally produce non-sparse solutions, so they are ill-suited to this setting. In this paper, we directly optimize the AUC score for high-dimensional sparse datasets in the online learning setting and propose a new algorithm, FTRL-AUC. It processes data online with a much cheaper per-iteration cost of 𝒪(k), making it suitable for analyzing high-dimensional sparse streaming data. The algorithmic design rests on two ideas: a novel reformulation of the U-statistic AUC objective as an empirical saddle-point problem, and a "lazy update" rule that reduces the per-iteration complexity from 𝒪(d) to 𝒪(k). Furthermore, by applying a generalized Follow-The-Regularized-Leader (FTRL) framework, FTRL-AUC inherently captures sparsity more effectively. Experiments on real-world datasets show that FTRL-AUC significantly improves both run time and model sparsity while achieving AUC scores competitive with state-of-the-art methods. Compared with the online learning method for logistic loss, FTRL-AUC achieves higher AUC scores, especially on imbalanced datasets.
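Two ideas from the abstract can be made concrete with a small sketch, under stated assumptions. `empirical_auc` computes the U-statistic AUC, i.e. the fraction of (positive, negative) pairs whose scores are ordered correctly, (1 / (n+ · n-)) Σ_i Σ_j 1[s(x_i+) > s(x_j-)], with ties counted as 1/2. `FTRLProximal` is the standard per-coordinate FTRL-Proximal learner for logistic loss (McMahan et al., 2013), an online logistic-loss method of the kind the abstract compares against; it is not FTRL-AUC, whose saddle-point AUC objective is not reproduced here. The sketch only shows why recomputing weights lazily from per-feature accumulators costs O(k) per sample, and how the L1 term produces exact zeros. All names, the sparse-vector encoding, and the hyperparameters (`alpha`, `beta`, `l1`, `l2`) are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A sparse sample: (feature_index, value) pairs for its k non-zero features.
# Indices are assumed unique within a sample.
SparseVec = List[Tuple[int, float]]


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC as a U-statistic over all (positive, negative) pairs.

    Correctly ordered pairs count 1, ties count 1/2. This is O(n_pos * n_neg),
    fine for illustration but not for large evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    hits = 0.0
    for sp in pos:
        for sn in neg:
            hits += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return hits / (len(pos) * len(neg))


def _sigmoid(margin: float) -> float:
    # Clamp the margin so math.exp cannot overflow.
    margin = max(min(margin, 35.0), -35.0)
    return 1.0 / (1.0 + math.exp(-margin))


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic loss (not FTRL-AUC).

    Only the accumulators (z, n) of features seen so far are stored, and a
    weight is recomputed from them on demand ("lazily") when its feature is
    non-zero, so one update touches O(k) coordinates instead of O(d).
    """

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 1.0, l2: float = 1.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[int, float] = {}  # accumulated adjusted gradients
        self.n: Dict[int, float] = {}  # accumulated squared gradients

    def _weight(self, i: int) -> float:
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps this coordinate exactly zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x: SparseVec) -> float:
        return _sigmoid(sum(self._weight(i) * v for i, v in x))

    def update(self, x: SparseVec, y: int) -> float:
        """One online step on (x, y), y in {0, 1}; returns the pre-update prediction."""
        w = {i: self._weight(i) for i, _ in x}  # only the k active weights
        p = _sigmoid(sum(w[i] * v for i, v in x))
        for i, v in x:
            g = (p - y) * v  # logistic-loss gradient for coordinate i
            n_old = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w[i]
            self.n[i] = n_old + g * g
        return p

    def nnz(self) -> int:
        """Number of non-zero weights among the features seen so far."""
        return sum(1 for i in self.z if self._weight(i) != 0.0)


# Toy stream: feature 0 marks positives, feature 1 marks negatives,
# and feature 3 appears in both classes.
stream = [([(0, 1.0), (3, 1.0)], 1), ([(1, 1.0), (3, 1.0)], 0)] * 50
model = FTRLProximal(alpha=0.5, l1=0.5)
for x, y in stream:
    model.update(x, y)
scores = [model.predict(x) for x, _ in stream[:2]]
print(empirical_auc(scores, [1, 0]))  # prints 1.0
```

Storing per-feature accumulators in dictionaries keyed by feature index means memory grows only with the features actually observed, and no step ever loops over all d coordinates; this is the general mechanism a lazy-update rule exploits, sketched here for logistic loss rather than for the AUC objective.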

12/27/2016

A Sparse Nonlinear Classifier Design Using AUC Optimization

AUC (Area under the ROC curve) is an important performance measure for a...
06/14/2019

Stochastic Proximal AUC Maximization

In this paper we consider the problem of maximizing the Area under the R...
11/04/2020

Stochastic Hard Thresholding Algorithms for AUC Maximization

In this paper, we aim to develop stochastic hard thresholding algorithms...
11/25/2019

Projective Quadratic Regression for Online Learning

This paper considers online convex optimization (OCO) problems - the par...
05/26/2019

Dual Averaging Method for Online Graph-structured Sparsity

Online learning algorithms update models via one sample per iteration, t...
01/04/2022

Evolutionary Multitasking AUC Optimization

Learning to optimize the area under the receiver operating characteristi...
02/07/2018

Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers

The predictive quality of machine learning models is typically measured ...
