Robust AUC Optimization under the Supervision of Clean Data

11/19/2022
by   Chenkang Zhang, et al.

AUC (area under the ROC curve) optimization algorithms have drawn much attention because they adapt well to severely imbalanced data. Real-world datasets usually contain many noisy samples that seriously degrade model performance, whereas a limited number of clean samples can often be obtained easily. Although some AUC optimization studies attempt to handle noisy samples, they do not make good use of such clean samples. In this paper, we propose a robust AUC optimization algorithm (RAUCO) that makes good use of the available clean samples. Specifically, RAUCO excludes noisy samples from training by employing self-paced learning (SPL) under the supervision of clean samples. Moreover, considering the impact of data augmentation on SPL, we introduce a consistency regularization term into SPL. Theoretical results on the convergence of RAUCO are provided under mild assumptions. Comprehensive experiments demonstrate that RAUCO is more robust than existing algorithms.
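To make the idea of excluding noisy samples with self-paced learning more concrete, here is a minimal sketch in plain Python. It is not the RAUCO algorithm from the paper: the squared-hinge pairwise surrogate, the hard 0/1 weighting rule, the choice to always keep clean samples, and every name used (`pairwise_hinge`, `per_sample_losses`, `self_paced_weights`, `threshold`) are assumptions made for illustration only, and the consistency regularization term is left out.

```python
def pairwise_hinge(pos_score, neg_score, margin=1.0):
    """Squared hinge surrogate for one (positive, negative) pair (assumed loss)."""
    gap = margin - (pos_score - neg_score)
    return max(0.0, gap) ** 2


def per_sample_losses(scores, labels):
    """Average pairwise surrogate loss of each sample against the opposite class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    losses = [0.0] * len(scores)
    for i in pos:
        losses[i] = sum(pairwise_hinge(scores[i], scores[j]) for j in neg) / len(neg)
    for j in neg:
        losses[j] = sum(pairwise_hinge(scores[i], scores[j]) for i in pos) / len(pos)
    return losses


def self_paced_weights(losses, is_clean, threshold):
    """Hard self-paced weights (assumed rule): a sample is kept if it is
    known to be clean or if its current loss is below the threshold."""
    return [1 if clean or loss < threshold else 0
            for loss, clean in zip(losses, is_clean)]


# Toy data: the last sample is labeled negative but scored like a positive,
# which is how a mislabeled sample often looks to a reasonable model.
scores = [2.0, 1.5, -1.0, -0.5, 1.8]
labels = [1, 1, 0, 0, 0]
is_clean = [True, False, False, False, False]

losses = per_sample_losses(scores, labels)
weights = self_paced_weights(losses, is_clean, threshold=1.0)
print(weights)  # [1, 1, 1, 1, 0]
```

In a full training loop, the weights would be recomputed after each model update and the threshold gradually raised, so that harder samples are admitted only once the model has become more reliable. That schedule is standard practice in self-paced learning and is likewise not taken from the paper.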

research
11/16/2022

Learning with Noisy Labels over Imbalanced Subpopulations

Learning with Noisy Labels (LNL) has attracted significant attention fro...
research
07/08/2022

Balanced Self-Paced Learning for AUC Maximization

Learning to improve AUC performance is an important topic in machine lea...
research
02/19/2022

Tripartite: Tackle Noisy Labels by a More Precise Partition

Samples in large-scale datasets may be mislabeled due to various reasons...
research
10/29/2020

Suppressing Mislabeled Data via Grouping and Self-Attention

Deep networks achieve excellent results on large-scale clean data but de...
research
06/02/2023

Does it pay to optimize AUC?

The Area Under the ROC Curve (AUC) is an important model metric for eval...
research
08/22/2022

Minimax AUC Fairness: Efficient Algorithm with Provable Convergence

The use of machine learning models in consequential decision making ofte...
research
01/28/2020

Identifying Mislabeled Data using the Area Under the Margin Ranking

Not all data in a typical training set help with generalization; some sa...
