Robust AUC Optimization under the Supervision of Clean Data

11/19/2022

AUC (area under the ROC curve) optimization algorithms have drawn much attention because they adapt well to severely imbalanced data. Real-world datasets often contain many noisy samples that seriously degrade model performance, whereas a limited number of clean samples can usually be obtained easily. Although some AUC optimization studies attempt to handle noisy samples, they do not make good use of such clean samples. In this paper, we propose a robust AUC optimization algorithm (RAUCO) that makes good use of the available clean samples. Specifically, RAUCO excludes noisy samples from training by employing self-paced learning (SPL) under the supervision of clean samples. Moreover, considering the impact of data augmentation on SPL, we introduce a consistency regularization term into SPL. We provide theoretical results on the convergence of RAUCO under mild assumptions. Comprehensive experiments demonstrate that RAUCO is more robust than existing algorithms.
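
To make the self-paced idea concrete, here is a minimal sketch in Python with NumPy. It is not the RAUCO objective, which the abstract does not spell out: it assumes a generic pairwise hinge surrogate for AUC and the standard hard-threshold SPL rule, and it omits both the clean-sample supervision and the consistency regularization term. All function names, the `age` threshold, and the toy scores are hypothetical.

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def pairwise_hinge_losses(scores_pos, scores_neg, margin=1.0):
    """Hinge surrogate for every pair; shape (n_pos, n_neg). Assumed surrogate."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.maximum(0.0, margin - diff)

def self_paced_weights(per_sample_loss, age):
    """Hard SPL rule: keep (weight 1) samples whose loss is below `age`."""
    return (per_sample_loss < age).astype(float)

# Toy scores: the third "positive" scores below both negatives,
# as a mislabeled sample typically would.
scores_pos = np.array([2.0, 1.5, -1.0])
scores_neg = np.array([0.0, -0.5])

pair_losses = pairwise_hinge_losses(scores_pos, scores_neg)
pos_loss = pair_losses.mean(axis=1)   # average loss per positive sample
neg_loss = pair_losses.mean(axis=0)   # average loss per negative sample

v_pos = self_paced_weights(pos_loss, age=1.0)
v_neg = self_paced_weights(neg_loss, age=1.0)

auc_all = empirical_auc(scores_pos, scores_neg)
auc_kept = empirical_auc(scores_pos[v_pos == 1], scores_neg[v_neg == 1])
```

In a full SPL loop the threshold `age` would be raised gradually, so that harder samples enter training only after the model has fit the easy ones. Here the high-loss positive gets weight 0 and is left out of the AUC computed on the kept samples.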

Related Research

Learning with Noisy Labels over Imbalanced Subpopulations

Balanced Self-Paced Learning for AUC Maximization

Tripartite: Tackle Noisy Labels by a More Precise Partition

Suppressing Mislabeled Data via Grouping and Self-Attention

Does it pay to optimize AUC?

Minimax AUC Fairness: Efficient Algorithm with Provable Convergence

Identifying Mislabeled Data using the Area Under the Margin Ranking