A Unified DRO View of Multi-class Loss Functions with top-N Consistency

by Dixian Zhu, et al.

Multi-class classification is one of the most common tasks in machine learning applications, where each example is labeled with one of many class labels. Many loss functions have been proposed for multi-class classification, including two well-known ones: the cross-entropy (CE) loss and the Crammer-Singer (CS) loss (a.k.a. the SVM loss). While the CS loss has been used widely in traditional machine learning, the CE loss is usually the default choice for multi-class deep learning tasks. Top-k variants of the CS and CE losses have also been proposed to promote classifiers with better top-k accuracy. Nevertheless, the relationship between these different losses remains unclear, which hinders our understanding of how they are expected to behave in different scenarios. In this paper, we present a unified view of the CS/CE losses and their smoothed top-k variants by proposing a new family of loss functions, which are arguably better than the CS/CE losses when the given label information is incomplete and noisy. The new family of smooth loss functions, named the label-distributionally robust (LDR) loss, is defined by leveraging the distributionally robust optimization (DRO) framework to model the uncertainty in the given label information, where the uncertainty over true class labels is captured by distributional weights for each label, regularized by a function.
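To make the unification concrete, here is a minimal sketch of a KL-regularized DRO loss over class scores. All names are hypothetical and the margin term of the CS loss is omitted for brevity; the sketch only illustrates the standard fact that a KL-regularized inner maximization yields a temperature-scaled log-sum-exp, which recovers the CE loss at regularization strength 1 and approaches the (margin-free) CS max-loss as the strength goes to 0:

```python
import numpy as np

def ldr_loss(scores, y, lam):
    """KL-regularized DRO surrogate over labels (illustrative sketch).

    Inner problem: max over distributions p of sum_k p_k*(s_k - s_y) - lam*KL(p || uniform),
    which has the closed form lam * log(mean_k exp((s_k - s_y)/lam)); the constant
    -lam*log(K) shift is dropped here, giving lam * logsumexp((s - s_y)/lam).
    """
    margins = (scores - scores[y]) / lam
    m = np.max(margins)                                   # for numerical stability
    return lam * (m + np.log(np.sum(np.exp(margins - m))))

scores = np.array([1.0, 2.0, 0.5])
ce_like = ldr_loss(scores, y=0, lam=1.0)    # equals -log softmax(scores)[0]
cs_like = ldr_loss(scores, y=0, lam=0.01)   # approaches max_k (s_k - s_y) = 1.0
```

With `lam = 1.0` the expression equals the CE loss exactly, since `log sum_k exp(s_k - s_y) = log sum_k exp(s_k) - s_y`; as `lam` shrinks, the soft maximum hardens into the CS-style worst-class margin.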




