Cross-Entropy Loss Functions: Theoretical Analysis and Applications

by Anqi Mao, et al.

Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network when the softmax is used. But what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of losses, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error, and other cross-entropy-like loss functions. We give the first H-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set H used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps, which depend only on the loss function and the hypothesis set. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, derived from their comp-sum counterparts by adding a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit H-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state of the art while also achieving superior non-adversarial accuracy.
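
As a minimal sketch of the comp-sum family, the Python below assumes the parametrization Φ_τ(u) = ((1 + u)^(1 − τ) − 1)/(1 − τ) for τ ≥ 0, τ ≠ 1, and Φ_1(u) = log(1 + u), applied to u = Σ_{y' ≠ y} exp(h(x, y') − h(x, y)). The abstract does not state this form, so treat it as an assumption; the function names comp_sum_loss and softmax_prob are hypothetical. Because 1/(1 + u) is the softmax probability p_y of the correct label, τ = 1 recovers cross-entropy (−log p_y), τ = 2 recovers the mean absolute error up to a constant factor (1 − p_y), τ ∈ (1, 2) gives generalized cross-entropy, and τ = 0 gives the sum-exponential loss u.

```python
import math
from typing import Sequence


def comp_sum_loss(scores: Sequence[float], y: int, tau: float) -> float:
    """Hypothetical comp-sum loss Phi_tau(u), u = sum_{y' != y} exp(h_y' - h_y).

    Assumed parametrization (a sketch, not code from the paper):
      tau == 1 : log(1 + u)                              -> cross-entropy
      tau != 1 : ((1 + u) ** (1 - tau) - 1) / (1 - tau) -> tau >= 0
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    h_y = scores[y]
    # Sum of exponentiated score gaps over all incorrect labels.
    u = sum(math.exp(s - h_y) for j, s in enumerate(scores) if j != y)
    if tau == 1:
        # log1p is more accurate than log(1 + u) when u is small.
        return math.log1p(u)
    return ((1.0 + u) ** (1.0 - tau) - 1.0) / (1.0 - tau)


def softmax_prob(scores: Sequence[float], y: int) -> float:
    """Softmax probability of label y (hypothetical helper for comparison)."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[y] / sum(exps)


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]
    p = softmax_prob(scores, 0)
    for tau in (0.0, 1.0, 1.5, 2.0):
        print(tau, comp_sum_loss(scores, 0, tau))
    print("cross-entropy via softmax:", -math.log(p))
```

For example, with scores [2.0, 0.5, -1.0] and y = 0, comp_sum_loss(scores, 0, 1.0) should match -math.log(softmax_prob(scores, 0)) up to floating-point rounding, which is the cross-entropy/logistic-loss coincidence mentioned above.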


Smooth Loss Functions for Deep Top-k Classification

The top-k error is a common measure of performance in machine learning a...

Algorithms and Theory for Multiple-Source Adaptation

This work includes a number of novel contributions for the multiple-sour...

Ranking with Abstention

We introduce a novel framework of ranking with abstention, where the lea...

A Unified DRO View of Multi-class Loss Functions with top-N Consistency

Multi-class classification is one of the most common tasks in machine le...

Robustness of different loss functions and their impact on networks learning capability

Recent developments in AI have made it ubiquitous, every industry is try...

ℋ-Consistency Estimation Error of Surrogate Loss Minimizers

We present a detailed study of estimation errors in terms of surrogate l...
