Bregman Divergence Bounds and the Universality of the Logarithmic Loss

10/14/2018
by Amichai Painsky, et al.

A loss function measures the discrepancy between the true values and their estimated fits for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with any smooth, proper and convex loss function is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. This implies that by minimizing the log-loss (the loss associated with the KL divergence), we also minimize an upper bound on any loss from this set. This property suggests that the log-loss is universal in the sense that it provides performance guarantees for a broad class of accuracy measures. Importantly, our notion of universality is not restricted to a specific problem, which allows us to apply our results to many applications, including predictive modeling, data clustering and sample complexity analysis. Further, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (again up to a normalization constant). This result introduces a new set of divergence inequalities, similar to Pinsker's inequality, and extends well-known f-divergence inequality results.
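
The flavor of the bound can be illustrated in one simple special case. The sketch below is not the paper's derivation: it numerically checks that the divergence induced by the squared (Brier) loss, which is one smooth, proper and convex loss, is bounded by the KL divergence for Bernoulli distributions. The constant 1/2 used here follows from Pinsker's inequality (KL >= 2(p - q)^2 in the binary case); the general normalization constant established in the paper may differ.

# Minimal numerical sketch (binary case, squared loss), assuming the
# Pinsker-type constant 1/2 for illustration only.
import numpy as np

def kl_bernoulli(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def brier_divergence(p, q):
    """Divergence of the squared (Brier) loss: E_p[(Y - q)^2] - E_p[(Y - p)^2] = (p - q)^2."""
    return (p - q) ** 2

grid = np.linspace(0.01, 0.99, 99)
for p in grid:
    for q in grid:
        # Check (p - q)^2 <= 0.5 * KL(p || q) on the whole grid.
        assert brier_divergence(p, q) <= 0.5 * kl_bernoulli(p, q) + 1e-12

print("Squared-loss divergence <= KL/2 held at every grid point.")

Minimizing the log-loss therefore drives down the right-hand side, which in turn controls the squared-loss divergence; the paper extends this idea to the whole class of smooth, proper, convex losses.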
