On the Universality of the Logistic Loss Function

05/10/2018
by Amichai Painsky, et al.

A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss, and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with any smooth, proper, convex loss function is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on every loss function in this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks, and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (again up to a multiplicative normalization constant). This result introduces a new family of divergence inequalities, analogous to the well-known Pinsker inequality.
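To make the flavor of these bounds concrete in the Bernoulli case, the sketch below numerically checks the classical Pinsker-type instance: the divergence associated with the quadratic (Brier) loss, which for binary outcomes equals 2(p - q)^2, is dominated by the KL divergence measured in nats. This is a minimal illustrative sketch, not code from the paper; the function names and the sampling loop are assumptions made here for demonstration.

    import math
    import random

    def kl_bernoulli(p, q):
        """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    def brier_divergence(p, q):
        """Divergence associated with the quadratic (Brier) loss on binary outcomes:
        (p - q)^2 + ((1 - p) - (1 - q))^2 = 2 * (p - q)^2."""
        return 2 * (p - q) ** 2

    random.seed(0)
    for _ in range(100_000):
        p = random.uniform(0.001, 0.999)
        q = random.uniform(0.001, 0.999)
        # Pinsker's inequality for two-point distributions:
        # KL(p || q) >= 2 * (p - q)^2, i.e., KL dominates the Brier divergence.
        assert kl_bernoulli(p, q) >= brier_divergence(p, q) - 1e-12
    print("KL upper-bounds the Brier divergence on all sampled pairs.")

Here the multiplicative normalization constant happens to be 1; for other smooth, proper, convex losses the paper's result supplies the appropriate constant.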


Related research

10/14/2018
Bregman Divergence Bounds and the Universality of the Logarithmic Loss
A loss function measures the discrepancy between the true values and the...

09/23/2022
A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks
Kullback-Leibler (KL) divergence is widely used for variational inferenc...

01/18/2023
An Analysis of Loss Functions for Binary Classification and Regression
This paper explores connections between margin-based loss functions and ...

07/09/2011
Loss-sensitive Training of Probabilistic Conditional Random Fields
We consider the problem of training probabilistic conditional random fie...

04/21/2020
An Information-Theoretic Proof of the Streaming Switching Lemma for Symmetric Encryption
Motivated by a fundamental paradigm in cryptography, we consider a recen...

06/02/2021
General Bayesian Loss Function Selection and the use of Improper Models
Statisticians often face the choice between using probability models or ...

11/05/2019
An Alternative Probabilistic Interpretation of the Huber Loss
The Huber loss is a robust loss function used for a wide range of regres...
