Continuity of Generalized Entropy and Statistical Learning

by Aolin Xu, et al.

We study the continuity of the generalized entropy, defined through an action space and a loss function, as a functional of the underlying probability distribution, and use this property to address a basic question in statistical learning theory: the excess risk analysis of various learning methods. We first derive upper and lower bounds on the entropy difference of two distributions in terms of several commonly used f-divergences, the Wasserstein distance, and a distance that depends on the action space and the loss function. Each general result is illustrated with examples and compared with existing entropy difference bounds, and new upper bounds on mutual information are derived from the new results. We then apply the entropy difference bounds to the theory of statistical learning. We show that the excess risks in two popular learning paradigms, frequentist learning and Bayesian learning, can both be studied through the continuity of different forms of the generalized entropy. The analysis is then extended to the continuity of the generalized conditional entropy. This extension yields performance bounds for Bayes decision making with mismatched distributions. It also leads to excess risk bounds for a third learning paradigm, in which the decision rule is optimally designed under the projection of the empirical distribution onto a predefined family of distributions. We thus establish a unified method of excess risk analysis for the three major paradigms of statistical learning, through the continuity of generalized entropy.
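As a rough illustration of the central object, the sketch below assumes the usual definition of generalized entropy as the minimum expected loss over the action space, restricted here to a finite alphabet and a finite action set. The function names, the zero-one loss, and the two example distributions are illustrative choices, not taken from the paper. For a loss bounded in [0, 1], the entropy gap between two distributions is at most their total variation distance, which is an elementary instance of the kind of continuity the abstract describes and not one of the paper's specific bounds.

```python
def generalized_entropy(p, actions, loss):
    """Minimum expected loss over a finite action set:
    min over a of sum_x p[x] * loss(x, a). (Assumed definition.)"""
    return min(
        sum(px * loss(x, a) for x, px in enumerate(p))
        for a in actions
    )


def zero_one_loss(x, a):
    """Loss 0 for a correct guess of the outcome, 1 otherwise."""
    return 0.0 if x == a else 1.0


def total_variation(p, q):
    """Total variation distance, i.e. half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Two hypothetical distributions on a three-letter alphabet.
p = [0.5, 0.3, 0.2]
q = [0.45, 0.25, 0.3]
actions = range(3)

h_p = generalized_entropy(p, actions, zero_one_loss)  # equals 1 - max(p)
h_q = generalized_entropy(q, actions, zero_one_loss)  # equals 1 - max(q)
gap = abs(h_p - h_q)
tv = total_variation(p, q)

print(f"H(P) = {h_p:.3f}, H(Q) = {h_q:.3f}")
print(f"|H(P) - H(Q)| = {gap:.3f} <= TV(P, Q) = {tv:.3f}")
```

Under the zero-one loss the generalized entropy reduces to one minus the largest probability mass, so the best guess changes little when the distribution is perturbed slightly. Other losses, such as the logarithmic loss with distributions as actions, would recover other entropies, but they need an infinite action space and are left out of this sketch.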



