Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

09/21/2022
by Jirong Yi, et al.

Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and a key ingredient is the existence of well-trained classifiers on benchmark datasets. As the mainstream loss function, cross entropy can easily lead to models that exhibit severe overfitting. In this paper, we show that minimizing the existing cross entropy loss essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. We therefore propose a mutual information learning framework in which deep neural network classifiers are trained by learning the mutual information between the label and the input. Theoretically, we give a lower bound on the population classification error in terms of the mutual information. In addition, we derive lower and upper bounds on the mutual information for a concrete binary classification data model in ℝ^n, as well as a lower bound on the error probability in this setting. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve far better generalization performance than the conditional entropy learned classifiers (CELCs), with improvements in testing accuracy that can exceed 10%.
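
One way to read the framework is through the decomposition I(X; Y) = H(Y) - H(Y|X): standard cross entropy training only targets the conditional entropy term H(Y|X), while a mutual information objective also has to account for the label entropy H(Y). The PyTorch sketch below is a minimal illustration of that decomposition only; the surrogate loss form, the MISurrogateLoss class, and the mi_target value are assumptions made for illustration and are not the paper's exact MILC objective.

```python
# Minimal sketch (PyTorch): steer a classifier toward a mutual information
# target using the decomposition I(X; Y) = H(Y) - H(Y | X).
# The loss form, class name, and mi_target are illustrative assumptions,
# not the paper's MILC objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MISurrogateLoss(nn.Module):
    def __init__(self, mi_target: float):
        super().__init__()
        self.mi_target = mi_target  # assumed known or estimated MI between X and Y

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # H(Y | X) estimated by the usual cross entropy (in nats).
        cond_entropy = F.cross_entropy(logits, labels)

        # H(Y) estimated from the empirical label distribution in the batch.
        num_classes = logits.shape[1]
        counts = torch.bincount(labels, minlength=num_classes).float()
        probs = counts / counts.sum()
        label_entropy = -(probs[probs > 0] * probs[probs > 0].log()).sum()

        # Empirical MI estimate and a penalty pulling it toward the target,
        # so training accounts for both entropy terms rather than H(Y|X) alone.
        mi_estimate = label_entropy - cond_entropy
        return cond_entropy + (mi_estimate - self.mi_target) ** 2

# Usage sketch: plug into an ordinary training loop.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = MISurrogateLoss(mi_target=2.0)  # hypothetical target value
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = criterion(model(x), y)
loss.backward()
```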


Related research

06/19/2021 · Neural Network Classifier as Mutual Information Evaluator
Cross-entropy loss with softmax output is a standard choice to train neu...

11/23/2022 · Mutual Information Learned Regressor: an Information-theoretic Viewpoint of Training Regression Systems
As one of the central tasks in machine learning, regression finds lots o...

06/01/2022 · Merlin-Arthur Classifiers: Formal Interpretability with Interactive Black Boxes
We present a new theoretical framework for making black box classifiers ...

09/17/2023 · Conditional Mutual Information Constrained Deep Learning for Classification
The concepts of conditional mutual information (CMI) and normalized cond...

09/08/2019 · L_DMI: An Information-theoretic Noise-robust Loss Function
Accurately annotating large scale dataset is notoriously expensive both ...

05/23/2023 · Reviewing Evolution of Learning Functions and Semantic Information Measures for Understanding Deep Learning
A new trend in deep learning, represented by Mutual Information Neural E...

05/28/2019 · Understanding the Behaviour of the Empirical Cross-Entropy Beyond the Training Distribution
Machine learning theory has mostly focused on generalization to samples ...
