Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

10/07/2019
by Zhenyue Qin, et al.

Despite the great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over the predicted classes, this normalised exponential transformation can still seem artificial, and a theoretical framework that incorporates softmax as an intrinsic component has been lacking. In this paper, we view neural networks that embed softmax from an information-theoretic perspective. Under this view, log-softmax arises naturally and mathematically as an inherent component of a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks by maximising log-softmax is equivalent to enlarging this conditional mutual information, i.e., feeding label information into the network outputs. We also generalise our information-theoretic perspective to neural networks with stochasticity and derive upper and lower information bounds on log-softmax. In theory, this information-theoretic view provides a principled justification for embedding softmax in neural networks; in practice, we demonstrate a computer vision application that employs our information-theoretic view to filter out targeted objects in images.
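The abstract's central quantity is easy to state concretely: softmax maps logits z to probabilities softmax(z)_i = exp(z_i) / Σ_j exp(z_j), and the standard cross-entropy training objective for one datum is the negative log-softmax of the true class, so minimising cross-entropy is exactly maximising the log-softmax term the authors connect to conditional mutual information. Below is a minimal NumPy sketch of that relationship; it is our illustration, not the authors' code, and the helper names log_softmax and nll_loss are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax: log_softmax(z)_i = z_i - log(sum_j exp(z_j))."""
    z = z - np.max(z)                        # shift logits; softmax is shift-invariant
    return z - np.log(np.sum(np.exp(z)))

def nll_loss(logits, label):
    """Per-datum cross-entropy: the negative log-softmax of the true class."""
    return -log_softmax(logits)[label]

# Example: unnormalised network outputs (logits) for a 3-class problem.
logits = np.array([2.0, 0.5, -1.0])
probs = np.exp(log_softmax(logits))
print(probs, probs.sum())          # softmax probabilities; they sum to 1
print(nll_loss(logits, label=0))   # minimising this maximises log-softmax of the label
```

The subtraction of the maximum logit changes nothing mathematically (softmax is invariant to a constant shift) but prevents overflow in exp, which is why log-softmax rather than log(softmax(z)) is the standard numerical form.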
