Temperature check: theory and practice for training models with softmax-cross-entropy losses

10/14/2020
by Atish Agarwala et al.

The softmax function combined with a cross-entropy loss is a principled approach to modeling probability distributions that has become ubiquitous in deep learning. The softmax function is defined by a single hyperparameter, the temperature, which is commonly set to one or regarded as a way to tune model confidence after training; however, less is known about how the temperature affects training dynamics or generalization performance. In this work we develop a theory of early learning for models trained with softmax-cross-entropy loss and show that the learning dynamics depend crucially on the inverse temperature β as well as the magnitude of the logits at initialization, ||βz||_2. We follow these analytic results with a large-scale empirical study of a variety of model architectures trained on CIFAR10, ImageNet, and IMDB sentiment analysis. We find that generalization performance depends strongly on the temperature, but only weakly on the initial logit magnitude. We provide evidence that the dependence of generalization on β is not due to changes in model confidence, but is a dynamical phenomenon. It follows that adding β as a tunable hyperparameter is key to maximizing model performance. Although we find the optimal β to be sensitive to the architecture, our results suggest that tuning β over the range 10^-2 to 10^1 improves performance across all architectures studied. We find that smaller β may lead to better peak performance at the cost of learning stability.
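
To make the role of the inverse temperature concrete, here is a minimal sketch (not the authors' code) of a temperature-scaled softmax-cross-entropy loss in NumPy. The function name softmax_xent and the toy data are illustrative; the only detail taken from the abstract is that β multiplies the logits and that the paper tunes β over roughly 10^-2 to 10^1.

import numpy as np

def softmax_xent(logits, labels, beta=1.0):
    """Temperature-scaled softmax-cross-entropy.

    beta is the inverse temperature: the loss is computed on beta * logits,
    so large beta sharpens the softmax and small beta flattens it.
    """
    z = beta * logits
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Mean negative log-likelihood of the true classes.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy example: random logits for a batch of 4 examples and 10 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
labels = rng.integers(0, 10, size=4)

# Sweep beta over the range the abstract suggests tuning, 1e-2 to 1e1.
for beta in [1e-2, 1e-1, 1e0, 1e1]:
    print(f"beta={beta:6.2f}  loss={softmax_xent(logits, labels, beta):.4f}")

In practice β would be treated as one more hyperparameter to tune alongside the learning rate; the abstract suggests sweeping it per architecture rather than fixing it at one. Note also that the initial logit magnitude ||βz||_2 depends on both β and the weight-initialization scale, which is why the paper studies them jointly.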

Related research

On the Rényi Cross-Entropy (06/28/2022)
The Rényi cross-entropy measure between two distributions, a generalizat...

Cut your Losses with Squentropy (02/08/2023)
Nearly all practical neural models for classification are trained using ...

Mathematically Modeling the Lexicon Entropy of Emergent Language (11/28/2022)
We formulate a stochastic process, FiLex, as a mathematical model of lex...

Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement (07/16/2020)
Cross Entropy (CE) has an important role in machine learning and, in par...

Collision Cross-entropy and EM Algorithm for Self-labeled Classification (03/13/2023)
We propose "collision cross-entropy" as a robust alternative to the Shan...

Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax (12/19/2019)
The Gumbel-Softmax is a continuous distribution over the simplex that is...

Fine-tune your Classifier: Finding Correlations With Temperature (10/18/2022)
Temperature is a widely used hyperparameter in various tasks involving n...
