What's in a Loss Function for Image Classification?

by Simon Kornblith, et al.

It is common to use the softmax cross-entropy loss to train neural networks on classification datasets where a single class label is assigned to each example. However, it has been shown that modifying softmax cross-entropy with label smoothing or regularizers such as dropout can lead to higher performance. This paper studies a variety of loss functions and output layer regularization strategies on image classification tasks. We observe meaningful differences in model predictions, accuracy, calibration, and out-of-distribution robustness for networks trained with different objectives. However, differences in hidden representations of networks trained with different objectives are restricted to the last few layers; representational similarity reveals no differences among network layers that are not close to the output. We show that all objectives that improve over vanilla softmax loss produce greater class separation in the penultimate layer of the network, which potentially accounts for improved performance on the original task, but results in features that transfer worse to other tasks.
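To make the abstract's starting point concrete, here is a minimal NumPy sketch of softmax cross-entropy and its label-smoothed variant. It is an illustration under stated assumptions, not the paper's implementation: the function names, the smoothing weight `alpha`, and the toy logits are all hypothetical, and the smoothing mass is spread uniformly over every class, which is one common convention.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max first for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def smoothed_cross_entropy(logits, labels, alpha=0.0):
    """Mean cross-entropy against label-smoothed targets.

    alpha = 0.0 recovers ordinary softmax cross-entropy.
    Assumption: smoothing mass alpha is spread uniformly over all classes.
    """
    num_classes = logits.shape[-1]
    one_hot = np.eye(num_classes)[labels]
    targets = (1.0 - alpha) * one_hot + alpha / num_classes
    return float(-(targets * log_softmax(logits)).sum(axis=-1).mean())

# Toy batch (hypothetical values): 2 examples, 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

plain = smoothed_cross_entropy(logits, labels, alpha=0.0)
smoothed = smoothed_cross_entropy(logits, labels, alpha=0.1)
```

With `alpha > 0` the targets are no longer one-hot, so the loss penalizes arbitrarily confident predictions; on this toy batch, where both examples are already classified correctly, the smoothed loss is larger than the plain one.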




