What's in a Loss Function for Image Classification?

by Simon Kornblith et al.

It is common to use the softmax cross-entropy loss to train neural networks on classification datasets where a single class label is assigned to each example. However, it has been shown that modifying softmax cross-entropy with label smoothing or regularizers such as dropout can lead to higher performance. This paper studies a variety of loss functions and output layer regularization strategies on image classification tasks. We observe meaningful differences in model predictions, accuracy, calibration, and out-of-distribution robustness for networks trained with different objectives. However, differences in hidden representations of networks trained with different objectives are restricted to the last few layers; representational similarity reveals no differences among network layers that are not close to the output. We show that all objectives that improve over vanilla softmax loss produce greater class separation in the penultimate layer of the network, which potentially accounts for improved performance on the original task, but results in features that transfer worse to other tasks.
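Two of the ingredients above are easy to make concrete. The sketch below is a minimal NumPy illustration, not the paper's code: the first function implements softmax cross-entropy with standard label smoothing (the one-hot target mixed with a uniform distribution over classes), and the second computes a rough class-separation index on penultimate-layer features, in the spirit of the paper's measure, as one minus the ratio of mean within-class cosine distance to mean overall cosine distance. The function names and the pooled (rather than per-class) averaging are illustrative assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels, smoothing=0.0):
    """Softmax cross-entropy with optional label smoothing.

    logits: (batch, num_classes) array; labels: (batch,) integer class ids.
    With smoothing alpha, the target distribution becomes
    (1 - alpha) * one_hot(label) + alpha / num_classes.
    """
    n, k = logits.shape
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Smoothed targets: uniform mass alpha/k plus (1 - alpha) on the true class.
    targets = np.full((n, k), smoothing / k)
    targets[np.arange(n), labels] += 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())

def class_separation(features, labels):
    """Rough class-separation index on (batch, dim) features:
    1 - (mean within-class cosine distance / mean overall cosine distance).
    Higher values mean tighter, better-separated classes.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos_dist = 1.0 - f @ f.T                     # pairwise cosine distances
    within = labels[:, None] == labels[None, :]  # same-class pair mask
    return float(1.0 - cos_dist[within].mean() / cos_dist.mean())
```

With smoothing=0.0 the first function reduces to ordinary cross-entropy; in class_separation, averaging within each class before pooling would be a reasonable refinement when classes are imbalanced.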




