Smooth Loss Functions for Deep Top-k Classification

by Leonard Berrada et al.
University of Oxford

The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data, however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require O(n^k) operations, where n is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of O(kn). Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating-point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k=5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy.
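The computational bottleneck the abstract describes comes from summing over all size-k subsets of the n class scores, which is an elementary symmetric polynomial of the (exponentiated) scores. The sketch below is an illustrative O(kn) evaluation using the standard one-element-at-a-time recurrence e_j ← e_j + x·e_{j−1}; it is not the authors' implementation (the paper uses a divide-and-conquer scheme better suited to GPU parallelism and numerical stability), and the function names here are hypothetical.

```python
import math

def elem_sym_poly(xs, k):
    """Return [e_0, e_1, ..., e_k], the elementary symmetric polynomials
    of xs, in O(k * len(xs)) time.

    Incorporating one element x updates each coefficient via
    e_j <- e_j + x * e_{j-1}; iterating j downward keeps the update in place.
    """
    e = [1.0] + [0.0] * k
    for x in xs:
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return e

def smooth_top_k_value(scores, k, tau):
    """Illustrative smoothed sum over size-k subsets:
    tau * log e_k(exp(s_1/tau), ..., exp(s_n/tau)),
    with the max score subtracted first for numerical stability.
    """
    m = max(scores)
    e = elem_sym_poly([math.exp((s - m) / tau) for s in scores], k)
    return tau * math.log(e[k]) + k * m
```

For example, `elem_sym_poly([1.0, 2.0, 3.0], 2)` yields `[1.0, 6.0, 11.0]`, since e_1 = 1+2+3 = 6 and e_2 = 1·2 + 1·3 + 2·3 = 11. In single precision these sums can overflow or lose accuracy for large n, which motivates the stable approximation the abstract mentions.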


