Cut your Losses with Squentropy

02/08/2023
by Like Hui, et al.

Nearly all practical neural models for classification are trained using the cross-entropy loss. Yet this ubiquitous choice is supported by little theoretical or empirical evidence. Recent work (Hui & Belkin, 2020) suggests that training with the (rescaled) square loss is often superior in terms of classification accuracy. In this paper we propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, has less variance with respect to the random initialization. Additionally, in contrast to the square loss, the squentropy loss can typically be trained using exactly the same optimization parameters, including the learning rate, as the standard cross-entropy loss, making it a true "plug-and-play" replacement. Finally, unlike the rescaled square loss, multi-class squentropy contains no parameters that need to be adjusted.
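The abstract fully specifies the loss: the standard cross-entropy term plus the square loss over the incorrect classes, averaged over those classes. Below is a minimal PyTorch sketch of that construction. The function name squentropy_loss is ours, and applying the squared term to the raw logits with an implicit target of zero is our reading of "square loss over the incorrect classes"; treat this as an illustration rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def squentropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Squentropy: cross-entropy plus the average squared logit of the
    incorrect classes (i.e. square loss against a target of zero).

    logits:  (batch, num_classes) raw, un-normalized model outputs
    targets: (batch,) integer class labels
    """
    # Standard cross-entropy term (softmax applied internally, mean over batch).
    ce = F.cross_entropy(logits, targets)

    # Zero out the true-class logit, square the remaining logits, and average
    # over the (num_classes - 1) incorrect classes, then over the batch.
    num_classes = logits.size(1)
    true_class = F.one_hot(targets, num_classes).bool()
    sq = logits.masked_fill(true_class, 0.0).pow(2).sum(dim=1) / (num_classes - 1)

    return ce + sq.mean()

# Drop-in usage, same call pattern as F.cross_entropy:
logits = torch.randn(32, 10, requires_grad=True)   # batch of 32, 10 classes
targets = torch.randint(0, 10, (32,))
loss = squentropy_loss(logits, targets)
loss.backward()
```

Because the cross-entropy term is unchanged, the same optimizer settings, learning rate included, typically carry over, which is the "plug-and-play" property claimed in the abstract.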


Related research

06/12/2020
Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks
Modern neural architectures for classification tasks are trained using t...

12/07/2021
Understanding Square Loss in Training Overparametrized Neural Network Classifiers
Deep learning has achieved many breakthroughs in modern classification t...

07/25/2018
A Surprising Linear Relationship Predicts Test Performance in Deep Networks
Given two networks with the same training loss on a dataset, when would ...

09/12/2021
Mixing between the Cross Entropy and the Expectation Loss Terms
The cross entropy loss is widely used due to its effectiveness and solid...

10/18/2022
Multi-Source Transformer Architectures for Audiovisual Scene Classification
In this technical report, the systems we submitted for subtask 1B of the...

10/14/2020
Temperature check: theory and practice for training models with softmax-cross-entropy losses
The softmax function combined with a cross-entropy loss is a principled ...

04/16/2022
The Tree Loss: Improving Generalization with Many Classes
Multi-class classification problems often have many semantically similar...
