Understanding Square Loss in Training Overparametrized Neural Network Classifiers

12/07/2021
by Tianyang Hu, et al.

Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures, but when it comes to the loss function, cross-entropy remains the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to favor square loss, but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime. Interesting properties of the generalization error, robustness, and calibration error are revealed. We consider two cases, according to whether the classes are separable or not. In the general non-separable case, fast convergence rates are established for both the misclassification rate and the calibration error. When the classes are separable, the misclassification rate converges exponentially fast. Further, the resulting margin is proven to be bounded away from zero, providing a theoretical guarantee of robustness. We expect our findings to hold beyond the NTK regime and to translate to practical settings. To this end, we conduct extensive empirical studies on practical neural networks, demonstrating the effectiveness of square loss on both synthetic low-dimensional data and real image data. Compared to cross-entropy, square loss has comparable generalization error but noticeable advantages in robustness and model calibration.
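For readers unfamiliar with the setup, the sketch below illustrates one common way to train a classifier with square loss: the network's raw outputs are regressed onto one-hot labels with mean squared error, while prediction remains an argmax over the outputs, exactly as with cross-entropy. This is an illustrative PyTorch sketch with assumed placeholder names (`model`, `loader`, `optimizer`, `num_classes`), not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): training a classifier with
# square loss on one-hot targets instead of cross-entropy.
import torch
import torch.nn.functional as F

def square_loss(logits, labels, num_classes):
    """Mean squared error between raw network outputs and one-hot labels."""
    targets = F.one_hot(labels, num_classes).float()
    return F.mse_loss(logits, targets)

def train_epoch(model, loader, optimizer, num_classes):
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        logits = model(x)                  # raw outputs, no softmax
        loss = square_loss(logits, y, num_classes)
        loss.backward()
        optimizer.step()

# Prediction is unchanged: predicted = model(x).argmax(dim=1)
```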
