
Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

by Elliott Gordon-Rodriguez et al.
Layer 6 AI
Columbia University

Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example: the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural networks trained with label smoothing and in actor-mimic reinforcement learning, among others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of its failure modes.
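To make the practice under discussion concrete, the following is a minimal sketch of applying the categorical cross-entropy loss to simplex-valued targets produced by label smoothing. The function names and the smoothing coefficient are illustrative choices, not taken from the paper:

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Label smoothing: mix a one-hot target with the uniform
    distribution, yielding a target in the interior of the simplex."""
    k = y_onehot.shape[-1]
    return (1.0 - eps) * y_onehot + eps / k

def categorical_cross_entropy(targets, probs):
    """Standard cross-entropy loss, here applied to targets that are
    simplex-valued rather than strictly one-hot -- the practice the
    paper examines."""
    return -np.sum(targets * np.log(probs), axis=-1)

# Example: a 3-class problem with a smoothed target.
y = np.array([0.0, 0.0, 1.0])
p = np.array([0.1, 0.2, 0.7])   # model's predicted probabilities
loss = categorical_cross_entropy(smooth_labels(y), p)
```

Note that with such targets the loss no longer corresponds to the negative log-likelihood of a categorical distribution, which is the mismatch motivating the continuous-categorical alternative.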



