Regression as Classification: Influence of Task Formulation on Neural Network Features

11/10/2022
by Lawrence Stewart, et al.

Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training with the cross-entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature space, we explore how the implicit bias induced by gradient-based optimization could partly explain this phenomenon. We provide theoretical evidence that, in the case of one-dimensional data, the regression formulation yields a measure whose support can differ greatly from that for classification. Our proposed optimal supports correspond directly to the features learned by the input layer of the network. The different nature of these supports sheds light on possible optimization difficulties the square loss could encounter during training, and we present empirical results illustrating this phenomenon.
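The reformulation the abstract refers to can be made concrete with a small example. The sketch below (PyTorch, purely illustrative and not taken from the paper) fits the same two-layer ReLU architecture to one-dimensional data in two ways: once as a regression trained with the square loss, and once as a classification over discretized targets trained with cross-entropy. The toy dataset, network width, number of bins, and optimizer settings are assumptions made for the example.

```python
# Illustrative sketch only: compares the square-loss regression formulation
# with regression recast as classification over binned targets.
# Toy data, width, bin count, and learning rate are assumed values.
import torch
import torch.nn as nn

torch.manual_seed(0)

# One-dimensional toy data: y = sin(3x) on [-1, 1].
x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
y = torch.sin(3.0 * x)

def two_layer_relu(out_dim, width=64):
    # Two-layer ReLU network: hidden features followed by a linear readout.
    return nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, out_dim))

# Regression formulation: scalar output trained with the square loss.
reg_net = two_layer_relu(out_dim=1)
reg_opt = torch.optim.SGD(reg_net.parameters(), lr=0.1)
for _ in range(2000):
    reg_opt.zero_grad()
    nn.functional.mse_loss(reg_net(x), y).backward()
    reg_opt.step()

# Classification formulation: discretize y into bins, train with cross-entropy.
n_bins = 20
edges = torch.linspace(y.min().item(), y.max().item(), n_bins + 1)
labels = torch.bucketize(y.squeeze(1), edges[1:-1])  # one class index per sample
cls_net = two_layer_relu(out_dim=n_bins)
cls_opt = torch.optim.SGD(cls_net.parameters(), lr=0.1)
for _ in range(2000):
    cls_opt.zero_grad()
    nn.functional.cross_entropy(cls_net(x), labels).backward()
    cls_opt.step()

# One common way to read a real value back out of the classifier:
# the probability-weighted average of the bin centers.
with torch.no_grad():
    centers = 0.5 * (edges[:-1] + edges[1:])
    y_hat_cls = cls_net(x).softmax(dim=1) @ centers
    print("square-loss head MSE:  ", nn.functional.mse_loss(reg_net(x), y).item())
    print("cross-entropy head MSE:", nn.functional.mse_loss(y_hat_cls.unsqueeze(1), y).item())
```

Both models share the same first-layer ReLU features; only the output head and loss differ, which is exactly the comparison the abstract studies through the measures induced over feature space.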

