Analytic Study of Double Descent in Binary Classification: The Impact of Loss

01/30/2020
by Ganesh Kini et al.

Extensive empirical evidence reveals that, for a wide range of learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Deng, Kammoun, Thrampoulidis, 2019], the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement these results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend both on the training data and on the learning algorithm. We further study the dependence of DD curves on the size of the training set. Similar to our earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.
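The qualitative phenomenon the abstract describes can be reproduced in a toy simulation. The sketch below is an illustrative assumption, not the paper's exact model or its sharp asymptotics: a linear "teacher" generates binary labels on Gaussian features, and we fit the square loss with the minimum-norm least-squares solution (which is what GD on square loss converges to from zero initialization). Sweeping the number of features past the interpolation threshold d = n typically produces the characteristic double-descent shape of the test error.

```python
import numpy as np

def test_error(n_train=100, d=50, n_test=2000, seed=0):
    """Test error of a min-norm square-loss classifier on Gaussian features.

    Hypothetical toy setup: labels come from a linear teacher w_star
    (its scaling and the noise level 0.1 are illustrative choices).
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)

    # Training data: Gaussian features, noisy sign labels.
    X = rng.standard_normal((n_train, d))
    y = np.sign(X @ w_star + 0.1 * rng.standard_normal(n_train))

    # Min-norm least-squares fit: w = pinv(X) @ y. For d > n_train this
    # interpolates the training labels; GD on square loss from zero
    # initialization converges to this same solution.
    w_hat = np.linalg.pinv(X) @ y

    # Classify fresh test points by sign and measure the 0-1 error.
    X_test = rng.standard_normal((n_test, d))
    y_test = np.sign(X_test @ w_star)
    return np.mean(np.sign(X_test @ w_hat) != y_test)

# Sweep the model size d across the interpolation threshold d = n_train:
# the test error typically peaks near d = 100 and descends again beyond it.
errors = {d: test_error(d=d) for d in (20, 80, 100, 120, 400)}
```

Plotting `errors` against `d` gives a double-descent curve for this toy model; the paper instead derives the curve analytically via sharp high-dimensional asymptotics rather than by Monte Carlo simulation.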


