Risk-Monotonicity in Statistical Learning

by Zakaria Mhammedi, et al.

Acquisition of data is a difficult task in many applications of machine learning, and it is only natural that one hopes and expects the population risk to decrease (better performance) monotonically with increasing data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms such as empirical risk minimization. Non-monotonic behaviour of the risk and instability in training have appeared in the popular deep learning paradigm under the description of double descent. These problems highlight gaps in our understanding of learning algorithms and generalization. It is, therefore, crucial to pursue this concern and provide a characterization of such behaviour. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, consequently resolving an open problem (Viering et al. 2019) on how to avoid non-monotonic behaviour of risk curves. Our work makes a significant contribution to the topic of risk-monotonicity, which may be key in resolving empirical phenomena such as double descent.



A Brief Prehistory of Double Descent

In their thought-provoking paper [1], Belkin et al. illustrate and discu...

Optimal Regularization Can Mitigate Double Descent

Recent empirical and theoretical studies have shown that many learning a...

Invariant Risk Minimization Games

The standard risk minimization paradigm of machine learning is brittle w...

Mitigating multiple descents: A model-agnostic framework for risk monotonization

Recent empirical and theoretical analyses of several commonly used predi...

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

Machine learning systems are often applied to data that is drawn from a ...

Monotone Learning

The amount of training-data is one of the key factors which determines t...

Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks

We investigate the asymptotic risk of a general class of overparameteriz...