Combining learning rate decay and weight decay with complexity gradient descent - Part I

02/07/2019
by Pierre H. Richemond, et al.

The role of L^2 regularization, in the specific case of deep neural networks rather than more traditional machine learning models, is still not fully elucidated. We hypothesize that this complex interplay stems from the combination of overparameterization and the high-dimensional phenomena that take place during training, which make the problem not amenable to standard convex optimization methods. Using insights from statistical physics and random field theory, we introduce a parameter that factors in both the level of the loss function and its remaining nonconvexity: the complexity. We then show that it is desirable to perform complexity gradient descent, and how to use this intuition to derive novel and efficient annealing schemes for the strength of L^2 regularization when performing standard stochastic gradient descent in deep neural networks.
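
To make the last point concrete, here is a minimal sketch of the generic pattern the abstract describes: standard SGD on a loss with an explicit L^2 penalty whose strength is annealed over training. The exponential schedule, the toy linear model, and the random data are illustrative assumptions only, not the annealing schemes derived in the paper.

```python
import torch

# Minimal sketch (see caveats above): plain SGD plus an L^2 penalty
# whose coefficient is annealed during training.
model = torch.nn.Linear(10, 1)                      # toy stand-in for a deep network
opt = torch.optim.SGD(model.parameters(), lr=0.1)   # standard SGD, no built-in decay

lam0, gamma = 1e-2, 0.999                           # assumed initial L^2 strength and decay rate

for step in range(1000):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # random toy batch
    lam = lam0 * gamma ** step                      # assumed exponential annealing of lambda
    mse = torch.nn.functional.mse_loss(model(x), y)
    l2 = sum((p ** 2).sum() for p in model.parameters())
    loss = mse + lam * l2                           # L^2-regularized objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

PyTorch's built-in weight_decay argument folds an equivalent penalty (up to a constant factor) into the optimizer update; the penalty is written out explicitly here only to make the annealed coefficient visible.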

Related research

02/01/2023 · QLAB: Quadratic Loss Approximation-Based Optimal Learning Rate for Deep Learning
We propose a learning rate adaptation scheme, called QLAB, for descent o...

06/28/2019 · Neural ODEs as the Deep Limit of ResNets with constant weights
In this paper we prove that, in the deep limit, the stochastic gradient ...

08/26/2021 · Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Modern methods for learning from data depend on many tuning parameters, ...

03/24/2018 · Gradient descent in Gaussian random fields as a toy model for high-dimensional optimisation in deep learning
In this paper we model the loss function of high-dimensional optimizatio...

08/25/2021 · Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Adaptive gradient methods such as Adam have gained increasing popularity...

02/05/2018 · Learning Compact Neural Networks with Regularization
We study the impact of regularization for learning neural networks. Our ...

03/01/2021 · Deep Learning with a Classifier System: Initial Results
This article presents the first results from using a learning classifier...
