1 Introduction
Explaining the success of highly overparameterized models such as deep neural networks is a central problem in the theory of modern machine learning (Belkin, 2021). Classical theory would imply that such models are prone to overfitting the training data. On the contrary, practice shows that while this can happen at moderate overparameterization, around the interpolation threshold where a model is just expressive enough to perfectly fit the data, further increasing model capacity leads to better generalization performance. This so-called double descent phenomenon
(Belkin et al., 2019) is often attributed to the implicit regularization of gradient descent and its variants, which are widely used for training deep neural networks. Many theoretical works (Oymak and Soltanolkotabi, 2019; Liu et al., 2022) have proposed that optimization problems in modern machine learning enjoy the so-called Polyak-Łojasiewicz condition (Polyak, 1963), which can informally be described as strong convexity without convexity (Karimi et al., 2016). This condition is closely linked to the positive definiteness of the neural tangent kernel (Jacot et al., 2018; Liu et al., 2022). Together with the Lipschitz gradient condition, this property implies linear convergence of gradient descent to a global optimum. Moreover, the particular optimum the algorithm converges to is one that is close to the initial point (Oymak and Soltanolkotabi, 2019), a property which could potentially explain why models trained in such a manner enjoy excellent generalization performance.

Our work was motivated by Oymak and Soltanolkotabi (2019) and Liu et al. (2022), which analyze the behavior of gradient descent when training nonlinear models, in particular neural networks, from the perspective of the Lipschitz gradient and Polyak-Łojasiewicz conditions. Contemporary works in the area place numerous assumptions on the model, the loss or the data, e.g., focusing on the least squares loss (Oymak and Soltanolkotabi, 2019; Liu et al., 2022), but to the best of our knowledge even the more general ones treat only the case of supervised learning. Instead, we set out to construct a framework that is general enough to incorporate modern machine learning problems beyond supervised learning, while also being modular enough that its components are clearly separated and we are able to derive the minimal set of conditions they must satisfy for the ideal behavior of gradient descent.
We managed to do so in the form of a prototype problem to which many real-life machine learning problems can be translated.
In order to define this problem we need a number of ingredients. Let $\mathcal{X}$ be a Borel space (the input space), let $\mathcal{H}$ be a Hilbert space (the parameter space) and let $\mu \in \mathcal{P}(\mathcal{X})$ be a probability measure (the data). Let $f : \mathcal{X} \times \mathcal{H} \to \mathcal{Y}$ be a function (the model, e.g., a neural network) mapping an input $x \in \mathcal{X}$ and a parameter $w \in \mathcal{H}$ to an output $f(x, w) \in \mathcal{Y}$. Let $\ell : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be a function (the integrand, e.g., a loss) mapping an input $x$ and an output $y$ to a loss value $\ell(x, y)$. Define the induced function $F : \mathcal{H} \to \mathcal{F}$ by mapping a parameter $w$ to the equivalence class of the $\mathcal{Y}$-valued function $f(\cdot, w)$ with respect to the data $\mu$, where $\mathcal{F} = L^2(\mu; \mathcal{Y})$. Finally, define the integral functional $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ by mapping an equivalence class $g$ to the integral $\int_{\mathcal{X}} \ell(x, g(x)) \, d\mu(x)$. Then the prototype problem is $\min_{w \in \mathcal{H}} (\mathcal{L} \circ F)(w)$, where one has $(\mathcal{L} \circ F)(w) = \int_{\mathcal{X}} \ell(x, f(x, w)) \, d\mu(x)$.

This problem formulation allows us to incorporate the infinite width limit of neural networks, as well as infinite data. It enables us to translate not only supervised learning problems in general, but variational autoencoders, and even gradient regularized discriminators for generative adversarial networks to particular instances of the prototype problem. Analyzing the behavior of gradient descent on the prototype problem leads to two theoretical investigations. The first, more abstract one concerns the compositional nature of the function being minimized. It asks what assumptions the functions $F : \mathcal{H} \to \mathcal{F}$
and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$, with $\mathcal{H}$ and $\mathcal{F}$ being Hilbert spaces, have to satisfy in order for the composition $\mathcal{L} \circ F$ to satisfy the Lipschitz gradient and Polyak-Łojasiewicz conditions. The second one is about the prototype problem, asking for the requirements on the data $\mu$, the model $f$ and the integrand $\ell$ that ensure that the induced functions $F$ and $\mathcal{L}$ satisfy the assumptions resulting from the first question. We present novel theoretical results that conclude these two investigations, generalizing results from contemporary works. In particular, coercivity of the neural tangent kernel (which is equivalent to positive definiteness in the case of finite data) is an assumption that is required for our result to hold.

After listing our contributions in Subsection 1.1, we conclude Section 1 with a summary of related work in Subsection 1.2. In Section 2, we review the theory of the Lipschitz gradient and Polyak-Łojasiewicz conditions. Then in Section 3, we present our framework containing the prototype problem, motivating the two theoretical investigations mentioned above. The findings of the first, concerning gradient descent on composite functions, are presented in Subsection 3.1. Those of the second, on determining the requirements on the components of the prototype problem, are presented in Subsection 3.2, along with their consequences for gradient descent. To demonstrate the usefulness of our framework, we present some examples of popular learning problems that can be translated to the prototype problem in Subsection 3.3. Finally, we discuss the limitations of our work and directions for future research in Section 4. Rigorous proofs of our own results and of the ones reviewed in Section 2 are included in the appendices, which are referred to throughout the paper.
1.1 Our contributions
We propose

a functional analytic framework consisting of a prototype optimization problem covering many machine learning applications,

novel theoretical results of independent interest concerning gradient descent on a composition $\mathcal{L} \circ F$ of functions $F : \mathcal{H} \to \mathcal{F}$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$, with $\mathcal{H}$ and $\mathcal{F}$ being Hilbert spaces,

requirements for the components of our prototype problem that ensure convergence of gradient descent to a global optimum close to initialization, and

examples of popular machine learning problems translated to our prototype problem.
1.2 Related work
Studying the behavior of gradient descent and its variants for the training of neural networks is one of the central topics in theoretical machine learning. Aside from the optimization method, the three main components of such learning problems are the neural network model, the loss function and the data. These together determine the loss landscape, and therefore the solutions that the chosen optimization method finds, as well as its generalization performance. Contemporary works in this line of study have made simplifying assumptions on these components in order to be able to derive theoretical results, with the hope that the conclusions drawn from such results would apply to more general cases covering real world problems.
Regarding the model, the strongest assumption is linearity (Gunasekar et al., 2017; Arora et al., 2018; Soudry et al., 2018; Nacson et al., 2019; Zhu et al., 2019). A popular assumption that is a step closer to nonlinear deep networks is that the model is shallow, i.e., has one hidden layer with a nonlinear activation (Brutzkus et al., 2018; Chizat and Bach, 2018; Ge et al., 2018; Mei et al., 2018; Safran and Shamir, 2018; Du et al., 2019; Nitanda et al., 2019; Williams et al., 2019; Soltanolkotabi et al., 2019; Li et al., 2020; Oymak and Soltanolkotabi, 2020). Some works proved results for general models covering deep neural networks (Oymak and Soltanolkotabi, 2019; Allen-Zhu et al., 2019; Azizan and Hassibi, 2019; Du et al., 2019; Liu et al., 2022). The least squares loss is the most popular choice for analysis (Du et al., 2019, 2019; Oymak and Soltanolkotabi, 2019; Soltanolkotabi et al., 2019; Williams et al., 2019; Li et al., 2020; Oymak and Soltanolkotabi, 2020; Liu et al., 2022), but other specific losses are used as well (Brutzkus et al., 2018; Soudry et al., 2018; Nitanda et al., 2019). Some works use a general loss function for supervised learning satisfying certain assumptions (Chizat and Bach, 2018; Azizan and Hassibi, 2019; Allen-Zhu et al., 2019, 2019; Nacson et al., 2019). Some works have placed assumptions on the data (Ge et al., 2018; Soudry et al., 2018; Allen-Zhu et al., 2019; Li et al., 2020). Another way to simplify analysis is to take the learning rate to zero, resulting in continuous time gradient descent, i.e., a gradient flow (Gunasekar et al., 2017; Chizat and Bach, 2018; Williams et al., 2019). Often no specific optimization method is assumed and analysis is restricted to the loss landscape (Safran and Shamir, 2018; Sagun et al., 2018; Li et al., 2019; Zhu et al., 2019; Geiger et al., 2021).
A prime candidate for the explanation of the surprising generalization of overparameterized models is the implicit regularization of gradient methods. Many works show that such methods take a short path in the parameter space (Azizan and Hassibi, 2019; Oymak and Soltanolkotabi, 2019; Zhang et al., 2020; Gupta et al., 2021). Similarly to our paper, the Polyak-Łojasiewicz condition (Polyak, 1963; Karimi et al., 2016) has been previously studied as a possible reason why gradient methods converge to global optima (Allen-Zhu et al., 2019; Oymak and Soltanolkotabi, 2019; Liu et al., 2022). This condition is determined by the smallest eigenvalue of the neural tangent kernel (Jacot et al., 2018), which has previously been proposed to govern gradient descent dynamics, and is influenced by overparameterization. Additionally, this condition implies that the global optimum found by gradient descent is close to initialization.

Our main influences were Polyak (1963) on the convergence of gradient descent for nonconvex losses, as well as Oymak and Soltanolkotabi (2019) and Liu et al. (2022) on the behavior of gradient descent on the nonlinear least squares problem. Our results in Subsection 3.1 generalize the classical ones in Polyak (1963), and in particular, Theorem 4 is a significant generalization of Oymak and Soltanolkotabi (2019, Theorem 2.1) and Liu et al. (2022, Theorem 6). Even those works which treat general loss functions (such as Allen-Zhu et al. (2019)) focus on the supervised learning problem, with both the number of parameters and the amount of data assumed to be finite. To the best of our knowledge, the framework we propose is the only one so far that is general enough to incorporate many problems besides supervised learning, covering the cases of infinitely many parameters and infinite data, while also making it clear what assumptions the different components have to satisfy in order for gradient descent to find a global optimum that is close to initialization.
2 Lipschitz gradient and Polyak-Łojasiewicz conditions
Informally speaking, the Lipschitz gradient (LG) and Polyak-Łojasiewicz (PL) conditions quantify how well a certain function behaves under gradient descent (Polyak, 1963; Karimi et al., 2016). This section reviews the classical results of Polyak (1963), which we include to highlight the analogy to our results in Subsection 3.1. Let $\mathcal{H}$ be a Hilbert space and $f : \mathcal{H} \to \mathbb{R}$ a Fréchet differentiable function. If $\|\nabla f(w_1) - \nabla f(w_2)\| \leq L \|w_1 - w_2\|$ for all $w_1, w_2 \in S$ for some $L \geq 0$ and $S \subseteq \mathcal{H}$, i.e., if the gradient mapping $\nabla f$ is Lipschitz on $S$, then $f$ is said to satisfy the LG condition with constant $L$ on $S$, or to be LG on $S$. Note that one can take $L = 0$ if and only if $f$ is linear. Such functions satisfy a strong form of the fundamental rule of calculus (see Proposition 12). In particular, $f(w_2) \leq f(w_1) + \langle \nabla f(w_1), w_2 - w_1 \rangle + \frac{L}{2} \|w_2 - w_1\|^2$ for all $w_1, w_2$ such that the segment $[w_1, w_2]$ lies in $S$. Additional corollaries (see Proposition 14 and Proposition 15) are that $\|\nabla f(w)\|^2 \leq 2L\,(f(w) - \inf f)$ for all $w$ such that the relevant segment lies in $S$. And that $f(w^+) \leq f(w) - \eta \left(1 - \frac{L\eta}{2}\right) \|\nabla f(w)\|^2$ for all $w$ such that $[w, w^+] \subseteq S$, where $w^+ = w - \eta \nabla f(w)$. If $f^* = \inf f > -\infty$ and $\|\nabla f(w)\|^2 \geq 2\mu\,(f(w) - f^*)$ for all $w \in S$ and some $\mu > 0$, then $f$ is said to satisfy the PL condition with constant $\mu$ on $S$, or to be PL on $S$. Given a learning rate $\eta > 0$, define a gradient descent step as $w^+ = w - \eta \nabla f(w)$ for any $w \in \mathcal{H}$, and a gradient descent sequence recursively as $w_{k+1} = w_k - \eta \nabla f(w_k)$ for some initial point $w_0 \in \mathcal{H}$. The LG and PL conditions lead to the following classical convergence result for gradient descent due to Polyak (1963) (see Appendix A for proofs of the results in this section). The theorem not only tells us that gradient descent finds a global optimum, but that the convergence is linear, and the distance traveled from initialization is bounded by a constant times the distance to a closest optimum.
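As a small self-contained illustration (our own, not from the paper), consider $f(x) = x^2 + 3\sin^2(x)$, the standard example from Karimi et al. (2016) of a nonconvex function that is PL (with constant $\mu = 1/32$) and LG with $L = 8$, since $|f''(x)| = |2 + 6\cos(2x)| \leq 8$. Gradient descent with $\eta = 1/L$ reaches the global minimum $f(0) = 0$ despite the nonconvexity:

```python
import math

def f(x):
    # Nonconvex but PL: the standard example from Karimi et al. (2016).
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

# |f''(x)| = |2 + 6 cos(2x)| <= 8, so f is LG with L = 8; take eta = 1/L.
eta, x = 1.0 / 8.0, 2.0
losses = [f(x)]
for _ in range(200):
    x -= eta * grad_f(x)
    losses.append(f(x))
```

With this learning rate the descent lemma guarantees a monotone decrease, and the iterates converge to the global optimum at $0$.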
Theorem 1.
Let $\mathcal{H}$ be a Hilbert space and $L \geq \mu > 0$ be some constants. And let $f : \mathcal{H} \to \mathbb{R}$ be a function that is PL with constant $\mu$ and LG with constant $L$ on $B(w_0, r)$. Fix any $w_0 \in \mathcal{H}$. Let $f^* = \inf f$, $\rho = 1 - \eta \mu (2 - \eta L)$, so that $\rho \in [0, 1)$, and $r = \frac{2 \sqrt{2 L (f(w_0) - f^*)}}{\mu}$. If
$0 < \eta \leq \frac{1}{L},$
then for all $k \in \mathbb{N}$
$f(w_k) - f^* \leq \rho^k \, (f(w_0) - f^*),$
and the series $\sum_k \|w_{k+1} - w_k\|$ converges, so that $(w_k)_k$ converges to some $w_\infty \in B(w_0, r)$ such that $f(w_\infty) = f^*$ and
$\|w_\infty - w_0\| \leq \frac{2 \sqrt{2 L (f(w_0) - f^*)}}{\mu}.$
Moreover, if $w^* \in \arg\min f$ satisfies $\|w_0 - w^*\| = \operatorname{dist}(w_0, \arg\min f)$, i.e., $w^*$ is an optimum of $f$ that is closest to $w_0$, then
$\|w_\infty - w_0\| \leq \frac{2L}{\mu} \|w_0 - w^*\|.$
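The linear rate of Theorem 1 can be checked numerically on a simple instance (an illustration of ours, using the standard Polyak contraction factor $\rho = 1 - \eta\mu(2 - \eta L)$). For the quadratic $f(w) = \frac{1}{2}(w_1^2 + 10 w_2^2)$, which is PL with $\mu = 1$ and LG with $L = 10$ on all of $\mathbb{R}^2$, the bound holds at every iteration:

```python
# Sanity check of the linear rate in Theorem 1 on a quadratic (illustration only).
# f(w) = 0.5 * (w1^2 + 10 * w2^2) is PL with mu = 1 and LG with L = 10 on R^2.
mu, L, eta = 1.0, 10.0, 0.05          # eta <= 1/L

def f(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad_f(w):
    return [w[0], 10.0 * w[1]]

w = [3.0, -2.0]
f0 = f(w)
rho = 1.0 - eta * mu * (2.0 - eta * L)   # contraction factor of the theorem
bounds_hold = True
for k in range(1, 101):
    g = grad_f(w)
    w = [w[0] - eta * g[0], w[1] - eta * g[1]]
    # Polyak's bound: f(w_k) - f* <= rho^k * (f(w_0) - f*), with f* = 0 here.
    bounds_hold = bounds_hold and f(w) <= rho ** k * f0 + 1e-12
```

The actual decrease is faster than the worst-case rate, as expected from a bound that only uses the extreme constants $\mu$ and $L$.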
3 A framework for overparameterized learning
We propose an abstract prototype problem that covers many real world scenarios (as exemplified in Subsection 3.3) to analyze the problem of training neural networks with gradient descent. This framework enables us to analyze the building blocks of the problem in order to determine the properties of the whole. There are going to be three main components. One is the dataset, represented by a probability measure $\mu$ on a Borel space $\mathcal{X}$. Another is the neural network, a mapping $f : \mathcal{X} \times \mathcal{H} \to \mathcal{Y}$ with $\mathcal{X}$ being the input space, the Hilbert space $\mathcal{H}$ being the parameter space, $\mathcal{Y}$ being the output space, and $f$ being measurable in $x$ and Fréchet differentiable in $w$ for all $x \in \mathcal{X}$. Assuming that for all $w \in \mathcal{H}$ the integral $\int_{\mathcal{X}} \|f(x, w)\|^2 \, d\mu(x)$ exists and is finite, we consider the induced mapping $F : \mathcal{H} \to \mathcal{F}$ defined as $F(w)$ being the equivalence class of the function $f(\cdot, w)$ with respect to $\mu$ for any $w \in \mathcal{H}$. The Hilbert space $\mathcal{F} = L^2(\mu; \mathcal{Y})$ of equivalence classes of square integrable $\mathcal{Y}$-valued functions with respect to $\mu$ is the feature space, with norm $\|g\| = \left( \int_{\mathcal{X}} \|g(x)\|^2 \, d\mu(x) \right)^{1/2}$ for $g \in \mathcal{F}$. The last component is the integrand $\ell : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ that quantifies the loss corresponding to an $x \in \mathcal{X}$ and an output $y \in \mathcal{Y}$, with $\ell$ being measurable in $x$ and differentiable in $y$. It induces the integral functional (Rockafellar, 1976) $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ defined as $\mathcal{L}(g) = \int_{\mathcal{X}} \ell(x, g(x)) \, d\mu(x)$.
The prototype learning problem is then defined as
$\min_{w \in \mathcal{H}} (\mathcal{L} \circ F)(w),$
which we aim to solve via gradient descent.
This motivates two theoretical investigations from the perspective of the LG and PL conditions. The first is about the compositional nature of the loss being minimized, and can be stated abstractly as follows. Given Hilbert spaces $\mathcal{H}, \mathcal{F}$ and functions $F : \mathcal{H} \to \mathcal{F}$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ such that $\mathcal{L}$ is assumed to be LG and PL for some constants, what assumptions does $F$ have to satisfy in order for the composition $\mathcal{L} \circ F$ to enjoy the benefits of the LG and PL conditions? The second, less abstract question asks what assumptions do $\mu$, $f$ and $\ell$ have to satisfy in order for $F$ to satisfy the assumptions discovered in the first question, and for $\mathcal{L}$ to be LG and PL for some constants? We answer both in the following sections, and then we present some examples of popular learning problems that can be translated to the prototype learning problem.
3.1 Gradient descent on a composition
The notation in this section is similar to that of Section 2, emphasizing the fact that the results proposed are general enough to be of independent mathematical interest. Let $\mathcal{H}, \mathcal{F}$ be Hilbert spaces and $F : \mathcal{H} \to \mathcal{F}$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ be Fréchet differentiable functions. If $\|\nabla \mathcal{L}(u)\| \leq G$ for all $u \in T$ for some $G > 0$ and $T \subseteq \mathcal{F}$, then $\mathcal{L}$ is said to satisfy the Bounded Gradient (BG) condition with constant $G$ on $T$, or to be BG on $T$. Note that if $\mathcal{L}$ is LG on $T$ and $T$ is bounded, then $\mathcal{L}$ is BG on $T$ (see Proposition 13). Denote the Jacobian or Fréchet differential of $F$ at $w$ by $DF(w)$, which is the unique bounded linear operator from $\mathcal{H}$ to $\mathcal{F}$ that satisfies $\lim_{h \to 0} \|F(w + h) - F(w) - DF(w) h\| / \|h\| = 0$. We are going to introduce the Bounded Jacobian (BJ) and Lipschitz Jacobian (LJ) conditions, generalizing the BG and LG conditions. If $\|DF(w)\| \leq B$ (with the operator norm) for all $w \in S$ for some $B > 0$ and $S \subseteq \mathcal{H}$, then $F$ is said to satisfy the BJ condition with constant $B$ on $S$, or to be BJ on $S$. If $\|DF(w_1) - DF(w_2)\| \leq M \|w_1 - w_2\|$ for all $w_1, w_2 \in S$ for some $M \geq 0$ and $S \subseteq \mathcal{H}$, i.e., if the Jacobian mapping $DF$ is Lipschitz on $S$ with respect to the operator norm, then $F$ is said to satisfy the LJ condition with constant $M$ on $S$, or to be LJ on $S$. (Note that one can take $M = 0$ if and only if $F$ is linear, i.e., $DF$ is constant.) Such functions satisfy a strong form of the fundamental rule of calculus (see Proposition 12), in particular that $F(w_2) - F(w_1) = \int_0^1 DF(w_1 + t(w_2 - w_1))(w_2 - w_1) \, dt$ (where the integral is the Bochner integral (Cobzaş et al., 2019, Definition 1.6.8)) for all $w_1, w_2$ such that the segment $[w_1, w_2]$ lies in $S$. The following lemma summarizes sufficient conditions for $\mathcal{L} \circ F$ to satisfy the LG condition and its corollaries (see Proposition 22, Proposition 23 and Proposition 24).
Lemma 2.
Let $\mathcal{H}, \mathcal{F}$ be Hilbert spaces, $F : \mathcal{H} \to \mathcal{F}$ be BJ with constant $B$ and LJ with constant $M$ on $S \subseteq \mathcal{H}$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ be LG with constant $L$ on $F(S)$.

If $S$ is convex and $\mathcal{L}$ is BG with constant $G$ on $F(S)$ for some $G > 0$, then $\mathcal{L} \circ F$ is LG with constant $L B^2 + M G$ on $S$.

If for some fixed $w \in S$ one has $[w, w'] \subseteq S$ for some $w' \in \mathcal{H}$ and $\mathcal{L}$ is BG with constant $G$ on $F([w, w'])$, then
$(\mathcal{L} \circ F)(w') \leq (\mathcal{L} \circ F)(w) + \langle \nabla (\mathcal{L} \circ F)(w), w' - w \rangle + \frac{L B^2 + M G}{2} \|w' - w\|^2.$

If for some fixed $w \in S$ one has $[w, w^+] \subseteq S$ for $w^+ = w - \eta \nabla (\mathcal{L} \circ F)(w)$ with some learning rate $\eta > 0$ and $\mathcal{L}$ is BG with constant $G$ on $F([w, w^+])$,
then
$(\mathcal{L} \circ F)(w^+) \leq (\mathcal{L} \circ F)(w) - \eta \left(1 - \frac{(L B^2 + M G)\,\eta}{2}\right) \|\nabla (\mathcal{L} \circ F)(w)\|^2.$
An operator $A \in B(\mathcal{F})$ is said to be coercive with constant $\mu > 0$ if $\langle A u, u \rangle \geq \mu \|u\|^2$ for all $u \in \mathcal{F}$. Denote the adjoint of the Jacobian of $F$ at $w$ by $DF(w)^*$. Generalizing Liu et al. (2022, Definition 3), $F$ is said to be uniformly conditioned (or UC) with constant $\mu > 0$ on $S \subseteq \mathcal{H}$ if $DF(w) DF(w)^*$ is coercive with constant $\mu$ for all $w \in S$. The following lemma tells us how the composition $\mathcal{L} \circ F$ can satisfy a property that is similar to the PL condition (see Proposition 21).
Lemma 3.
Let $\mathcal{H}, \mathcal{F}$ be Hilbert spaces, $F : \mathcal{H} \to \mathcal{F}$ be UC with constant $\mu_F$ on $S \subseteq \mathcal{H}$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ be PL with constant $\mu$ on $F(S)$. Then the composition satisfies
$\|\nabla (\mathcal{L} \circ F)(w)\|^2 \geq 2 \mu_F \mu \, ((\mathcal{L} \circ F)(w) - \inf \mathcal{L})$
for all $w \in S$.
Given a learning rate $\eta > 0$, define a gradient descent step as $w^+ = w - \eta \nabla (\mathcal{L} \circ F)(w)$ for any $w \in \mathcal{H}$, and a gradient descent sequence recursively as $w_{k+1} = w_k - \eta \nabla (\mathcal{L} \circ F)(w_k)$ for some initial point $w_0 \in \mathcal{H}$. The above lemmas lead to the following convergence result for gradient descent (see Theorem 20). The proof consists of repeated applications of Lemma 2 to $\mathcal{L} \circ F$, Lemma 3 to get the descent rate and Lemma 2 to bound the distance traveled, modulo some technicalities.
Theorem 4.
Let $L, \mu, B, M, G, \mu_F > 0$ be constants such that $L \geq \mu$ and $B^2 \geq \mu_F$. Let $\mathcal{H}, \mathcal{F}$ be Hilbert spaces, let $T \subseteq \mathcal{F}$ be bounded and let $w_0 \in \mathcal{H}$. And let $F : \mathcal{H} \to \mathcal{F}$ be BJ with constant $B$, LJ with constant $M$ and UC with constant $\mu_F$ on $B(w_0, r)$ and $\mathcal{L} : \mathcal{F} \to \mathbb{R}$ be LG with constant $L$ and PL with constant $\mu$ on $T \supseteq F(B(w_0, r))$. Let $\mathcal{L}^* = \inf \mathcal{L}$, $G$ be a BG constant of $\mathcal{L}$ on $T$, $\hat{L} = L B^2 + M G$ (so that $\mathcal{L} \circ F$ is LG with constant $\hat{L}$), $\hat{\mu} = \mu_F \mu$ (so that $\hat{L} \geq \hat{\mu}$), $\rho = 1 - \eta \hat{\mu} (2 - \eta \hat{L})$ and $r = \frac{2 B \sqrt{2 L ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*)}}{\hat{\mu}}$. If
$0 < \eta \leq \frac{1}{\hat{L}},$
then for all $k \in \mathbb{N}$
$(\mathcal{L} \circ F)(w_k) - \mathcal{L}^* \leq \rho^k \, ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*),$
and the series $\sum_k \|w_{k+1} - w_k\|$ converges, so that $(w_k)_k$ converges to some $w_\infty \in B(w_0, r)$ such that $\mathcal{L}(F(w_\infty)) = \mathcal{L}^*$ (so that in particular $F(w_\infty)$ is a global optimum of $\mathcal{L}$) and
$\|w_\infty - w_0\| \leq \frac{2 B \sqrt{2 L ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*)}}{\hat{\mu}}.$
Moreover, if $w^* \in \arg\min (\mathcal{L} \circ F)$, i.e., if $w^*$ is an optimum of $\mathcal{L} \circ F$ that is closest to $w_0$, then one has
$\|w_\infty - w_0\| \leq \frac{2 B \sqrt{L \hat{L}}}{\hat{\mu}} \|w_0 - w^*\|.$
Note that $\eta = 1/\hat{L}$ gives the optimal rate $\rho = 1 - \hat{\mu}/\hat{L}$. Theorem 1 is exactly the special case of $\mathcal{F} = \mathcal{H}$ and $F$ being the identity map. The above result is general enough to be conveniently applicable to our prototype problem.
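The setting of Theorem 4 can be illustrated numerically (a sketch of ours, not from the paper). Take $\mathcal{L}(u) = \frac{1}{2}\|u - y\|^2$, which is LG and PL with constants $1$, composed with a nonlinear, overparameterized map $F : \mathbb{R}^p \to \mathbb{R}^n$ with $p \gg n$; the specific $F$ below (a fixed random-feature-style matrix applied to coordinatewise $\tanh$) is an arbitrary choice for the demonstration:

```python
import math

# Gradient descent on L∘F with L(u) = 0.5 * ||u - y||^2 and a nonlinear,
# overparameterized F: R^p -> R^n, F(w)_i = (1/sqrt(p)) sum_j A_ij tanh(w_j).
n, p = 2, 20
A = [[math.cos(1.7 * (i + 1) * (j + 1)) for j in range(p)] for i in range(n)]
y = [1.0, -0.5]

def F(w):
    s = 1.0 / math.sqrt(p)
    return [s * sum(A[i][j] * math.tanh(w[j]) for j in range(p)) for i in range(n)]

def loss(w):
    return 0.5 * sum((Fi - yi) ** 2 for Fi, yi in zip(F(w), y))

def grad(w):
    # Chain rule: grad(L∘F)(w) = DF(w)^T (F(w) - y).
    s = 1.0 / math.sqrt(p)
    res = [Fi - yi for Fi, yi in zip(F(w), y)]
    return [s * (1.0 - math.tanh(w[j]) ** 2) * sum(A[i][j] * res[i] for i in range(n))
            for j in range(p)]

w0 = [0.0] * p
w, eta = list(w0), 0.25
for _ in range(5000):
    g = grad(w)
    w = [wj - eta * gj for wj, gj in zip(w, g)]

final_loss = loss(w)
path_length = math.sqrt(sum((wj - w0j) ** 2 for wj, w0j in zip(w, w0)))
```

Because $DF(w_0) DF(w_0)^\top$ is positive definite here (the uniform conditioning of the theorem, locally), the iterates drive the loss to its global minimum while traveling only a short distance `path_length` from initialization.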
3.2 Requirements for convergence
In order to apply Theorem 4 to the prototype problem, some assumptions on $F$ and $\mathcal{L}$ need to hold. The lemma below provides sufficient conditions for the former (see Lemma 26).
Lemma 5.
Let $\mathcal{X}$ be a Borel space, $\mathcal{H}$ a Hilbert space, $S \subseteq \mathcal{H}$ open, and $f : \mathcal{X} \times \mathcal{H} \to \mathcal{Y}$ such that $f$ is measurable in $x$ and Fréchet differentiable in $w$ for all $x \in \mathcal{X}$ and the integral $\int_{\mathcal{X}} \|f(x, w)\|^2 \, d\mu(x)$ exists and is finite for all $w \in \mathcal{H}$. Suppose that there exist $B, M > 0$ such that for almost every $x \in \mathcal{X}$, $f(x, \cdot)$ is BJ with constant $B$ and LJ with constant $M$ on $S$. Then $F$ is differentiable on $S$, for all $w \in S$ the Jacobian is such that
$(DF(w) h)(x) = D_w f(x, w) h$
for all $h \in \mathcal{H}$, the adjoint Jacobian is such that
$DF(w)^* g = \int_{\mathcal{X}} D_w f(x, w)^* g(x) \, d\mu(x)$
for all $g \in \mathcal{F}$, while $F$ is BJ with constant $B$ and LJ with constant $M$ on $S$. Furthermore, if there exists $\mu_F > 0$ such that
$\langle DF(w) DF(w)^* g, g \rangle \geq \mu_F \|g\|^2$
for all $g \in \mathcal{F}$ and $w \in S$, then $F$ is UC with constant $\mu_F$.
We assume that $\ell$ is finite, measurable in $x$ and differentiable in $y$ for all $x \in \mathcal{X}$. If $\ell(x, \cdot)$ is LG with a common constant $L$ for almost every $x \in \mathcal{X}$, then $\ell$ is said to be an LG integrand. If $\ell(x, \cdot)$ is PL with a common constant $\mu$ for almost every $x \in \mathcal{X}$, then it is said to be a PL integrand. The following lemma shows that the LG and PL conditions of $\ell$ are inherited by $\mathcal{L}$ (see Lemma 27).
Lemma 6.
Let $\mathcal{X}$ be a Borel space, $\mu \in \mathcal{P}(\mathcal{X})$, $\ell$ an LG integrand with constant $L$ and $\mathcal{L}$ the induced integral functional. Then one has that
$\nabla \mathcal{L}(g) = [x \mapsto \nabla_y \ell(x, g(x))]$
for all $g \in \mathcal{F}$ and $\mathcal{L}$ is LG with constant $L$. If moreover $\ell$ is also a PL integrand with constant $\mu$ (so that $L \geq \mu$), then one has that $\mathcal{L}$ is PL with constant $\mu$.
Denote $\ell^*(x) = \inf_{y \in \mathcal{Y}} \ell(x, y)$ and $\mathcal{L}^* = \int_{\mathcal{X}} \ell^*(x) \, d\mu(x)$. (Note that one has $\inf_{g \in \mathcal{F}} \mathcal{L}(g) = \mathcal{L}^*$ by Rockafellar (1976, Theorem 3A).) Given a learning rate $\eta > 0$, define a gradient descent step as $w^+ = w - \eta \nabla (\mathcal{L} \circ F)(w)$ for any $w \in \mathcal{H}$, and a gradient descent sequence recursively as $w_{k+1} = w_k - \eta \nabla (\mathcal{L} \circ F)(w_k)$ for some initial point $w_0 \in \mathcal{H}$. We are now in a position to apply Theorem 4.
Theorem 7.
Let $\mathcal{X}$ be a Borel space, $\mathcal{H}$ a Hilbert space, $S \subseteq \mathcal{H}$ bounded and open, $w_0 \in S$, $\mu \in \mathcal{P}(\mathcal{X})$, $f : \mathcal{X} \times \mathcal{H} \to \mathcal{Y}$ such that $f$ is measurable in $x$ and Fréchet differentiable in $w$ for all $x \in \mathcal{X}$ and the integral $\int_{\mathcal{X}} \|f(x, w)\|^2 \, d\mu(x)$ exists and is finite for all $w \in \mathcal{H}$, $\ell$ an LG and PL integrand with constants $L \geq \mu > 0$, and assume that there exist $B, M, \mu_F > 0$ such that $f(x, \cdot)$ is BJ with constant $B$ and LJ with constant $M$ on $S$ for almost every $x \in \mathcal{X}$ and $\langle DF(w) DF(w)^* g, g \rangle \geq \mu_F \|g\|^2$ for all $g \in \mathcal{F}$ and $w \in S$. Let $G$ be a BG constant of $\mathcal{L}$ on $F(S)$, $\hat{L} = L B^2 + M G$ (so that $\mathcal{L} \circ F$ is LG with constant $\hat{L}$ on $S$), $\hat{\mu} = \mu_F \mu$, $\rho = 1 - \eta \hat{\mu} (2 - \eta \hat{L})$, and $r = \frac{2 B \sqrt{2 L ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*)}}{\hat{\mu}}$ (so that $B(w_0, r) \subseteq S$ is assumed). If
$0 < \eta \leq \frac{1}{\hat{L}},$
then one has
$(\mathcal{L} \circ F)(w_k) - \mathcal{L}^* \leq \rho^k \, ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*)$
for all $k \in \mathbb{N}$, and the series $\sum_k \|w_{k+1} - w_k\|$ converges, so that $(w_k)_k$ converges to some $w_\infty \in B(w_0, r)$ with $\mathcal{L}(F(w_\infty)) = \mathcal{L}^*$ (so that in particular $F(w_\infty)$ is a global optimum of $\mathcal{L}$) and
$\|w_\infty - w_0\| \leq \frac{2 B \sqrt{2 L ((\mathcal{L} \circ F)(w_0) - \mathcal{L}^*)}}{\hat{\mu}}.$
Moreover, if $w^* \in \arg\min (\mathcal{L} \circ F)$, i.e., if $w^*$ is an optimum of $\mathcal{L} \circ F$ that is closest to $w_0$, then one has
$\|w_\infty - w_0\| \leq \frac{2 B \sqrt{L \hat{L}}}{\hat{\mu}} \|w_0 - w^*\|.$
Note that the self-adjoint operator $\Theta_w$ on the feature space $\mathcal{F}$ given by
$\Theta_w = DF(w) DF(w)^*, \qquad (\Theta_w g)(x) = \int_{\mathcal{X}} D_w f(x, w) D_w f(x', w)^* g(x') \, d\mu(x')$
for $g \in \mathcal{F}$ is exactly the NTK at $w$. (The operator $DF(w)^* DF(w)$ on the parameter space $\mathcal{H}$ could be referred to as the dual NTK.) $F$ being UC with constant $\mu_F$ and BJ with constant $B$ means that the spectrum of the NTK is contained in $[\mu_F, B^2]$. In the case of finite data, i.e., $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$, one has that $\mathcal{F}$ is finite dimensional, and the NTK being coercive reduces to its matrix representation being positive definite with its smallest eigenvalue bounded from below by $\mu_F$. In the case of infinite data, coercivity is a stronger condition than positive definiteness, since the descending eigenvalues of a positive definite operator can converge to $0$. In any case, having $\dim \mathcal{H} \geq \dim \mathcal{F}$, i.e., overparameterization, is a necessary condition for $F$ being UC with $\mu_F > 0$. Note also that $\inf_{w \in \mathcal{H}} (\mathcal{L} \circ F)(w) = \mathcal{L}^*$, i.e., the neural network can interpolate the data if the stated conditions hold on a large enough subset around the initial point $w_0$.
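In the finite-data case the NTK is just the Gram matrix of the tangent features $\nabla_w f(x_i, w)$, and coercivity can be checked by computing its smallest eigenvalue. The following sketch (our own, for a hypothetical one-hidden-layer network with a deterministic toy initialization) illustrates this for $n = 2$ data points:

```python
import math

def ntk_matrix(xs, a, b):
    # Tangent features: gradient of f(x, w) w.r.t. all parameters w = (a, b),
    # for a one-hidden-layer net f(x, w) = (1/sqrt(m)) * sum_k a_k tanh(b_k x).
    m = len(a)
    s = 1.0 / math.sqrt(m)
    feats = []
    for x in xs:
        grad_a = [s * math.tanh(b[k] * x) for k in range(m)]
        grad_b = [s * a[k] * (1.0 - math.tanh(b[k] * x) ** 2) * x for k in range(m)]
        feats.append(grad_a + grad_b)
    n = len(xs)
    return [[sum(fi * fj for fi, fj in zip(feats[i], feats[j])) for j in range(n)]
            for i in range(n)]

m = 50
a = [1.0 if k % 2 == 0 else -1.0 for k in range(m)]   # toy initialization
b = [0.1 + 0.03 * k for k in range(m)]
K = ntk_matrix([0.5, -1.0], a, b)

# Smallest eigenvalue of the 2x2 NTK via the closed-form quadratic formula.
tr = K[0][0] + K[1][1]
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
lam_min = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
```

A strictly positive `lam_min` is exactly the (finite-data) coercivity constant $\mu_F$ that the uniform conditioning assumption asks for at this parameter value.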
3.3 Examples
In this section, we detail three examples that can be translated to our prototype problem. The first covers effectively all of supervised learning. The second is a popular unsupervised learning method, and shows that the framework being general enough to incorporate infinite data is a useful property. The third shows that even gradient regularization can be treated. The examples are intended to demonstrate that our framework covers real world learning problems. Their analysis is beyond the scope of this paper and is left for future work.
3.3.1 Supervised learning
A typical supervised learning task is as follows. Let $\mathcal{X}, \mathcal{T}$ be Borel spaces, where $\mathcal{X}$ is the set of inputs and $\mathcal{T}$ is the set of targets. The dataset is a probability measure $\nu \in \mathcal{P}(\mathcal{X} \times \mathcal{T})$ of input-target pairs and the loss $c : \mathcal{T} \times \mathcal{Y} \to \mathbb{R}$ maps target-output pairs to the corresponding loss values. Let $\mu$ be the first marginal of the data $\nu$ and $(\nu_x)_{x \in \mathcal{X}}$ the disintegration of $\nu$ along $\mu$. Assuming that one has $\nu_x = \delta_{t(x)}$ for some measurable function $t : \mathcal{X} \to \mathcal{T}$ mapping inputs to targets, i.e., all inputs have exactly one corresponding target in the data, the integrand can be defined as $\ell(x, y) = c(t(x), y)$, which is clearly an LG or PL integrand if and only if $c$ is LG or PL in its second argument, so that it suffices to analyze the latter to determine if the induced integral functional satisfies the properties required for the convergence of gradient descent. With the induced neural network mapping $F$, it is apparent that any supervised learning problem such that all inputs have exactly one corresponding target in the data can be translated to the prototype problem $\min_{w \in \mathcal{H}} (\mathcal{L} \circ F)(w)$.

In particular, maximum likelihood training is a supervised learning problem where, given input-target pairs drawn from a dataset, one trains a neural network to output the parameters of a probability distribution such that the likelihood of the target under the distribution for its input is maximized. In this case, $c$ is the function mapping a pair consisting of a target $t$ and an output $y$ to minus the natural logarithm of the probability density function of the probability distribution parameterized by $y$, evaluated at $t$. We describe two cases.

When $\mathcal{T} = \mathbb{R}^d$
, a popular approach is to fit the targets to Gaussian distributions. For simplicity, we consider the case with diagonal covariance matrices. In this case, $\mathcal{Y} = \mathbb{R}^d \times \mathbb{R}^d_{>0}$, and for an output $y = (m, \sigma^2)$ the distribution can be parameterized by a mean $m \in \mathbb{R}^d$ and variance $\sigma^2 \in \mathbb{R}^d_{>0}$. Based on the probability density function of a Gaussian, the integrand becomes
$c(t, (m, \sigma^2)) = \sum_{j=1}^d \left( \frac{(t_j - m_j)^2}{2 \sigma_j^2} + \frac{1}{2} \log \sigma_j^2 + \frac{1}{2} \log 2\pi \right).$
One can also fit the targets to Gaussian distributions with fixed variance. In this case, $\mathcal{Y} = \mathbb{R}^d$ with a given variance $\sigma^2 \in \mathbb{R}^d_{>0}$, and the integrand is the same expression with $\sigma^2$ held constant, which is equivalent to least squares if $\sigma^2$ is a constant vector.
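The reduction of the fixed-variance Gaussian integrand to least squares can be verified directly (a sketch of ours): with unit variance, the negative log-likelihood equals the least squares loss up to an additive constant.

```python
import math

def gaussian_nll(target, mean, std):
    # Negative log-likelihood of a diagonal Gaussian, summed over coordinates:
    # sum_j (t_j - m_j)^2 / (2 s_j^2) + log s_j + 0.5 * log(2 pi)
    return sum((t - m) ** 2 / (2.0 * s * s) + math.log(s) + 0.5 * math.log(2.0 * math.pi)
               for t, m, s in zip(target, mean, std))

target, mean = [1.0, -2.0], [0.5, -1.0]
fixed = gaussian_nll(target, mean, [1.0, 1.0])
least_squares = 0.5 * sum((t - m) ** 2 for t, m in zip(target, mean))
# With unit variance the NLL equals least squares plus this additive constant:
const = len(target) * 0.5 * math.log(2.0 * math.pi)
```

Since the additive constant does not affect gradients, training with this integrand is identical to training with the least squares loss.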
When $\mathcal{T}$ is a finite set, one can fit the targets to categorical distributions. A popular tool is to use the softmax function to parameterize such distributions. In this case, $\mathcal{Y} = \mathbb{R}^{|\mathcal{T}|}$, and an output $y$ is mapped to the probability vector $\operatorname{softmax}(y) = \left( e^{y_j} / \sum_{j'} e^{y_{j'}} \right)_j$. The probability density of a target $t$ is simply the $t$-th coordinate, so that the integrand becomes
$c(t, y) = -\log \operatorname{softmax}(y)_t = -y_t + \log \sum_{j} e^{y_j},$
which is exactly the usual cross-entropy loss function used in classification.
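The categorical integrand above can be sketched in a few lines (our own illustration, using the standard max-subtraction trick for numerical stability); exponentiating the negative losses over all classes recovers the softmax probabilities, which sum to one.

```python
import math

def cross_entropy(logits, target_index):
    # -log softmax(y)_t = -y_t + log(sum_j exp(y_j)), computed stably
    # by subtracting the max logit before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target_index]

# exp(-loss) over all classes gives back the softmax probability vector.
probs_sum_check = sum(
    math.exp(-cross_entropy([2.0, 0.5, -1.0], k)) for k in range(3)
)
```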
3.3.2 Variational autoencoder
The variational autoencoder (VAE) (Kingma and Welling, 2014; Rezende et al., 2014) can be translated to our prototype problem as follows. First, denote the dataset by $\mu \in \mathcal{P}(\mathcal{X})$, the prior distribution by $\pi \in \mathcal{P}(\mathcal{Z})$ with $\mathcal{Z}$ the latent space, and the reparameterization distribution by $\rho \in \mathcal{P}(\mathcal{E})$ (which may or may not (Joo et al., 2020) be equivalent to $\pi$). The two components of the VAE are represented by the encoder map $e : \mathcal{X} \times \mathcal{H}_e \to \mathcal{A}$, differentiable in its second argument, and the decoder map $d : \mathcal{Z} \times \mathcal{H}_d \to \mathcal{B}$, differentiable in both arguments. A key component is the reparameterization function $R : \mathcal{E} \times \mathcal{A} \to \mathcal{Z}$, which is measurable in its first argument with respect to $\rho$ and differentiable in its second argument, and satisfies an absolute continuity property with respect to the prior $\pi$ for all encoder outputs. These are combined into the map $f : (\mathcal{X} \times \mathcal{E}) \times \mathcal{H} \to \mathcal{Y}$ with $\mathcal{H} = \mathcal{H}_e \times \mathcal{H}_d$ and $\mathcal{Y} = \mathcal{A} \times \mathcal{B}$, defined as $f((x, \varepsilon), (w_e, w_d)) = (e(x, w_e), d(R(\varepsilon, e(x, w_e)), w_d))$ for $(x, \varepsilon) \in \mathcal{X} \times \mathcal{E}$ and $(w_e, w_d) \in \mathcal{H}$. Denoting the data by $\mu \otimes \rho$ and assuming that $\int \|f((x, \varepsilon), w)\|^2 \, d(\mu \otimes \rho)(x, \varepsilon)$ exists and is finite for all $w \in \mathcal{H}$, we define the induced map $F : \mathcal{H} \to \mathcal{F}$.

Similarly to the maximum likelihood problem, let $\ell_{\mathrm{rec}}$ be the function mapping a pair consisting of an input $x$ and a decoder output $b$ to minus the natural logarithm of the probability density function of the probability distribution parameterized by $b$, evaluated at $x$. Let $\beta > 0$ be a constant and $\ell_{\mathrm{KL}}$ the function mapping an encoder output $a$ to the Kullback-Leibler divergence of the posterior probability distribution parameterized by $a$ from the prior distribution $\pi$. The integrand is then defined as $\ell((x, \varepsilon), (a, b)) = \ell_{\mathrm{rec}}(x, b) + \beta \, \ell_{\mathrm{KL}}(a)$, consisting of the reconstruction term and the divergence term, the latter being weighted by $\beta$ (Higgins et al., 2017). Training a VAE is exactly the minimization problem $\min_{w \in \mathcal{H}} (\mathcal{L} \circ F)(w)$. One can analyze the behavior of this problem when optimized by gradient descent by expressing the Jacobian $DF$ in terms of the Jacobians of the encoder and the decoder.
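The per-sample integrand of the translated problem can be sketched as follows (our own illustration, assuming a diagonal Gaussian encoder, a standard normal prior, a unit-variance Gaussian decoder, and the usual reparameterization $z = m + \sigma \varepsilon$; the toy linear `encode`/`decode` maps below are hypothetical stand-ins for neural networks):

```python
import math, random

def kl_std_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form for diagonal Gaussians:
    # 0.5 * sum_j ( mu_j^2 + exp(log_var_j) - 1 - log_var_j )
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, eps):
    # z = mu + sigma * eps with sigma = exp(log_var / 2)
    return [m + math.exp(lv / 2.0) * e for m, lv, e in zip(mu, log_var, eps)]

def beta_vae_integrand(x, eps, encode, decode, beta):
    mu, log_var = encode(x)
    z = reparameterize(mu, log_var, eps)
    x_hat = decode(z)
    # Reconstruction term: unit-variance Gaussian decoder NLL up to a constant.
    rec = 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return rec + beta * kl_std_normal(mu, log_var)

# Hypothetical linear encoder/decoder, standing in for neural networks.
encode = lambda x: ([0.5 * v for v in x], [0.0 for _ in x])
decode = lambda z: [2.0 * v for v in z]
random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(2)]
value = beta_vae_integrand([1.0, -1.0], eps, encode, decode, beta=1.0)
```

Averaging this integrand over the data $\mu$ and the reparameterization noise $\rho$ yields exactly the $\beta$-VAE objective in the prototype form $\mathcal{L}(F(w))$.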