A framework for overparameterized learning

05/26/2022

∙

An explanation for the success of deep neural networks is a central question in theoretical machine learning. According to classical statistical learning, the overparameterized nature of such models should imply a failure to generalize. Many argue that good empirical performance is due to the implicit regularization of first order optimization methods. In particular, the Polyak-Łojasiewicz condition leads to gradient descent finding a global optimum that is close to initialization. In this work, we propose a framework consisting of a prototype learning problem, which is general enough to cover many popular problems and even the cases of infinitely wide neural networks and infinite data. We then perform an analysis from the perspective of the Polyak-Łojasiewicz condition. We obtain theoretical results of independent interest, concerning gradient descent on a composition (f ∘ F): G →ℝ of functions F: G → H and f: H →ℝ with G, H being Hilbert spaces. Building on these results, we determine the properties that have to be satisfied by the components of the prototype problem for gradient descent to find a global optimum that is close to initialization. We then demonstrate that supervised learning, variational autoencoders and training with gradient penalty can be translated to the prototype problem. Finally, we lay out a number of directions for future research.

READ FULL TEXT

A framework for overparameterized learning

Sign in with Google

Consider DeepAI Pro