Measuring and regularizing networks in function space

05/21/2018
by Ari S. Benjamin, et al.

Neural network optimization is often conceptualized as optimizing parameters, but it is ultimately a matter of optimizing a function defined by inputs and outputs. However, little work has empirically evaluated network optimization in the space of possible functions, and much analysis relies on Lipschitz bounds. Here, we measure the behavior of several networks in an L^2 Hilbert space. Lipschitz bounds appear reasonable late in optimization but not at the beginning. We also observe that the function continues to change even after test error saturates. In light of this, we propose a learning rule, Hilbert-constrained gradient descent (HCGD), that regularizes the distance a network can travel through L^2 space in any one update. HCGD should improve generalization if it is important that single updates minimally change the output function. Experiments show that HCGD reduces exploration in function space and often, but not always, improves generalization. We connect this idea to the natural gradient, which can also be derived from penalizing changes in the outputs. We conclude that limiting movement in function space is an important consideration in training neural networks.
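
As a rough illustration of the idea, the sketch below takes a plain SGD step and then applies a few corrective steps that penalize the empirical L^2 change in the network's outputs on a held-out batch, approximating the function-space distance traveled in that single update. This assumes PyTorch; the hyperparameter names (lr, lam, inner_steps, inner_lr) are illustrative placeholders and this is not the authors' reference implementation.

```python
# Minimal sketch of a Hilbert-constrained update, assuming PyTorch.
# Hyperparameters (lr, lam, inner_steps, inner_lr) are illustrative placeholders,
# not values from the paper.
import torch

def hilbert_constrained_step(model, loss_fn, train_batch, held_out_inputs,
                             lr=0.1, lam=0.5, inner_steps=1, inner_lr=0.02):
    """Take an SGD step on the task loss, then nudge the parameters so that the
    network's outputs on a held-out batch change as little as possible, i.e.
    limit the L^2 (function-space) distance traveled in this single update."""
    inputs, targets = train_batch

    # Record the function before the update: outputs on the held-out batch.
    with torch.no_grad():
        outputs_old = model(held_out_inputs)

    # Plain SGD step on the task loss.
    loss = loss_fn(model(inputs), targets)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad

    # Corrective steps: penalize the empirical L^2 change in outputs, which
    # approximates the distance the function has moved in Hilbert space.
    for _ in range(inner_steps):
        change = ((model(held_out_inputs) - outputs_old) ** 2).mean()
        model.zero_grad()
        (lam * change).backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= inner_lr * p.grad

    return loss.item()
```

The mean squared difference in outputs over a sampled batch serves as a Monte Carlo estimate of the L^2 distance between the old and new functions, which is the quantity the abstract describes regularizing.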
