Generalization Properties of Stochastic Optimizers via Trajectory Analysis

08/02/2021
by Liam Hodgkinson, et al.

Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms on generalization performance in realistic non-convex settings is still poorly understood. In this paper, we provide an encompassing theoretical framework for investigating the generalization properties of stochastic optimizers, which is based on their dynamics. We first prove a generalization bound attributable to the optimizer dynamics in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. This data- and algorithm-dependent bound is shown to be the sharpest possible in the absence of further assumptions. We then specialize this result by exploiting the Markovian structure of stochastic optimizers, deriving generalization bounds in terms of the (data-dependent) transition kernels associated with the optimization algorithms. In line with recent work that has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, we link the generalization error to the local tail behavior of the transition kernels. We illustrate that the local power-law exponent of the kernel acts as an effective dimension, which decreases as the transitions become "less Gaussian". We support our theory with empirical results from a variety of neural networks, and we show that both the Fernique-Talagrand functional and the local power-law exponent are predictive of generalization performance.
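The abstract invokes two quantities without defining them. As a reminder of the standard (generic chaining) definition, which the abstract does not restate, the Fernique-Talagrand functional of a set T equipped with a metric d is

\[
\gamma_2(T,d) \;=\; \inf_{(\mathcal{A}_n)} \, \sup_{t \in T} \, \sum_{n \ge 0} 2^{n/2} \, \mathrm{diam}\big(\mathcal{A}_n(t)\big),
\]

where the infimum ranges over admissible sequences of partitions of T (with |\mathcal{A}_0| = 1 and |\mathcal{A}_n| \le 2^{2^n}) and \mathcal{A}_n(t) denotes the cell of \mathcal{A}_n containing t. In the setting described above, T is built from the trajectory of the optimizer, so the bound ties generalization to how spread out the iterates are under a suitable metric.

For the second quantity, the sketch below illustrates one common way to estimate a power-law (tail) exponent from the magnitudes of optimizer steps, via the Hill estimator. It is only an illustrative stand-in, assuming that heavier-tailed steps correspond to a smaller exponent ("less Gaussian" transitions, in the abstract's phrasing); it is not the paper's own estimator for the local exponent of the transition kernel.

import numpy as np

def hill_tail_index(increments, k=None):
    # Hill estimator of the power-law exponent alpha from the largest
    # magnitudes in a sample; smaller alpha_hat indicates heavier tails.
    x = np.sort(np.abs(np.asarray(increments, dtype=float)))[::-1]
    x = x[x > 0]
    if k is None:
        k = max(10, len(x) // 10)       # use roughly the top 10% of magnitudes
    k = min(k, len(x) - 1)              # guard against very small samples
    log_spacings = np.log(x[:k]) - np.log(x[k])
    return 1.0 / np.mean(log_spacings)  # alpha_hat

# Toy check: Student-t increments with 3 degrees of freedom have tail index 3,
# so the estimate should land roughly near 3 (with some bias if k is large).
rng = np.random.default_rng(0)
steps = rng.standard_t(df=3, size=10_000)
print(hill_tail_index(steps))

In practice one would apply such an estimator to successive parameter increments recorded along training and track how the exponent evolves, mirroring the abstract's claim that the local power-law exponent is predictive of generalization performance.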


Related research:

07/25/2023 · High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
Gradient clipping is a commonly used technique to stabilize the training...

06/09/2021 · Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
Understanding generalization in deep learning has been one of the major ...

06/11/2020 · Multiplicative noise and heavy tails in stochastic optimization
Although stochastic optimization is central to modern machine learning, ...

06/02/2022 · Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Recent studies have shown that heavy tails can emerge in stochastic opti...

05/13/2022 · Heavy-Tail Phenomenon in Decentralized SGD
Recent theoretical studies have shown that heavy-tails can emerge in sto...

05/20/2020 · Beyond the storage capacity: data driven satisfiability transition
Data structure has a dramatic impact on the properties of neural network...

12/05/2022 · Rethinking the Structure of Stochastic Gradients: Empirical and Statistical Evidence
Stochastic gradients closely relate to both optimization and generalizat...
