Studying Generalization Through Data Averaging

06/28/2022
by Carlos A. Gomez-Uribe, et al.

The generalization of machine learning models has a complex dependence on the data, the model, and the learning algorithm. We study train and test performance, as well as the generalization gap given by the mean of their difference over different data set samples, to understand their “typical” behavior. We derive an expression for the gap as a function of the covariance between the model parameter distribution and the train loss, and another expression for the average test performance, showing that test generalization depends only on the data-averaged parameter distribution and the data-averaged loss. We show that for a large class of model parameter distributions a modified generalization gap is always non-negative. By specializing further to parameter distributions produced by stochastic gradient descent (SGD), along with a few approximations and modeling considerations, we predict how aspects of the generalization gap and of model train and test performance vary as a function of SGD noise. We evaluate these predictions empirically on the CIFAR-10 classification task with a ResNet architecture.
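As a rough illustration of the central quantity, the data-averaged generalization gap E_D[test loss - train loss], the sketch below resamples small training sets, trains a toy logistic-regression model with plain SGD, and averages train loss, test loss, and their gap across resamples. This is a minimal sketch under assumed settings, not the paper's method or its CIFAR-10/ResNet setup; the synthetic task, model, and hyperparameters are all illustrative, and shrinking the batch size or raising the learning rate serves only as a crude stand-in for increasing SGD noise.

```python
# Hypothetical sketch: estimating the data-averaged generalization gap
# E_D[test_loss - train_loss] by averaging over resampled training sets.
# Everything here (task, model, hyperparameters) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n, d=20):
    """Synthetic binary-classification data from a fixed noisy linear rule."""
    w_true = np.ones(d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)
    return X, y

def log_loss(w, X, y):
    """Mean binary cross-entropy of a logistic model with weights w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def sgd_train(X, y, lr=0.1, batch=8, steps=2000):
    """Plain SGD; smaller batches / larger lr give noisier parameter samples."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        Xb, yb = X[idx], y[idx]
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yb) / batch   # gradient of the mean log loss
    return w

# Average train/test losses and their gap over independent data set samples.
gaps, trains, tests = [], [], []
X_test, y_test = make_dataset(10_000)       # large held-out test set
for _ in range(20):                         # independent training-set resamples
    X_tr, y_tr = make_dataset(200)
    w = sgd_train(X_tr, y_tr)
    tr, te = log_loss(w, X_tr, y_tr), log_loss(w, X_test, y_test)
    trains.append(tr); tests.append(te); gaps.append(te - tr)

print(f"mean train loss:          {np.mean(trains):.3f}")
print(f"mean test loss:           {np.mean(tests):.3f}")
print(f"mean generalization gap:  {np.mean(gaps):.3f}")
```

Rerunning the loop with a smaller batch or larger learning rate gives one crude way to probe how the averaged gap responds to SGD noise in this toy setting.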


Related research:

- Learning Curves for SGD on Structured Features (06/04/2021)
  The generalization performance of a machine learning algorithm such as a...
- SGD Implicitly Regularizes Generalization Error (04/10/2021)
  We derive a simple and model-independent formula for the change in the g...
- AdaSGD: Bridging the gap between SGD and Adam (06/30/2020)
  In the context of stochastic gradient descent (SGD) and adaptive moment e...
- How Can Increased Randomness in Stochastic Gradient Descent Improve Generalization? (08/21/2021)
  Recent works report that increasing the learning rate or decreasing the ...
- Assessing Generalization of SGD via Disagreement (06/25/2021)
  We empirically show that the test error of deep networks can be estimate...
- A classification for the performance of online SGD for high-dimensional inference (03/23/2020)
  Stochastic gradient descent (SGD) is a popular algorithm for optimizatio...
- Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions (07/23/2019)
  We consider a covariate shift problem where one has access to several m...
