Early Stopping is Nonparametric Variational Inference

04/06/2015
by Dougal Maclaurin, et al.

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
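The core idea above can be illustrated on a toy problem. The sketch below (my own illustration, not the authors' code) runs gradient descent on a 2-D quadratic log-posterior and tracks the entropy of the implicit variational distribution: each update x ← x − η∇f(x) transforms the sample distribution, changing its entropy by log|det J| per step, where J = I − ηH is the Jacobian of the update map (exact here because the Hessian H is constant). Combining the tracked entropy with the log posterior at the current sample gives a single-sample estimate of the variational lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 0.5], [0.5, 1.0]])   # precision matrix of a toy Gaussian posterior
eta = 0.05                                # step size
n_steps = 50

def neg_log_post(x):
    # Unnormalized negative log posterior of the toy Gaussian.
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

# Initial variational distribution q0 = N(0, I); its entropy is known in closed form.
x = rng.standard_normal(2)
entropy = 0.5 * 2 * (1 + np.log(2 * np.pi))

for _ in range(n_steps):
    J = np.eye(2) - eta * A                    # Jacobian of the update x -> x - eta * grad(x)
    entropy += np.log(abs(np.linalg.det(J)))   # exact entropy change for a quadratic objective
    x = x - eta * grad(x)

# Single-sample estimate of the variational lower bound:
# E_q[log p(x)] + H[q], using the unnormalized log posterior.
elbo_sample = -neg_log_post(x) + entropy
print(elbo_sample)
```

Because the eigenvalues of J lie in (0, 1), the entropy shrinks at every step: as optimization converges, the implicit posterior collapses toward a point mass, which is why stopping early can yield a better lower bound than running to convergence. For general objectives the Hessian is not constant, and the paper's contribution is a scalable, unbiased estimator of this entropy change without forming the full Jacobian.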

