Implicit regularisation in stochastic gradient descent: from single-objective to two-player games

07/11/2023
by Mihaela Rosca, et al.

Recent years have brought many insights into deep learning optimisation through the discovery of implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation not only sheds light on optimisation dynamics; it can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. One avenue for finding such implicit regularisation effects is to quantify the discretisation error of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). Current uses of BEA are limited, however: not all vector fields of continuous-time flows obtained using BEA can be written as the gradient of a function, which hinders the construction of modified losses that reveal implicit regularisers. In this work, we provide a novel approach to using BEA and show how it can be used to construct continuous-time flows with vector fields that can be written as gradients. We then use this to find previously unknown implicit regularisation effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and in general differentiable two-player games.
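To make the BEA machinery concrete, the sketch below illustrates the best-known prior result in this line of work (the implicit gradient regularisation of Barrett and Dherin, 2020), not the new construction introduced in this paper: a single full-batch gradient descent step with learning rate h follows, up to O(h^3) error, the continuous-time flow of the modified loss L̃(θ) = L(θ) + (h/4)‖∇L(θ)‖², whose extra term acts as an implicit regulariser penalising large gradient norms. The toy loss, step size, and Euler integration scheme are illustrative assumptions, not taken from the paper.

```python
# Minimal JAX sketch of the classic first-order BEA result for full-batch
# gradient descent (Barrett & Dherin, 2020). One discrete step
#   theta_{t+1} = theta_t - h * grad L(theta_t)
# follows, up to O(h^3), the continuous-time flow of the modified loss
#   L_tilde(theta) = L(theta) + (h / 4) * ||grad L(theta)||^2.
# The loss and step size below are illustrative assumptions.
import jax
import jax.numpy as jnp

h = 0.01  # learning rate / discretisation step

def loss(theta):
    # Toy non-quadratic objective standing in for a training loss.
    return jnp.sum(theta ** 4) + jnp.sum(theta ** 2)

def modified_loss(theta):
    # BEA first-order modified loss: original loss plus the implicit
    # gradient-norm regulariser induced by discretisation.
    g = jax.grad(loss)(theta)
    return loss(theta) + (h / 4.0) * jnp.sum(g ** 2)

@jax.jit
def gd_step(theta):
    # One discrete gradient-descent step on the original loss.
    return theta - h * jax.grad(loss)(theta)

@jax.jit
def modified_flow_step(theta):
    # Integrate the modified-loss flow over time h with small Euler
    # substeps, approximating the continuous trajectory that BEA
    # associates with one discrete step.
    n_inner = 100
    dt = h / n_inner
    def body(_, th):
        return th - dt * jax.grad(modified_loss)(th)
    return jax.lax.fori_loop(0, n_inner, body, theta)

theta0 = jnp.array([1.0, -0.5, 0.3])
print(gd_step(theta0))             # discrete gradient-descent step
print(modified_flow_step(theta0))  # BEA-modified flow, integrated for time h
```

Tracking the two trajectories over many steps shows them staying close per step, with the gap growing as h increases; this per-step agreement is what makes modified losses of this kind a useful diagnostic for discretisation drift, and the key point of the abstract above is that such a modified loss can only be read off when the BEA vector field is itself a gradient.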


