Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

11/12/2020
by Jack Parker-Holder, et al.

Over the last decade, a single algorithm has changed many facets of our lives: Stochastic Gradient Descent (SGD). In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on 'standard' loss functions will not find, since it instead converges to the 'easy' solutions. In this paper, we present a different approach: rather than following the gradient, which corresponds to a locally greedy direction, we instead follow the eigenvectors of the Hessian, which we call "ridges". By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems.
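To make the follow-and-branch idea concrete, the sketch below is a simplified, illustrative take on it rather than the authors' reference implementation: starting from a critical point of a toy two-dimensional loss, it repeatedly recomputes the Hessian, steps along a chosen eigenvector (a "ridge"), and branches by following each eigenvector separately. The toy loss, the step size `alpha`, the step count, and the sign convention are all assumptions made for this example; JAX is used only for automatic differentiation.

```python
import jax
import jax.numpy as jnp

# Toy two-dimensional loss with several qualitatively different minima
# (purely illustrative; not from the paper).
def loss(theta):
    x, y = theta
    return (x ** 2 - 1.0) ** 2 + (y ** 2 - 1.0) ** 2 + 0.5 * x * y

grad_fn = jax.grad(loss)
hess_fn = jax.hessian(loss)

def follow_ridge(theta, ridge_index, alpha=0.05, steps=200):
    """Follow the `ridge_index`-th Hessian eigenvector downhill (hypothetical helper)."""
    for _ in range(steps):
        H = hess_fn(theta)
        eigvals, eigvecs = jnp.linalg.eigh(H)   # eigenvalues in ascending order
        ridge = eigvecs[:, ridge_index]
        # Fix the eigenvector's sign so we keep moving in the descent direction
        # instead of oscillating (the sign returned by eigh is arbitrary).
        if jnp.dot(ridge, -grad_fn(theta)) < 0:
            ridge = -ridge
        theta = theta + alpha * ridge
    return theta

# Branch: start at a critical point and follow each ridge to a (possibly) different solution.
start = jnp.array([0.0, 0.0])
solutions = [follow_ridge(start, i) for i in range(2)]
print(solutions, [float(loss(s)) for s in solutions])
```

In this toy setting the two branches land in different basins of the loss surface, which is the behaviour the abstract describes; the full method additionally tracks eigenvectors as they change along the path and manages the branching systematically.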


