How to escape sharp minima

05/25/2023
by Kwangjun Ahn, et al.

Modern machine learning applications have seen remarkable success from optimization algorithms that are designed to find flat minima. Motivated by this paradigm, this work formulates and studies the algorithmic question of how to find flat minima. As an initial effort, this work adopts the trace of the Hessian of the cost function as the measure of flatness, and formally defines the notion of approximate flat minima. Under this notion, we then design algorithms that find approximate flat minima efficiently. For general cost functions, we present a gradient-based algorithm that finds an approximate flat local minimum efficiently. The main component of the algorithm is the use of gradients computed at randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm inspired by a recently proposed practical algorithm called sharpness-aware minimization, which supports that algorithm's success in practice.
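
To make the perturbed-gradient idea concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm, whose details are not given in this abstract. It relies on a standard Taylor-expansion fact: for Gaussian noise g with identity covariance, the average of ∇f(x + σg) minus ∇f(x) is approximately (σ²/2) ∇tr(∇²f(x)), so averaged perturbed gradients reveal a direction along which the Hessian trace decreases. The toy cost f(x, y) = 0.5 (xy − 1)², the function names, and all step sizes are assumptions chosen for illustration. The minima of this cost form the curve xy = 1, and the Hessian trace there is x² + y², so the flattest minimum (for positive x, y) is (1, 1).

```python
import random


def grad_f(x, y):
    """Gradient of the toy cost f(x, y) = 0.5 * (x*y - 1)**2 (an assumption)."""
    r = x * y - 1.0
    return r * y, r * x


def trace_grad_estimate(x, y, sigma=0.1, pairs=2000, rng=random):
    """Estimate the gradient of tr(Hessian of f) from perturbed gradients.

    Uses antithetic pairs (+g, -g) so the first-order term cancels, then
    rescales the mean gradient shift by 2 / sigma**2.
    """
    gx0, gy0 = grad_f(x, y)
    sx = sy = 0.0
    for _ in range(pairs):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        px, py = grad_f(x + sigma * a, y + sigma * b)
        mx, my = grad_f(x - sigma * a, y - sigma * b)
        sx += 0.5 * (px + mx) - gx0
        sy += 0.5 * (py + my) - gy0
    scale = 2.0 / (sigma ** 2 * pairs)
    return scale * sx, scale * sy


def find_flat_minimum(x, y, outer=60, eta=0.05, inner=10, lr=0.2, seed=0):
    """Alternate a flatness step with gradient steps that restore f ~ 0.

    Hypothetical simplification: step against the estimated trace gradient,
    then run a few plain gradient descent steps to return to the minima set.
    """
    rng = random.Random(seed)
    for _ in range(outer):
        tx, ty = trace_grad_estimate(x, y, rng=rng)
        x, y = x - eta * tx, y - eta * ty
        for _ in range(inner):
            gx, gy = grad_f(x, y)
            x, y = x - lr * gx, y - lr * gy
    return x, y


if __name__ == "__main__":
    # Start at a sharp minimum (2, 0.5), where the Hessian trace is 4.25.
    x, y = find_flat_minimum(2.0, 0.5)
    print(f"x={x:.3f}, y={y:.3f}, trace={x * x + y * y:.3f}")
```

For this toy cost the Hessian trace equals x² + y² everywhere, so the estimate at (2, 0.5) should be close to the exact trace gradient (4, 1), and the iterates should drift along the curve xy = 1 toward (1, 1). On a real training loss, the exact trace gradient is not available for comparison, and the choice of σ trades bias against noise.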


