Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

01/15/2019
by Yusuke Tsuzuku, et al.

The notion of flat minima has played a key role in explaining the generalization of deep learning models. However, existing definitions of flatness are known to be sensitive to the rescaling of parameters. Because generalization is invariant to such rescalings, this sensitivity suggests that previous definitions of flatness do not necessarily capture generalization. In this paper, we scrutinize the discussion of flat minima from the PAC-Bayesian perspective and introduce the notion of normalized flat minima, which is free from the known scale-dependence issues. Additionally, we highlight the insufficiency of existing matrix-norm-based generalization error bounds. Our modified notion of flatness does not suffer from this insufficiency either, suggesting that it better captures generalization.
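The scale-dependence issue can be made concrete with a small numerical experiment. The following is a minimal NumPy sketch (not the authors' code, and the network and sharpness proxy are illustrative assumptions): it builds a toy two-layer ReLU network, applies the rescaling W1 -> a*W1, W2 -> W2/a, which leaves the computed function unchanged by the positive homogeneity of ReLU, and shows that a naive perturbation-based sharpness measure nevertheless changes with a.

```python
# Minimal sketch of the scale-dependence of naive flatness measures.
# Not the paper's implementation; a hypothetical two-layer ReLU net.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(1, 8))
x = rng.normal(size=(4,))
y = np.array([1.0])

def forward(W1, W2, x):
    # f(x) = W2 @ relu(W1 @ x)
    return W2 @ np.maximum(W1 @ x, 0.0)

def loss(W1, W2):
    return float(((forward(W1, W2, x) - y) ** 2).sum())

def sharpness(W1, W2, eps=1e-3, trials=200):
    # Average loss increase under random parameter perturbations of
    # fixed Euclidean norm eps: a crude proxy for local flatness.
    base = loss(W1, W2)
    deltas = []
    for _ in range(trials):
        d1 = rng.normal(size=W1.shape)
        d2 = rng.normal(size=W2.shape)
        scale = eps / np.sqrt((d1 ** 2).sum() + (d2 ** 2).sum())
        deltas.append(loss(W1 + scale * d1, W2 + scale * d2) - base)
    return float(np.mean(deltas))

for a in (0.1, 1.0, 10.0):
    W1s, W2s = a * W1, W2 / a
    # The function (and hence the loss and generalization) is identical...
    assert np.allclose(forward(W1, W2, x), forward(W1s, W2s, x))
    # ...yet the naive sharpness proxy depends on the rescaling a.
    print(f"a={a:5.1f}  loss={loss(W1s, W2s):.4f}  sharpness={sharpness(W1s, W2s):.6f}")
```

Since all three rescaled parameter settings represent the same function, any flatness measure meant to track generalization should assign them the same value; the varying output of the proxy above is exactly the failure mode that a normalized, scale-invariant definition is designed to avoid.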


Related research

06/14/2020  Entropic gradient descent algorithms and wide flat minima
The properties of flat minima in the empirical risk landscape of neural ...

05/25/2023  How to escape sharp minima
Modern machine learning applications have seen a remarkable success of o...

02/14/2023  A modern look at the relationship between sharpness and generalization
Sharpness of minima is a promising quantity that can correlate with gene...

10/17/2018  The loss surface of deep linear networks viewed through the algebraic geometry lens
By using the viewpoint of modern computational algebraic geometry, we ex...

01/08/2021  BN-invariant sharpness regularizes the training model to better generalization
It is arguably believed that flatter minima can generalize better. Howev...

05/15/2022  Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
The lottery ticket hypothesis (LTH) has attracted attention because it c...

05/30/2019  Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
The ability of overparameterized deep networks to generalize well has be...
