
Minimum sharpness: Scale-invariant parameter-robustness of neural networks

by Hikaru Ibayashi, et al.

Toward achieving robust and defensive neural networks, robustness against weight-parameter perturbations, i.e., sharpness, has attracted attention in recent years (Sun et al., 2020). However, sharpness is known to suffer from a critical issue: scale-sensitivity. In this paper, we propose a novel sharpness measure, Minimum Sharpness. It is known that NNs admit a specific scale transformation that constitutes equivalence classes in which functional properties are completely identical, while sharpness can change arbitrarily within a class. We define our sharpness through a minimization problem over the equivalence class, making it invariant to the scale transformation. We also develop an efficient and exact technique to make the sharpness tractable, which reduces the heavy computational cost involving the Hessian. In experiments, we observe that our sharpness correlates well with the generalization of NNs and is computed at lower cost than existing sharpness measures.
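The scale-sensitivity the abstract refers to can be seen directly in a ReLU network: scaling one layer's weights by α and the next layer's by 1/α leaves the function (and hence the loss) unchanged, while weight magnitudes, and with them Hessian-based sharpness, change arbitrarily. A minimal NumPy sketch of this equivalence class, assuming a two-layer bias-free ReLU network (an illustrative toy, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # first-layer weights
W2 = rng.normal(size=(2, 4))  # second-layer weights

def forward(x, W1, W2):
    # ReLU is positively homogeneous: relu(a * z) = a * relu(z) for a > 0
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=3)
alpha = 10.0

# Scale transformation: amplify layer 1 by alpha, shrink layer 2 by 1/alpha.
W1s, W2s = alpha * W1, W2 / alpha

# The two parameter settings compute exactly the same function ...
assert np.allclose(forward(x, W1, W2), forward(x, W1s, W2s))

# ... yet the parameter scales (which drive Hessian-based sharpness) differ.
print(np.linalg.norm(W1), np.linalg.norm(W1s))
```

Minimum Sharpness is defined by minimizing sharpness over all such rescalings of a given network, so that every member of the equivalence class is assigned the same value.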


Measuring Representational Robustness of Neural Networks Through Shared Invariances

A major challenge in studying robustness in deep learning is defining th...

A Reparameterization-Invariant Flatness Measure for Deep Neural Networks

The performance of deep neural networks is often attributed to their aut...

Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations

Studying the sensitivity of weight perturbation in neural networks and i...

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks

Recently, learning algorithms motivated from sharpness of loss surface a...

The Pitfalls of Simplicity Bias in Neural Networks

Several works have proposed Simplicity Bias (SB)—the tendency of standar...

BN-invariant sharpness regularizes the training model to better generalization

It is arguably believed that flatter minima can generalize better. Howev...