Efficient Generalization Improvement Guided by Random Weight Perturbation

11/21/2022
by Tao Li, et al.

To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability. Recently, sharpness-aware minimization (SAM) established a generic scheme for improving generalization by minimizing a sharpness measure within a small neighborhood, achieving state-of-the-art performance. However, SAM requires two consecutive gradient evaluations to solve its min-max problem and therefore inevitably doubles the training time. In this paper, we resort to filter-wise random weight perturbation (RWP) to decouple the nested gradients in SAM. Unlike the small adversarial perturbations in SAM, RWP is softer and allows a much larger perturbation magnitude. Specifically, we jointly optimize the loss function under random perturbations and the original loss function: the former guides the network towards a wider flat region, while the latter helps recover the necessary local information. The two loss terms are complementary and mutually independent, so their gradients can be computed efficiently in parallel, enabling nearly the same training speed as regular training. As a result, we achieve very competitive performance on CIFAR and remarkably better performance on ImageNet (e.g., +1.1%) compared with SAM, while requiring only half of the training time. The code is released at https://github.com/nblt/RWP.
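As a rough illustration of the training scheme described in the abstract, the PyTorch sketch below performs one RWP-style update: it takes the gradient of the original loss, injects filter-wise Gaussian noise scaled by each filter's norm, takes the gradient of the perturbed loss, and mixes the two gradients before the optimizer step. The noise scale sigma, the mixing weight alpha, and the helper name rwp_step are illustrative assumptions rather than details taken from the paper, and while the paper computes the two gradients in parallel, this single-device sketch runs them sequentially for clarity.

import torch


def rwp_step(model, loss_fn, x, y, optimizer, sigma=0.01, alpha=0.5):
    """One RWP-style update (illustrative sketch; assumes every parameter receives a gradient)."""
    # 1) Gradient of the original loss L(w), which preserves the necessary local information.
    optimizer.zero_grad()
    loss_clean = loss_fn(model(x), y)
    loss_clean.backward()
    grad_clean = [p.grad.detach().clone() for p in model.parameters()]

    # 2) Apply filter-wise random perturbations: Gaussian noise scaled by each filter's norm.
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:  # conv / linear weights: one noise scale per output filter
                filter_norms = p.view(p.size(0), -1).norm(dim=1)
                eps = torch.randn_like(p) * sigma * filter_norms.view(-1, *([1] * (p.dim() - 1)))
            else:            # biases and normalization parameters: plain Gaussian noise
                eps = torch.randn_like(p) * sigma
            p.add_(eps)
            perturbations.append(eps)

    # 3) Gradient of the perturbed loss L(w + eps), which steers towards a wider flat region.
    optimizer.zero_grad()
    loss_perturbed = loss_fn(model(x), y)
    loss_perturbed.backward()

    # 4) Restore the original weights and mix the two gradients before stepping.
    with torch.no_grad():
        for p, eps, g_clean in zip(model.parameters(), perturbations, grad_clean):
            p.sub_(eps)
            p.grad.mul_(alpha).add_(g_clean, alpha=1 - alpha)

    optimizer.step()
    return loss_clean.item(), loss_perturbed.item()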
