Deforming the Loss Surface

07/24/2020
by Liangming Chen, et al.

In deep learning, the shape of the loss surface is usually assumed to be fixed. In contrast, this paper proposes the novel concept of a deformation operator, which deforms the loss surface and thereby improves optimization. Deformation functions, a type of deformation operator, can improve generalization performance. Various deformation functions are designed, and their effects on the loss surface are analyzed. The standard stochastic gradient descent (SGD) optimizer is then theoretically shown to act as a flat-minima filter, tending to filter out sharp minima. Furthermore, even flatter minima can be obtained with the proposed deformation functions, which is verified on CIFAR-100 by visualizing the loss landscapes near the critical points reached by the original optimizer and by the optimizer enhanced with deformation functions. The experimental results show that deformation functions do find flatter regions. Moreover, on ImageNet, CIFAR-10, and CIFAR-100, popular convolutional neural networks enhanced with deformation functions are compared with the corresponding original models, and significant improvements are observed for all models equipped with deformation functions. For example, the top-1 test accuracy of ResNet-20 on CIFAR-100 increases by 1.46%, with negligible additional computational overhead.
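The abstract does not spell out how a deformation function enters training. Below is a minimal sketch under the assumption that a deformation function is a monotonically increasing scalar map composed with the training loss before backpropagation; the specific log-based map and all names here are illustrative, not the paper's actual functions.

```python
import torch
import torch.nn as nn

def deform(loss: torch.Tensor, c: float = 2.0) -> torch.Tensor:
    # Hypothetical deformation function: a monotonically increasing map
    # applied to the scalar training loss. Monotonicity preserves the
    # locations of minima, while the concave shape rescales gradients
    # (by 1 / (1 + c * loss) via the chain rule), changing how the
    # optimizer moves across sharp versus flat regions.
    return torch.log1p(c * loss) / c

# Otherwise standard SGD training step, with the deformation composed
# onto the loss before backpropagation.
model = nn.Linear(10, 2)          # stand-in for a CNN such as ResNet-20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)
deform(loss).backward()           # backpropagate through the deformed loss
optimizer.step()
```

Because the map is monotone, the deformed loss shares its minimizers with the original loss; only the gradient field, and hence the optimizer's trajectory, changes.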
