On the benefits of non-linear weight updates

07/25/2022
by Paul Norridge, et al.

Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio (SNR) is optimised at each of its nodes. However, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small ones. Such balancing is already implicit in some common optimizers, but we propose an approach that makes it explicit: a non-linear function is applied to the gradients before the DNN parameter updates are made. We investigate the performance of such non-linear approaches. The result is an adaptation to existing optimizers that improves performance on many problem types.
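The abstract does not specify which non-linear function the authors use, but the stated goal (suppress large updates, amplify small ones) can be illustrated with an odd power function with exponent below one: for gradient components smaller than 1 in magnitude it increases them, and for larger components it shrinks them. The sketch below is a hypothetical illustration of that idea, not the paper's actual method; the function and exponent are assumptions.

```python
import numpy as np

def nonlinear_transform(grad, exponent=0.5):
    """Illustrative non-linear gradient transform (assumed, not from the paper).

    Applies sign(g) * |g|**exponent element-wise. With 0 < exponent < 1,
    components with |g| < 1 are amplified and components with |g| > 1 are
    suppressed, balancing large and small weight updates.
    """
    return np.sign(grad) * np.abs(grad) ** exponent

def sgd_step(weights, grad, lr=0.01, exponent=0.5):
    """A plain SGD step with the transform applied to the gradient first."""
    return weights - lr * nonlinear_transform(grad, exponent)

grads = np.array([4.0, 0.25, -4.0])
print(nonlinear_transform(grads))  # [ 2.   0.5 -2. ]
```

The transform leaves the sign of each gradient component unchanged, so the update direction per coordinate is preserved; only the relative magnitudes are rebalanced, which is the behaviour the abstract attributes to the method.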


