LocoProp: Enhancing BackProp via Local Loss Optimization

06/11/2021
by Ehsan Amid, et al.

We study a local loss construction approach for optimizing neural networks. We start by motivating the problem as minimizing a squared loss between the pre-activations of each layer and a local target, plus a regularizer term on the weights. The targets are chosen so that the first gradient descent step on each local objective recovers vanilla BackProp, while the exact solution to each local problem results in a preconditioned gradient update. We improve the local loss construction by forming a Bregman divergence in each layer tailored to that layer's transfer function, which keeps the local problem convex w.r.t. the weights. The generalized local problem is again solved iteratively by taking small gradient descent steps on the weights, and again the first step recovers BackProp. We run several ablations and show that our construction consistently improves convergence, reducing the gap between first-order and second-order methods.
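To make the construction concrete, here is a minimal NumPy sketch of the squared-loss variant for a single layer and a single input. This is illustrative rather than the paper's reference implementation: the function name, the hyperparameter names gamma (target step size) and eta (local learning rate / regularization strength), and the single-example setting are all assumptions.

```python
import numpy as np

def local_layer_update(W, x, grad_a, gamma=0.1, eta=0.01, inner_steps=10):
    """Sketch of a LocoProp-style local update for one layer (squared-loss form).

    W       : (out, in) current weight matrix of the layer
    x       : (in,) layer input, i.e. the previous layer's activation
    grad_a  : (out,) gradient of the global loss w.r.t. this layer's
              pre-activation a = W @ x, taken from one ordinary backward pass
    """
    a = W @ x
    t = a - gamma * grad_a   # local pre-activation target
    W_anchor = W.copy()      # the regularizer keeps W close to its current value
    for _ in range(inner_steps):
        # gradient of 0.5 * ||W @ x - t||^2 + 1/(2*eta) * ||W - W_anchor||_F^2
        g = np.outer(W @ x - t, x) + (W - W_anchor) / eta
        W = W - eta * g      # the first step equals a vanilla BackProp update
                             # on this layer with learning rate eta * gamma
    return W
```

Setting the gradient of this local objective to zero gives the closed-form minimizer W = (t x^T + W_anchor/eta)(x x^T + I/eta)^(-1), so running the inner loop to convergence amounts to a gradient update preconditioned by (x x^T + I/eta)^(-1), which matches the abstract's claim about the exact solution. The generalized variant replaces the squared loss with a Bregman divergence matched to the layer's transfer function, keeping the local problem convex in the weights.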


Related research

01/07/2021 · A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning
One of the most important parts of Artificial Neural Networks is minimiz...

07/24/2019 · Sparse Optimization on Measures with Over-parameterized Gradient Descent
Minimizing a convex function of a measure with a sparsity-inducing penal...

09/10/2019 · First Analysis of Local GD on Heterogeneous Data
We provide the first convergence analysis of local gradient descent for ...

05/21/2022 · Symmetry Teleportation for Accelerated Optimization
Existing gradient-based optimization methods update the parameters local...

03/13/2020 · Iterative Pre-Conditioning to Expedite the Gradient-Descent Method
Gradient-descent method is one of the most widely used and perhaps the m...

06/17/2020 · A block coordinate descent optimizer for classification problems exploiting convexity
Second-order optimizers hold intriguing potential for deep learning, but...

11/04/2022 · How Does Adaptive Optimization Impact Local Neural Network Geometry?
Adaptive optimization methods are well known to achieve superior converg...
