Multilayer Lookahead: a Nested Version of Lookahead

10/27/2021
by Denys Pushkin, et al.

In recent years, SGD and its variants have become the standard tools for training deep neural networks. In this paper, we focus on Lookahead, a recently proposed variant that improves upon SGD in a wide range of applications. Following this success, we study an extension of this algorithm, the Multilayer Lookahead optimizer, which recursively wraps Lookahead around itself. We prove that Multilayer Lookahead with two layers converges to a stationary point of smooth non-convex functions at an O(1/√T) rate. We also justify the improved generalization of Lookahead over SGD, and of Multilayer Lookahead over Lookahead, by showing how they amplify the implicit regularization effect of SGD. We empirically verify our results and show that Multilayer Lookahead outperforms Lookahead on CIFAR-10 and CIFAR-100 classification tasks, and on GAN training on the MNIST dataset.
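
To make the idea of "recursively wrapping Lookahead around itself" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual Lookahead update (run k steps of an inner optimizer on fast weights, then move the slow weights a fraction alpha toward them) and uses deterministic gradients of a toy quadratic in place of stochastic gradients. All names (`sgd_step`, `multilayer_lookahead_step`, `quad_grad`) and hyperparameter values are hypothetical and chosen only for illustration.

```python
def sgd_step(x, grad, lr):
    """One plain gradient step (layer 0 of the recursion)."""
    return [xi - lr * gi for xi, gi in zip(x, grad(x))]


def multilayer_lookahead_step(x, grad, lr, ks, alphas):
    """One step of an n-layer Lookahead, with n = len(ks).

    Assumed convention: ks[0], alphas[0] belong to the innermost layer and
    ks[-1], alphas[-1] to the outermost one. With n = 0 this is a plain
    gradient step; with n = 1 it is standard Lookahead.
    """
    if not ks:
        return sgd_step(x, grad, lr)
    fast = list(x)  # fast weights start at the current slow weights
    for _ in range(ks[-1]):
        # The inner optimizer is itself a Lookahead with one layer fewer.
        fast = multilayer_lookahead_step(fast, grad, lr, ks[:-1], alphas[:-1])
    a = alphas[-1]
    # Slow-weight update: interpolate from x toward the fast weights.
    return [s + a * (f - s) for s, f in zip(x, fast)]


def quad_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * sum(x_i ** 2)."""
    return list(x)


# Two-layer Lookahead on the toy quadratic (illustrative hyperparameters).
x = [1.0, -2.0]
for _ in range(50):
    x = multilayer_lookahead_step(x, quad_grad, lr=0.1, ks=[5, 2], alphas=[0.5, 0.5])
```

In this sketch each extra layer adds only one more interpolation step on top of the layer below it, which is why the recursion can be written as a single function. The abstract does not specify details such as hyperparameter choices, so those remain assumptions here.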


research
06/07/2022

Generalization Error Bounds for Deep Neural Networks Trained by SGD

Generalization error bounds for deep neural networks trained by stochast...
research
08/15/2020

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

We propose a novel variant of SGD customized for training network archit...
research
08/13/2019

On the Convergence of AdaBound and its Connection to SGD

Adaptive gradient methods such as Adam have gained extreme popularity du...
research
03/12/2015

Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation

Compared to Multilayer Neural Networks with real weights, Binary Multila...
research
06/25/2022

Topology-aware Generalization of Decentralized SGD

This paper studies the algorithmic stability and generalizability of dec...
research
12/20/2019

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

The optimization of multilayer neural networks typically leads to a solu...
research
04/11/2020

A new multilayer network construction via Tensor learning

Multilayer networks proved to be suitable in extracting and providing de...
