Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

11/07/2016
by David Balduzzi, et al.

Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex. Standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets. The guarantee matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation -- a straightforward application of Taylor expansions to neural networks -- and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets: that gradients are shattered. We investigate the hypothesis that, by exploring the space of activation configurations more thoroughly, adaptive optimizers such as RMSProp and Adam are able to converge to better solutions.
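The abstract's central object, the neural Taylor approximation, amounts to a first-order Taylor expansion of the loss in the network's parameters around the current iterate; the resulting Taylor loss is linear, hence convex, in the parameter offset. As a rough illustration only (not the paper's exact construction), the sketch below forms such a first-order expansion for a toy one-hidden-layer rectifier network with squared loss. The toy setup and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: one-hidden-layer rectifier net, squared loss.
rng = np.random.default_rng(0)
x = rng.normal(size=5)          # single input
y = 1.0                         # target
W1 = rng.normal(size=(3, 5))    # hidden-layer weights
w2 = rng.normal(size=3)         # output weights

def forward(W1, w2, x):
    h = np.maximum(W1 @ x, 0.0)         # ReLU activations
    return w2 @ h, h

def loss(W1, w2, x, y):
    out, _ = forward(W1, w2, x)
    return 0.5 * (out - y) ** 2

# Gradients at the current parameters, obtained by the chain rule through
# the current activation pattern (the pattern is what rectifiers freeze
# locally and what makes the true loss nonsmooth when it flips).
out, h = forward(W1, w2, x)
err = out - y
grad_w2 = err * h
mask = (W1 @ x > 0).astype(float)       # active units
grad_W1 = err * np.outer(w2 * mask, x)

def taylor_loss(dW1, dw2):
    """First-order (linear, hence convex) approximation of the loss
    around the current parameters, evaluated at a parameter offset."""
    return (loss(W1, w2, x, y)
            + np.sum(grad_W1 * dW1)
            + grad_w2 @ dw2)

# For small parameter steps the Taylor loss tracks the true loss closely.
step_W1, step_w2 = -0.01 * grad_W1, -0.01 * grad_w2
print(taylor_loss(step_W1, step_w2))
print(loss(W1 + step_W1, w2 + step_w2, x, y))
```

The activation mask is the crux: a step large enough to flip units changes which rectifiers are active, and with it the Taylor approximation itself. This is the sense in which gradients are shattered across activation configurations, the regime the paper's second half investigates.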

Related research

- 04/07/2016: Deep Online Convex Optimization with Gated Games
- 09/06/2015: Deep Online Convex Optimization by Putting Forecaster to Sleep
- 03/26/2021: Modeling the Nonsmoothness of Modern Neural Networks
- 10/28/2021: Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize
- 08/19/2020: On the Approximation Lower Bound for Neural Nets with Random Weights
- 03/06/2019: Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization
- 09/29/2022: Restricted Strong Convexity of Deep Learning Models with Smooth Activations
