Depth-Adaptive Neural Networks from the Optimal Control viewpoint

07/05/2020
by Joubine Aghili, et al.

In recent years, deep learning has been connected with optimal control as a way to define a continuous underlying learning problem. In this view, neural networks can be interpreted as a discretization of a parametric Ordinary Differential Equation (ODE) which, in the limit, defines a continuous-depth neural network. The learning task then consists in finding the best ODE parameters for the problem under consideration, and their number increases with the accuracy of the time discretization. Although important steps have been taken to realize the advantages of such continuous formulations, most current learning techniques fix a discretization (i.e., the number of layers is fixed). In this work, we propose an iterative adaptive algorithm in which the time discretization is progressively refined (i.e., the number of layers is increased). Provided that certain tolerances are met across the iterations, we prove that the strategy converges to the underlying continuous problem. One salient advantage of such a shallow-to-deep approach is that it makes it possible to benefit in practice from the superior approximation properties of deep networks while mitigating over-parametrization issues. The performance of the approach is illustrated in several numerical examples.
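The shallow-to-deep loop described in the abstract can be sketched in code. Below is a minimal PyTorch illustration, not the authors' implementation: a residual network read as the forward-Euler discretization of x'(t) = f(x(t); theta(t)) on [0, 1], where each refinement doubles the depth (halves the step size) and initializes the new layers by copying the coarse ones. All names (ODEBlock, AdaptiveODENet, refine, train_adaptively), the stagnation tolerance, and the copy-based refinement rule are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of depth-adaptive training:
# a ResNet read as the forward-Euler discretization of x'(t) = f(x(t); theta(t))
# on [0, 1], refined shallow-to-deep.
import copy
import torch
import torch.nn as nn

class ODEBlock(nn.Module):
    """One Euler layer: x <- x + h * f(x; theta), with f a small MLP."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim))

    def forward(self, x, h):
        return x + h * self.f(x)

class AdaptiveODENet(nn.Module):
    """Stack of Euler layers; depth L corresponds to step size h = 1 / L."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList([ODEBlock(dim) for _ in range(depth)])

    def forward(self, x):
        h = 1.0 / len(self.blocks)
        for block in self.blocks:
            x = block(x, h)
        return x

    def refine(self):
        """Double the depth (halve h). Each coarse layer is replaced by two
        half-step copies of itself, so the piecewise-constant control theta(t)
        is unchanged and training resumes from the coarse solution."""
        new_blocks = []
        for block in self.blocks:
            new_blocks += [block, copy.deepcopy(block)]
        self.blocks = nn.ModuleList(new_blocks)

def train_adaptively(model, head, x, y, n_refinements=3, tol=1e-4, max_steps=500):
    """Train at each depth until the loss improvement falls below tol,
    then refine the time discretization and continue."""
    loss_fn = nn.MSELoss()
    for level in range(n_refinements + 1):
        params = list(model.parameters()) + list(head.parameters())
        opt = torch.optim.Adam(params, lr=1e-2)
        prev = float("inf")
        for _ in range(max_steps):
            opt.zero_grad()
            loss = loss_fn(head(model(x)), y)
            loss.backward()
            opt.step()
            if prev - loss.item() < tol:  # stagnation at this depth: stop
                break
            prev = loss.item()
        print(f"level {level}: depth = {len(model.blocks)}, loss = {loss.item():.5f}")
        if level < n_refinements:
            model.refine()
```

For example, train_adaptively(AdaptiveODENet(2, depth=2), nn.Linear(2, 1), X, Y) starts from a two-layer network and finishes with sixteen layers. Replacing a coarse layer by two half-step copies is one natural prolongation for an explicit-Euler discretization, since it leaves the represented control unchanged at refinement time; the paper's actual tolerance criteria and refinement rule may differ.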

Related research

06/18/2020 · A Shooting Formulation of Deep Learning
Continuous-depth neural networks can be viewed as deep limits of discret...

11/01/2019 · Review: Ordinary Differential Equations For Deep Learning
To better understand and improve the behavior of neural networks, a rece...

08/06/2020 · Large-time asymptotics in deep learning
It is by now well-known that practical deep supervised learning may roug...

08/18/2019 · Neural Dynamics on Complex Networks
We introduce a deep learning model to learn continuous-time dynamics on ...

03/29/2022 · A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory
Nesterov's accelerated gradient algorithm is derived from first principl...

08/28/2020 · Control On the Manifolds Of Mappings As a Setting For Deep Learning
We use a control-theoretic setting to model the process of training (dee...

12/17/2022 · Convergence Analysis for Training Stochastic Neural Networks via Stochastic Gradient Descent
In this paper, we carry out numerical analysis to prove convergence of a...
