Second-Order Neural ODE Optimizer

09/29/2021
by Guan-Horng Liu, et al.

We propose a novel second-order optimization framework for training emerging deep continuous-time models, specifically Neural Ordinary Differential Equations (Neural ODEs). Because training these models already requires an expensive gradient computation, obtained by solving a backward ODE, deriving efficient second-order methods is highly nontrivial. Nevertheless, inspired by the recent Optimal Control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that, with the aid of Kronecker-based factorization, it leads to efficient preconditioned updates. The resulting method converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g., image classification, generative flow, and time-series prediction. Our framework also enables direct architecture optimization, such as optimizing the integration time of Neural ODEs, with second-order feedback policies, strengthening the OC perspective as a principled tool for analyzing optimization in deep learning.
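To make the phrase "preconditioned updates with the aid of Kronecker-based factorization" concrete, here is a minimal NumPy sketch of a generic Kronecker-factored preconditioner for a single linear layer. It is not the method from the paper, which obtains its second-order terms from backward ODEs; it only illustrates why the factorization is cheap. The function name `kron_precondition`, the factor names `A` and `B`, the per-factor `damping` term, and the random data are all assumptions made for illustration.

```python
import numpy as np


def kron_precondition(grad, A, B, damping=1e-3):
    """Return (B + damping*I)^{-1} @ grad @ (A + damping*I)^{-1}.

    grad: (n_out, n_in) gradient of a weight matrix.
    A:    (n_in, n_in) symmetric factor built from layer inputs.
    B:    (n_out, n_out) symmetric factor built from output-side gradients.
    Damping each factor separately is a simplifying assumption of this sketch.
    """
    A_d = A + damping * np.eye(A.shape[0])
    B_d = B + damping * np.eye(B.shape[0])
    # Left-multiply by B_d^{-1} without forming the inverse explicitly.
    left = np.linalg.solve(B_d, grad)
    # Right-multiply by A_d^{-1}: solve A_d^T Y^T = left^T, then transpose back.
    return np.linalg.solve(A_d.T, left.T).T


# Hypothetical data: a batch of layer inputs x and output-side gradients g.
rng = np.random.default_rng(0)
n_in, n_out, batch = 4, 3, 32
x = rng.normal(size=(batch, n_in))
g = rng.normal(size=(batch, n_out))

A = x.T @ x / batch      # input second-moment factor, shape (n_in, n_in)
B = g.T @ g / batch      # gradient second-moment factor, shape (n_out, n_out)
grad = g.T @ x / batch   # weight gradient, shape (n_out, n_in)

update = kron_precondition(grad, A, B)
```

The two small solves act on an `n_in x n_in` and an `n_out x n_out` matrix, yet the result matches solving against the full `(n_in * n_out) x (n_in * n_out)` Kronecker product of the damped factors applied to the column-stacked gradient. That identity is what lets a Kronecker-factored curvature approximation be inverted at a fraction of the cost of the full matrix.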


research
03/25/2019

Second-Order Asymptotics of the Continuous-Time Poisson Channel

The paper derives the optimal second-order coding rate for the continuou...
research
02/20/2020

Differential Dynamic Programming Neural Optimizer

Interpretation of Deep Neural Networks (DNNs) training as an optimal con...
research
06/15/2021

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is prese...
research
08/04/2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization

Second-order optimization algorithms exhibit excellent convergence prope...
research
01/16/2021

The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes

Learning the continuous equations of motion from discrete observations i...
research
06/12/2020

On Second Order Behaviour in Augmented Neural ODEs

Neural Ordinary Differential Equations (NODEs) are a new class of models...
research
03/25/2019

Second- and Third-Order Asymptotics of the Continuous-Time Poisson Channel

The paper derives the optimal second- and third-order coding rates for t...
