MALI: A memory efficient and reverse accurate integrator for Neural ODEs

02/09/2021
by   Juntang Zhuang, et al.

Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in the reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost with respect to the number of solver steps in integration, similar to the adjoint method, and guarantees accuracy in the reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI on various tasks: on image recognition, to our knowledge, MALI is the first method to enable feasible training of a Neural ODE on ImageNet and to outperform a well-tuned ResNet, while existing methods fail due to either a heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance.
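The abstract's central claim is that an invertible solver step lets the forward trajectory be reconstructed from the final state alone, which is why memory need not grow with the number of steps. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not give the update rule, so the step below assumes the commonly described form of the asynchronous leapfrog update, which tracks a velocity `v` alongside the state `z`. The names `alf_step`, `alf_step_inverse`, `integrate` and `reconstruct` are hypothetical, and the scalar test problem dz/dt = -z is chosen only for illustration.

```python
import math


def alf_step(f, s, z, v, h):
    """One assumed ALF step: advance time s, state z, velocity v by h."""
    s_mid = s + h / 2
    k = z + v * h / 2          # half step in z with the current velocity
    u = f(k, s_mid)            # derivative evaluated at the midpoint
    v_new = 2 * u - v          # reflect the velocity around u
    z_new = k + v_new * h / 2  # second half step with the updated velocity
    return s_mid + h / 2, z_new, v_new


def alf_step_inverse(f, s, z, v, h):
    """Algebraic inverse of alf_step: recover the inputs from the outputs."""
    s_mid = s - h / 2
    k = z - v * h / 2          # same midpoint state as in the forward step
    u = f(k, s_mid)
    v_old = 2 * u - v
    z_old = k - v_old * h / 2
    return s_mid - h / 2, z_old, v_old


def integrate(f, z0, t0, t1, n_steps):
    """Integrate forward, keeping only the latest (s, z, v) triple."""
    h = (t1 - t0) / n_steps
    s, z, v = t0, z0, f(z0, t0)  # velocity initialised to f at the start
    for _ in range(n_steps):
        s, z, v = alf_step(f, s, z, v, h)
    return s, z, v


def reconstruct(f, s, z, v, t0, t1, n_steps):
    """Walk back from the final triple to the initial one, step by step."""
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        s, z, v = alf_step_inverse(f, s, z, v, h)
    return s, z, v


def decay(z, s):
    """Illustrative dynamics dz/dt = -z (an assumption for this demo)."""
    return -z


if __name__ == "__main__":
    s1, z1, v1 = integrate(decay, 1.0, 0.0, 1.0, 100)
    s0, z0, v0 = reconstruct(decay, s1, z1, v1, 0.0, 1.0, 100)
    print("forward error vs exp(-1):", abs(z1 - math.exp(-1.0)))
    print("round-trip error in z:", abs(z0 - 1.0))
```

In this sketch only the final `(s, z, v)` triple is stored. A backward pass could recompute each earlier state with `alf_step_inverse` as it goes, rather than solving a separate reverse-time ODE whose trajectory may drift from the forward one.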


research
06/03/2020

Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE

Neural ordinary differential equations (NODEs) have recently attracted i...
research
06/02/2022

PNODE: A memory-efficient neural ODE framework based on high-level adjoint differentiation

Neural ordinary differential equations (neural ODEs) have emerged as a n...
research
06/19/2018

Neural Ordinary Differential Equations

We introduce a new family of deep neural network models. Instead of spec...
research
03/10/2022

Improving Neural ODEs via Knowledge Distillation

Neural Ordinary Differential Equations (Neural ODEs) construct the conti...
research
06/08/2020

Liquid Time-constant Networks

We introduce a new class of time-continuous recurrent neural network mod...
research
05/25/2023

Non-adversarial training of Neural SDEs with signature kernel scores

Neural SDEs are continuous-time generative models for sequential data. S...
research
05/27/2020

Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows

We compare the discretize-optimize (Disc-Opt) and optimize-discretize (O...
