Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE

by   Juntang Zhuang, et al.

Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) are significantly inferior to discrete-layer models. We demonstrate an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive method, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Finally, in an example of the three-body problem, we show NODE with ACA can incorporate physical knowledge to achieve better accuracy. We provide the PyTorch implementation of ACA: <>.


page 1

page 2

page 3

page 4


MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Neural ordinary differential equations (Neural ODEs) are a new family of...

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

The performance of deep neural networks is well-known to be sensitive to...

PNODE: A memory-efficient neural ODE framework based on high-level adjoint differentiation

Neural ordinary differential equations (neural ODEs) have emerged as a n...

Heavy Ball Neural Ordinary Differential Equations

We propose heavy ball neural ordinary differential equations (HBNODEs), ...

Neural Ordinary Differential Equations for Nonlinear System Identification

Neural ordinary differential equations (NODE) have been recently propose...

Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows

We compare the discretize-optimize (Disc-Opt) and optimize-discretize (O...