An Ode to an ODE

06/19/2020
by Krzysztof Choromanski, et al.

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, in which the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter flow is constrained to lie on the compact manifold, stabilizes training and provably solves the vanishing/exploding-gradient problem that is intrinsic to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show by training reinforcement learning policies with evolution strategies and, in the supervised learning setting, by comparing against previous SOTA baselines. We provide strong convergence results for the proposed mechanism that are independent of network depth, supporting our empirical studies. Our results reveal an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds.
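The abstract describes a nested system: a main flow on the hidden state whose weight matrix is itself evolved by a flow constrained to the orthogonal group O(d). The sketch below is an illustrative reading of that idea, not the paper's actual algorithm: the state follows dx/dt = tanh(W(t)x) under explicit Euler, while W(t) is advanced by a Cayley-transform step of a skew-symmetric generator, which keeps W exactly orthogonal at every step. All function names and the choice of nonlinearity and integrator are assumptions for illustration.

```python
import numpy as np

def ode_to_ode_forward(x0, B, n_steps=100, h=0.01):
    """Sketch of a nested ODEtoODE-style flow (assumed form):
      state flow:     dx/dt = tanh(W(t) x)
      parameter flow: dW/dt = W * Omega, with Omega = B - B^T skew-symmetric,
    discretized so that W stays on the orthogonal group O(d)."""
    d = x0.shape[0]
    W = np.eye(d)                      # initialize on O(d)
    Omega = B - B.T                    # skew-symmetric generator
    # Cayley transform of a skew-symmetric matrix is exactly orthogonal,
    # so multiplying by it keeps W on O(d) up to round-off.
    I = np.eye(d)
    step = np.linalg.solve(I - 0.5 * h * Omega, I + 0.5 * h * Omega)
    x = x0.copy()
    for _ in range(n_steps):
        x = x + h * np.tanh(W @ x)     # explicit Euler on the main flow
        W = W @ step                   # orthogonality-preserving parameter update
    return x, W

d = 4
rng = np.random.default_rng(0)
x0 = rng.standard_normal(d)
B = rng.standard_normal((d, d))
x, W = ode_to_ode_forward(x0, B)
print(np.allclose(W.T @ W, np.eye(d)))   # True: W remains orthogonal
```

Because the parameter update is an isometry, the Jacobian of the parameter flow never contracts or expands the state norm, which is the mechanism the abstract credits for avoiding vanishing and exploding gradients.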

Related research

03/30/2020 · Stochastic Flows and Geometric Optimization on the Orthogonal Group
We present a new class of stochastic, geometrically-driven optimization ...

10/06/2020 · Optimizing Deep Neural Networks via Discretization of Finite-Time Convergent Flows
In this paper, we investigate in the context of deep neural networks, th...

04/06/2018 · Structured Evolution with Compact Architectures for Scalable Policy Optimization
We present a new method of blackbox optimization via gradient approximat...

07/10/2019 · Reinforcement Learning with Chromatic Networks
We present a new algorithm for finding compact neural networks encoding ...

06/15/2020 · Ordering Dimensions with Nested Dropout Normalizing Flows
The latent space of normalizing flows must be of the same dimensionality...

07/14/2021 · Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group
We investigate the analogy between the renormalization group (RG) and de...

04/18/2020 · CWY Parametrization for Scalable Learning of Orthogonal and Stiefel Matrices
In this paper we propose a new approach for optimization over orthogonal...
