Neural Integration of Continuous Dynamics

11/23/2019
by Margaret Trautner, et al.
MIT

Neural dynamical systems are dynamical systems that are described at least in part by neural networks. The class of continuous-time neural dynamical systems must, however, be numerically integrated for simulation and learning. Here, we present compact neural circuits for two common numerical integrators: the explicit fixed-step Runge-Kutta method of any order and the semi-implicit/predictor-corrector Adams-Bashforth-Moulton method. Modeled as constant-sized recurrent networks embedding a continuous neural differential equation, they achieve fully neural temporal output. Using the polynomial class of dynamical systems, we demonstrate the equivalence of neural and numerical integration.

I Introduction

Neural dynamical systems are dynamical systems described at least in part by neural networks. Our interest in the subject emerges in the context of Systems Dynamics and Optimization (SDO) dddas, which is central to many applications such as storm prediction ravela12, climate-risk based decision support ravela10, and autonomous observatories ravela13. The SDO cycle conceptually involves a forward path that dynamically parameterizes, reduces, calibrates, initializes, and simulates numerical models, and quantifies their uncertainties. SDO further involves a return path for adaptive observation, inversion, and estimation. In this context, neural networks are attractive as rapidly executing surrogates and as parametrizations of dynamics that are difficult to model numerically, such as sub-grid-scale turbulence karpatne19.

Neural networks were classically developed to model discrete processes lstm, but physical systems evolve continuously in time. Continuous-time Neural Networks (CtNNs) seek to advance neural modeling in both feed-forward and recurrent forms; the latter consist of neurons connected through a time delay and continuous-time activation CtNN_Chen.

CtNNs can approximate the time trajectory of any $n$-dimensional dynamical system to arbitrary accuracy CtRNN_Funahashi. Interest in them as Neural ODEs neuode has grown, particularly to describe dynamics of the form $\dot{x} = f(x, u; \theta(t))$, with inputs $u$, feed-forward/feedback components $f$, and parameters (weights, biases) $\theta$ being a function of time $t$. More common is to consider autonomous dynamics, for example $\dot{x} = g(x) + f_N(x; \theta)$, superposing a non-neural part ($g$) and a neural CtNN part ($f_N$). The neural part may be modeled with time-invariant parameters, i.e. $\theta(t) \equiv \theta$.
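
A minimal Python sketch of this split autonomous form, with illustrative names, shapes, and untrained parameters (none of which are taken from the paper):

```python
import numpy as np

# Sketch of an autonomous neural dynamical system
#   dx/dt = g(x) + f_N(x; theta),
# with a known non-neural term g and a small time-invariant neural term f_N.

def g(x):
    # Known physics: simple linear damping (placeholder).
    return -0.1 * x

def f_N(x, W1, b1, W2, b2):
    # One-hidden-layer CtNN component with tanh activation.
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def xdot(x, params):
    W1, b1, W2, b2 = params
    return g(x) + f_N(x, W1, b1, W2, b2)

# Usage with random, untrained parameters.
rng = np.random.default_rng(0)
n, m = 3, 8
params = (rng.normal(size=(m, n)), np.zeros(m),
          0.1 * rng.normal(size=(n, m)), np.zeros(n))
print(xdot(np.ones(n), params))
```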

Variational estimation for continuous dynamical systems directly applies to training Neural-ODEs. The attendant Hamilton-Jacobi-Bellman (HJB) equations estimate CtNN and non-neural parameters alike bryson. Although the Machine Learning community cites the use of HJB as a critical discovery neuode, HJB applies to continuous dynamics in general and requires no special adaptation for Neural-ODEs. In practice, it is essential to consider the discrete adjoint equations, where a fundamental connection becomes even more plainly visible. For discrete networks, the adjoint equations exactly define back-propagation mlclass; bryson. They emerge as the normal equations of a multi-stage (multi-layer network) two-point boundary value problem bryson for stage parameters (layer weights and biases) using terminal constraints (training data). Simply changing the definition of stages from "layers" to "discrete (time) steps," the same methodology yields forward-backward iterations for training discretized CtNNs.
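
For concreteness, one standard statement of this correspondence (our notation, not reproduced from the paper): if a model advances as $x_{k+1} = F_k(x_k, \theta_k)$ with terminal loss $J(x_K)$, the discrete adjoint recursion is

```latex
\lambda_K = \frac{\partial J}{\partial x_K}, \qquad
\lambda_k = \left(\frac{\partial F_k}{\partial x_k}\right)^{\top} \lambda_{k+1}, \qquad
\frac{\partial J}{\partial \theta_k} = \left(\frac{\partial F_k}{\partial \theta_k}\right)^{\top} \lambda_{k+1},
```

which is the chain rule evaluated in reverse, i.e., back-propagation, whether $k$ indexes network layers or discrete time steps.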

Whether CtNNs are interesting in their own right or as models of continuous processes, discrete-time simulation is presently central for training and operating them. The standard advice neuode is to use neural computation to calculate derivatives, then resort to classical numerical integration, interleaving the two. Couldn't a purely neural computation generate the discrete solution instead?

In this first paper of a three-part series, we show that a single recurrent Discrete-time Neural Network (DtNN) couples derivative calculation and numerical stepping to generate numerical sequences, which we define as neural integration. We construct DtNNs to implement the explicit fixed-step Runge-Kutta method press07 of arbitrary order. The key benefit of our approach is compactness; unlike extant methods RKNN; RKNN2019, the order of approximation for integration does not change the size of our network. We further apply this methodology to semi-implicit methods press07. Specifically, we couple a two-step Adams-Bashforth predictor to a two-step Adams-Moulton corrector. Once initialized, a single DtNN in each case performs neural integration, sequentially issuing outputs.

To demonstrate our approach, we define and employ PolyNets, the class of CtNNs for dynamical systems with polynomial nonlinearities. PolyNets are exact when the polynomial coefficients of the governing equations are known. Exact PolyNets conveniently prove the correctness of the proposed neural integration methods. We further demonstrate the correspondence between neural and numerical integration using numerical simulations of the chaotic Lorenz-63 Lorenz1963 model. Extensions of the methodology to non-polynomial dynamics are also discussed.

The remainder of this paper is organized as follows. Related work is discussed in Section II. PolyNets are discussed briefly in Section III. Neural integration is developed in Section IV, followed by a discussion in Section V and conclusions in Section VI.

II Related Work

Multiple efforts for implementing neural ordinary differential equations neuode are being developed, including efforts that incorporate physical constraints during learning RackauckasDiffEq. Neural-ODEs must be discretized for numerical computation; it is typically suggested that CtNNs produce the derivatives, which are then passed to a traditional numerical integrator before returning to the neural computation as input for the next time step neuode. Our approach instead allows the CtNN to be discretized and executed entirely as a DtNN for neural integration.

Previous works have constructed numerical integrators in the form of a neural circuit RKNN; RKNN2019. However, the size of these networks scales with the order of the Runge-Kutta method. Our approach offers a fixed recurrent architecture invariant to order, made possible by multiplicative nodes, which are now fairly common (e.g., see lstm).

We devise methods for explicit Runge-Kutta (RK) integration with a fixed time step. We also demonstrate a semi-implicit predictor-corrector Adams-Bashforth-Moulton (ABM) method. To the best of our knowledge, the fixed-size RK neural integrator and the ABM neural integrator are new.

Our neural integration approach is easily demonstrated to be correct. Each neural integration method described here is a PolyNet. If the embedded CtNN is certified to be "error-free," then so is neural integration; no algorithmic error beyond the approximation inherent in the numerical method is induced. CtNN certification is generally difficult because CtNNs are typically trained to an error tolerance, except when they are exact PolyNets. In that case the DtNN in our construction is also an exact PolyNet, and algorithmic correctness of neural integration follows. This is important because differences between neural integration and numerical integration can in general arise for many reasons, including training error, numerical approximation, computational implementation, and numerical precision. It would be difficult to partition these errors, particularly when modeling chaotic processes.

CtNNs must in general be trained; even in the special case of PolyNets, this is true when the polynomial coefficients are unknown. Solving the discrete adjoint equations corresponding to HJB for learning requires discrete forward and backward numerical integration, which neural integration can perform accurately and effectively.

We posit that a key advantage of neural integration is that specialized hardware can be better exploited, particularly for Monte Carlo simulations of dynamical systems. For example, the benefits of CUDA-parallelized simulations, e.g., for Monte Carlo, have already been shown Niemeyer2014.

III Exact PolyNets

To demonstrate neural integration, we introduce PolyNets, a class of CtNNs that match polynomial dynamics. Polynomial dynamics informs a broad class of ODEs and PDEs (e.g., Navier-Stokes). Although we do not consider PDEs in this work, the methods described here also inform their numerical integration. In Algorithm 1 below, we show that neural computation of polynomial dynamics with known coefficients is (trivially) exact. A fuller discussion of PolyNet classes is presented in Section V.

Let $x$ be an $n$-dimensional variable and $\dot{x} = P(x)$ a dynamical system, where $P$ is a polynomial containing monomials of maximum degree $d$. There is a bounded number of possible monomial terms per output, and there are $n$ outputs. The function $P$ can be of the form

$\dot{x}_i = P_i(x) = b_i + \sum_{j} w_{ij}\, m_j(x), \quad i = 1, \dots, n,$   (1)

where $b_i$ and $w_{ij}$ are constant coefficients, and the monomial $m_j$ is defined as

$m_j(x) = \prod_{k=1}^{n} x_k^{p_{jk}}, \quad 1 \le \sum_{k} p_{jk} \le d.$   (2)

The equivalent PolyNet

$\dot{x} = N(x; \theta)$   (3)

is defined as a three-layer network with inputs $x$, outputs $\dot{x}$, and one hidden layer. $N$ is constructed as follows (Algorithm 1):

  1. Set each output node's activation to be linear.

  2. Set $b_i$ as the bias for the output node $\dot{x}_i$.

  3. Connect input node $x_k$ to output node $\dot{x}_i$ with weight $w_{ij}$ for each degree-one monomial $m_j = x_k$, where $1 \le k \le n$ and $1 \le i \le n$.

  4. To map the input to the hidden layer, re-write each monomial of degree two or higher as a repeated product of its inputs,

    $m_j(x) = \prod_{k=1}^{n} x_k^{p_{jk}} = \prod_{k:\, p_{jk} > 0} \underbrace{x_k \cdots x_k}_{p_{jk} \text{ times}},$   (4)

    where the exponents $p_{jk}$ (equivalently expressible with the Kronecker delta function) determine the multiplicity of each connection.

  5. Each such monomial $m_j$ is a hidden product node with linear activation connecting to the input layer. Product nodes are commonly used in deep networks lstm. There are exactly $p_{jk}$ connections between node $m_j$ and input $x_k$, each of unit weight. Note that logarithms and anti-logarithms, among others, are obvious ways to express powers; however, we have avoided them to preclude invoking complex number representations.

  6. A hidden node $m_j$ connects to an output node $\dot{x}_i$ additively with weight $w_{ij}$, the coefficient of monomial $j$ in output $i$.

  7. There is at most one node in the hidden layer per monomial of degree two or higher. Thus the biases $b_i$, the output weights $w_{ij}$, and the unit-weight connections into the product nodes constitute the parameter vector $\theta$ of $N$.

Thus, by Algorithm 1, the dynamical system $\dot{x} = P(x)$ (Equation 1) and the PolyNet neural network $N$ are equal.
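
A minimal Python sketch of Algorithm 1 under the notation assumed above (the exponent table, weight matrix, and bias vector are our illustrative names, not the paper's):

```python
import numpy as np

def polynet_forward(x, exponents, weights, biases):
    """Evaluate an exact PolyNet: xdot_i = b_i + sum_j w_ij * prod_k x_k**p_jk.

    exponents : (J, n) integer array, one row of exponents per monomial
    weights   : (n, J) array, output weight w_ij of monomial j in output i
    biases    : (n,)   array, bias of each output node
    """
    # Hidden layer: each product node multiplies its inputs, one unit-weight
    # connection per unit of exponent (Steps 4-5 of Algorithm 1).
    hidden = np.prod(x[None, :] ** exponents, axis=1)    # shape (J,)
    # Output layer: linear activation, additive connections (Steps 1-3, 6).
    return biases + weights @ hidden                      # shape (n,)

# Example: xdot_1 = 2*x2 - x1*x2**2,  xdot_2 = 1 + 3*x1**2.
exponents = np.array([[0, 1],    # x2
                      [1, 2],    # x1 * x2**2
                      [2, 0]])   # x1**2
weights = np.array([[2.0, -1.0, 0.0],
                    [0.0,  0.0, 3.0]])
biases = np.array([0.0, 1.0])
print(polynet_forward(np.array([1.0, 1.0]), exponents, weights, biases))  # [1. 4.]
```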

III.1 Example: Lorenz Dynamics

Figure 1: The Lorenz CtNN. Values adjacent to edges indicate weights, i.e., multiplication by that parameter along the edge. Edges without explicit labels are assumed to have weight 1. Nodes marked "×" are multiplicative; nodes marked "+" are additive.

In Figure 1, a neural architecture derived directly from the governing equations of the Lorenz system computes the time derivatives given an input. The equations describing the system are

$\dot{x} = \sigma(y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \beta z.$

Interestingly, the Lorenz system is a polynomial in three variables with maximum degree two, but it requires no more than two hidden nodes as implemented by the PolyNet. Note also that hidden nodes of degree one are technically unnecessary; we have simply replaced them with the necessary additive connections from input to output nodes.
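
As a cross-check of the construction in Figure 1 (an illustrative sketch with our own variable names and the standard Lorenz-63 parameter values assumed), the Lorenz derivatives require exactly two multiplicative hidden nodes:

```python
import numpy as np

def lorenz_polynet(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 time derivatives computed with two product nodes, as in Figure 1."""
    # Hidden product nodes: the only degree-two monomials in the system.
    xz = x * z
    xy = x * y
    # Output nodes: linear activation, additive connections only.
    dx = sigma * (y - x)
    dy = rho * x - y - xz
    dz = xy - beta * z
    return np.array([dx, dy, dz])

print(lorenz_polynet(1.0, 1.0, 1.0))   # [ 0.  26.  -1.667]
```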

IV Neural Integration

In this section, we construct neural integrators, first for the Runge-Kutta (RK) method and second for the Adams-Bashforth-Moulton (ABM) method. For the Runge-Kutta method, consider a neural implementation of the common fourth-order explicit Runge-Kutta scheme with fixed step $h$:

$k_1 = f(y_n),$   (5)
$k_2 = f(y_n + \tfrac{h}{2} k_1),$   (6)
$k_3 = f(y_n + \tfrac{h}{2} k_2),$   (7)
$k_4 = f(y_n + h\, k_3),$   (8)
$y_{n+1} = y_n + \tfrac{h}{6}(k_1 + 2 k_2 + 2 k_3 + k_4).$   (9)

If $f$ is an external function guaranteed to be accurate/correct, then the step-update is polynomial in $y_n$ and $h$, and it is an exact PolyNet. A simple design is to follow Algorithm 1; however, the network size would then change with order, which we avoid using recurrence.
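
For reference, a conventional (non-recurrent) implementation of the step in Equations 5-9; this is the computation the recurrent circuit below reproduces:

```python
def rk4_step(f, y, h):
    """One explicit fixed-step fourth-order Runge-Kutta update (Equations 5-9)."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```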

Figure 2: A neural circuit for an explicit fourth-order Runge-Kutta method. The current state enters at the circular node with a dashed border. The numbers to the upper right of each node indicate their order in the computation; this order is not unique, but it is useful for comprehension. The rectangular node represents the computation of the gradient of a particular dynamical system at the point specified by its input. The black nodes indicate constant coefficients whose values cycle every five iterations; this cyclic coefficient input is achieved through a sub-circuit shown in Figure 3. The dashed edges indicate a delay: they pass the start-node values from the previous iteration to their end nodes, and they hold the new value until the next iteration. The diamond node represents the output. Every five iterations of the circuit produce the next element in the time series.
Figure 3: A neural circuit for the cyclical coefficient input. As an initial state, each coefficient node contains the initial coefficient array for its coefficient; for the Runge-Kutta implementation shown in Figure 2, these arrays hold the stage and update coefficients of the scheme. A constant one-hot vector multiplies with the coefficient array of a node to pass the first element to the black node as the coefficient for that iteration. A shift tensor multiplies with the coefficient array in the node to rotate the elements, and the shifted array is passed back for the next iteration. In this manner, the first elements of each coefficient array are cycled.

Consider instead the network depicted in Figure 2 and Figure 3. In this circuit, coefficients cyclically enter the network through the nodes on the left, and the output for the next time step is computed after five iterations of the system. Computation occurs in the order specified by the numbers adjacent to each node; the computations at the delayed nodes are held by the recurrence until the next iteration. Each node takes the sum or product of its inputs as indicated by the symbol on the node. The rectangular node represents the gradient computation, achieved via any neural circuit representing a set of polynomial ODEs. In our example, the Lorenz neural circuit in Figure 1 supplies this gradient, but any neural dynamical system could replace it. Every five iterations, the output node contains the next element in the time series. With this construction, given a single initial state, the neural circuit iteratively outputs the time series. The neural circuit in Figure 2 is an exact recurrent PolyNet of fixed size that implements the explicit fixed-step RK4 scheme shown in Equations 5-9.
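
The following sketch mimics the recurrence of Figures 2 and 3 in plain Python (our simplification of the circuit: the coefficient arrays rotate each round exactly as the shift-tensor sub-circuit does, the state carried between rounds has fixed size, and it assumes tableaus, like classical RK4, in which each stage depends only on the previous one):

```python
import numpy as np

def recurrent_rk_integrate(f, y0, h, n_steps,
                           a=(0.0, 0.5, 0.5, 1.0),
                           b=(1/6, 1/3, 1/3, 1/6)):
    """Fixed-size recurrent integration with cyclically entering coefficients.

    The defaults reproduce classical RK4; other coefficient sequences of any
    length give other explicit schemes whose stages chain one after another
    (e.g., forward Euler: a=(0.0,), b=(1.0,)).
    """
    y = np.asarray(y0, dtype=float)
    out = [y]
    for _ in range(n_steps):
        k = np.zeros_like(y)        # delayed "previous stage" value
        acc = np.zeros_like(y)      # accumulator for the weighted update
        for a_i, b_i in zip(a, b):  # coefficients cycle in, one pair per round
            k = f(y + a_i * h * k)
            acc = acc + b_i * k
        y = y + h * acc             # output round: next element of the series
        out.append(y)
    return np.array(out)
```

With the Lorenz PolyNet above (wrapped to take a single state array), this recurrence agrees with repeated rk4_step calls up to floating-point rounding.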

The structure shown in Figure 2 can be modified slightly to apply to a general explicit Runge-Kutta method. Since we consider general Runge-Kutta methods to integrate ordinary differential equations, the equations take the form

$k_i = f\big(y_n + h \sum_{j=1}^{i-1} a_{ij} k_j\big), \quad i = 1, \dots, s, \qquad y_{n+1} = y_n + h \sum_{i=1}^{s} b_i k_i.$

To generalize the structure of our system, we modify the stage node and its associated coefficient array into an array of nodes, one for each stage, such that the coefficient node passes the appropriate $a_{ij}$ to each stage node with a timed one-hot multiplicative weight. Each stage node passes its value to itself and to the summing node with the appropriate multiplicative coefficient, so that the input assembled for stage $i$ is always $y_n + h \sum_{j < i} a_{ij} k_j$. In addition, the update-coefficient array must change to hold the weights $b_i$. Finally, the output-gating coefficient array changes to a length-$s$ one-hot vector with the single high bit in the final entry. With these changes, the neural architecture for the fourth-order Runge-Kutta method applies to general explicit Runge-Kutta methods. Note that the order of approximation can be varied during neural integration merely by lengthening or shortening the coefficient sequences without changing the network. This provides a serious advantage when stiffness or dynamical regimes change.
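
As a conventional reference for what the generalized circuit must reproduce, here is one step of an arbitrary explicit Runge-Kutta method given its Butcher tableau (strictly lower-triangular stage coefficients a and update weights b; again a sketch, not the circuit itself):

```python
import numpy as np

def explicit_rk_step(f, y, h, a, b):
    """One step of a general explicit Runge-Kutta method with s stages."""
    s = len(b)
    k = np.zeros((s,) + np.shape(y))
    for i in range(s):
        yi = y + h * sum(a[i][j] * k[j] for j in range(i))
        k[i] = f(yi)
    return y + h * sum(b[i] * k[i] for i in range(s))

# The classical RK4 tableau recovers Equations 5-9.
a_rk4 = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
b_rk4 = [1/6, 1/3, 1/3, 1/6]
```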

IV.1 Semi-Implicit Methods

Figure 4: A neural circuit for a two-step Adams-Bashforth predictor coupled with an Adams-Moulton Corrector.

To extend the methodology to other numerical integrators, consider a two-step Adams-Bashforth predictor coupled to a two-step Adams-Moulton corrector.

The neural architecture for this numerical integrator is shown in Figure 4, where again any equation can be integrated by the system.
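
A sketch of the update assumed here, using the standard two-step Adams-Bashforth predictor and two-step Adams-Moulton corrector (the circuit's exact coefficient choice is not spelled out above, so these are the conventional formulas, with a Heun step used to bootstrap the second starting value):

```python
import numpy as np

def abm2_integrate(f, y0, h, n_steps):
    """Two-step Adams-Bashforth predictor / Adams-Moulton corrector.

    Predictor: p_{n+1} = y_n + (h/2)  * (3*f_n - f_{n-1})
    Corrector: y_{n+1} = y_n + (h/12) * (5*f(p_{n+1}) + 8*f_n - f_{n-1})
    """
    y_prev = np.asarray(y0, dtype=float)
    # Bootstrap the second starting value with one Heun (improved Euler) step.
    k1 = f(y_prev)
    y = y_prev + 0.5 * h * (k1 + f(y_prev + h * k1))
    out = [y_prev, y]
    f_prev, f_curr = f(y_prev), f(y)
    for _ in range(n_steps - 1):
        y_pred = y + 0.5 * h * (3.0 * f_curr - f_prev)
        y_next = y + (h / 12.0) * (5.0 * f(y_pred) + 8.0 * f_curr - f_prev)
        out.append(y_next)
        f_prev, f_curr = f_curr, f(y_next)
        y = y_next
    return np.array(out)
```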

Figure 5: A comparison of the neural circuit output for both the fourth-order Runge-Kutta scheme and the Adams-Bashforth-Moulton corrector to the standard numerical computation for the toy example of the Lorenz system. (a) A rendering of both the classical numerical computation output for the Lorenz system time series and the Runge-Kutta neural circuit output. (b) A similar rendering for the Adams-Bashforth-Moulton corrector. (c) Euclidean distance from the Runge-Kutta neural circuit output to a reference time series generated with the classical RK numerical method; for comparison, the numerical errors resulting from two small initial perturbations to the classical RK numerical method are shown. (d) Euclidean distance from the Adams-Bashforth-Moulton neural circuit output to a reference time series generated with the classical ABM numerical method; for comparison, the numerical errors resulting from two small initial perturbations to the classical ABM numerical method are shown.

Figure 5 depicts a comparison between fourth-order explicit Runge-Kutta neural integration, second-order Adams-Bashforth-Moulton neural integration, and the corresponding ordinary numerical integration. The computational difference between the neural circuit method and the classical numerical computation was that the neural circuit was implemented as a TensorFlow graph, while the classical numerical computation was done with a simple function (all programs were written in Python and executed on a laptop). Unsurprisingly, since the operations are mathematically equivalent, the output time series are exactly equivalent, save computational floating-point discrepancies that magnify as a result of the system's chaotic behavior. To demonstrate that the discrepancies are due to floating-point computation, we perturbed the initial condition by the floating-point precision used for the computation. As a comparison, we also generated a time series with a perturbation of a different magnitude to show that there is some level of randomness in the accuracy of the system following an initial perturbation.
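
A minimal way to reproduce the flavor of this check (our sketch, not the TensorFlow implementation): integrate the Lorenz system twice with classical RK4, once from a slightly perturbed initial condition, and track the Euclidean distance between the trajectories.

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(f, y0, h, n_steps):
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(n_steps):
        y = ys[-1]
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        ys.append(y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(ys)

h, n = 0.01, 2000
reference = rk4_trajectory(lorenz, [1.0, 1.0, 1.0], h, n)
perturbed = rk4_trajectory(lorenz, [1.0 + 1e-15, 1.0, 1.0], h, n)
distance = np.linalg.norm(reference - perturbed, axis=1)
print(distance[::400])   # distances grow roughly exponentially with time
```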

V Discussion

Although exact PolyNets are sufficient to establish the correctness of neural integration, two issues remain. The first is computational efficiency. We posit that constructing a DtNN allows for faster and more efficient computation than interleaving neural and classical computation. Representing calculations with tensors enables computation on specialized hardware for potentially faster performance. The scaling with independently executing DtNNs, such as for uncertainty quantification and other applications, can be enormous. Indeed, for nominal ODE integration, such benefits have already been shown Niemeyer2014. It remains to be seen whether the benefits apparent for many independent DtNN simulations extend to other problems such as variational Bayes. We hypothesize that the gains could be significant, especially when coupled with variable-order neural integration, whose cost in our circuits is minimal.

The second issue is training embedded CtNNs. To discuss this further, consider the network-dynamics relationships shown in Table 1. When the polynomial structure is known but the coefficients are unknown, PolyNet Identification (a term used in the sense of System Identification in Estimation and Control) is necessary. This is accomplished using the discrete adjoint method bryson for DtNN parameter estimation. In the forward pass, the DtNN simulates outputs up to a horizon and, in the backward pass, errors are back-propagated for estimation. DtNN learning over a receding horizon relates closely to classical receding-horizon model predictive optimal control. Of course, DtNNs can be trained using alternative approaches, particularly Bayesian filters and fixed-point, fixed-interval, and fixed-lag smoothers Ravela2007.
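
A deliberately simple illustration of PolyNet Identification when the polynomial structure is known but the coefficients are not (our sketch: a short horizon, forward Euler simulation, and finite-difference gradients standing in for the discrete adjoint solution):

```python
import numpy as np

def simulate(theta, y0, h, n_steps):
    """Forward rollout of a Lorenz-structured PolyNet with unknown coefficients."""
    sigma, rho, beta = theta
    y = np.asarray(y0, dtype=float)
    out = [y]
    for _ in range(n_steps):
        x, yy, z = y
        y = y + h * np.array([sigma * (yy - x), x * (rho - z) - yy, x * yy - beta * z])
        out.append(y)
    return np.array(out)

def loss(theta, data, y0, h):
    return np.mean((simulate(theta, y0, h, len(data) - 1) - data) ** 2)

# Synthetic "observations" over a short horizon from the true coefficients.
true_theta = np.array([10.0, 28.0, 8.0 / 3.0])
y0, h, n = np.array([1.0, 1.0, 1.0]), 0.005, 100
data = simulate(true_theta, y0, h, n)

theta = np.array([8.0, 30.0, 2.0])          # initial guess
step, eps = 0.01, 1e-6
for _ in range(3000):
    grad = np.array([(loss(theta + eps * e, data, y0, h)
                      - loss(theta - eps * e, data, y0, h)) / (2 * eps)
                     for e in np.eye(3)])
    theta = theta - step * grad / (np.linalg.norm(grad) + 1e-12)
print(theta)    # should approach [10, 28, 8/3]
```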

                          Dynamics
Network     Poly w/ Coeff.   Poly w/o Coeff.   Non-poly
PolyNet     Exact            Estimate          Approx.
Other       Poor             General           N/A

Table 1: Neural PolyNet and general models in relation to nonlinear dynamics.

When the dynamics have non-polynomial governing equations, PolyNets are a useful approximation in the sense of the Stone-Weierstrass theorem weierstrass, which enables approximation of smooth nonlinearities. However, in this case the PolyNet structure (number of monomials, degree, connectivity) is also in error, and the challenging problem of PolyNet Structure Learning (used analogously to Structure Learning in Machine Learning and Structural Model Errors in Geophysics) must be addressed. If parsimony principles are applicable to modeling, then one can gradually increase PolyNet complexity, perhaps by sampling, and apply model selection criteria. This approach is in stark contrast to structure learning wherein a dense, large network is "made sparse," a problem in both neural learning and graphical models. The systematization of model complexity may enable more tractable structure learning.

PolyNets are also, we posit, promising theory-guided learning machines karpatne19. Using theory, via governing equations, to establish the initial PolyNet makes them "explainable" and avoids the "black-box" nature of typical neural construction. Subsequently, as data arrive, PolyNet Identification and PolyNet Structure Learning interleave, following parsimony principles to titrate model complexity. In these steps, to be sure, only PolyNet terms are estimated; the integrator loop remains immutable. Thus, constructing neural circuits for dynamical systems would potentially allow inference of the underlying equations of a system while accounting for known physical behavior.

Theory-guided learning, as discussed, offers an enormous advantage. Contrast it with the general setting wherein a polynomial is learned directly from data. For a two-layer neural network similar to the PolyNet, bounds derived by Andoni et al. andoni show that the problem is substantially harder. In the absence of guidance for PolyNet design, a combinatorial explosion emerges with degree, which in turn causes the number of nodes, the required training data, and the iterations to convergence to become prohibitive for most applications.

When the governing equations are polynomial but PolyNets aren't used, the situation also becomes quite a bit more complicated. Neural networks have been argued to be polynomial machines in which the hidden layers control the effective degree PolyReg. However, while neural architectures offer enormous modeling flexibility through myriad activations and learning rules, they can be very sub-optimal with respect to the number of parameters needed to model a polynomial. For example, a single neuron with tanh activation poorly models $x^2$ (see http://essg.mit.edu/ml/yx2; sensitivity to noise, weak generalization, and no "extrapolation" skill are easily evident). In fact, the Taylor expansion of tanh has no second-degree term. It may be easier to perform "polynomial regression" instead (as PolyReg argue), provided, as discussed in the previous paragraph, the combinatorial explosion of monomials can be managed. We hypothesize that layered architectures alternating compression (by dimensionality reduction) with nonlinear stretching (by low-degree PolyNets) could be a promising alternative to classical neural circuitry or classical polynomial regression.
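
To make the point concrete, the Maclaurin series of the hyperbolic tangent is an odd function of its argument, so no second-degree term can appear:

```latex
\tanh(x) = x - \frac{x^{3}}{3} + \frac{2x^{5}}{15} - \frac{17x^{7}}{315} + \cdots
```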

Regardless, given polynomial equations and the insistence on using a standard neural network architecture (e.g., feed-forward with smooth nonlinear activation), we show elsewhere that bounds on neural structure can be established ravela19. These bounds have been experimentally validated ziwei, but they are naturally higher than PolyNets provide. For example, for the Lorenz system, exactly two PolyNet hidden nodes are needed, whereas a feed-forward network with smooth nonlinear activation appears to require substantially more nodes. Andoni et al.'s bounds are weaker still, prohibitively so.

VI Conclusions

In this paper we demonstrate neural integration: solving neural ordinary differential equations as neural computation. Neural integration is carried out by Discrete-time Neural Networks (DtNNs). For a given Neural-ODE, we construct DtNNs for explicit fixed-step Runge-Kutta methods of any order. The DtNNs are recurrent PolyNets and remain of fixed size irrespective of order. We also construct a DtNN for a second-order Adams-Bashforth predictor coupled to a second-order Adams-Moulton corrector. In both cases, we show that neural integration is correct for any correct embedded Continuous-time Neural Network (CtNN). We further demonstrate this numerically using an exact PolyNet for the Lorenz-63 attractor, showing agreement between neural and numerical integration up to numerical precision. We argue that PolyNets offer a promising path for theory-guided modeling because governing equations can directly establish their structure.

Our focus in this first paper of a three-part series is simply to demonstrate neural integration. The next paper focuses on PolyNet identification, and the third on PolyNet adaptation. We also seek to apply this methodology to Systems Dynamics and Optimization (SDO) problems in the environment and to the design of new adaptive autopilots for environmental monitoring.

Acknowledgments

Ms. Trautner is an undergraduate researcher at ESSG advised by Dr. Ravela. Support from ONR grant N00014-19-1-2273, the MIT Environmental Solutions Initiative, the Maryanne and John Montrym Fund, and the MIT Lincoln Laboratory is gratefully acknowledged. Discussions with members of the Earth Signals and Systems group are also gratefully acknowledged.

References