I Introduction
Neural dynamical systems are dynamical systems described, at least in part, by neural networks. Our interest in the subject emerges in the context of Systems Dynamics and Optimization dddas (SDO), which is central to many applications such as storm prediction ravela12, climate-risk-based decision support ravela10, and autonomous observatories ravela13. The SDO cycle conceptually involves a forward path that dynamically parameterizes, reduces, calibrates, initializes, and simulates numerical models, and quantifies their uncertainties. SDO further involves a return path for adaptive observation, inversion, and estimation. In this context, neural networks are attractive as rapidly executing surrogates, and as parametrizations of dynamics that are difficult to model numerically, such as subgrid-scale turbulence karpatne19.

Neural networks were classically developed to model discrete processes lstm, but physical systems evolve continuously in time. Continuous-time Neural Networks (CtNNs) seek to advance neural modeling in both feedforward and recurrent forms; the latter consist of neurons connected through a time delay and continuous-time activation CtNN_Chen. CtNNs can approximate the time trajectory of any finite-dimensional dynamical system to arbitrary accuracy CtRNN_Funahashi. Interest in them as Neural ODEs neuode has grown, particularly to describe dynamics in the form $\dot{x} = f(x, u; \theta(t))$, with inputs $u$, feedforward/feedback components $f$, and parameters $\theta$ (weights, biases) being a function of time $t$. More common is to consider autonomous dynamics, for example $\dot{x} = g(x) + f_N(x; \theta)$, superposing a non-neural part ($g$) and a neural CtNN part ($f_N$). The neural part may be modeled with time-invariant parameters, i.e., $\theta(t) \equiv \theta$.
Variational estimation for continuous dynamical systems directly applies to training Neural ODEs. The attendant Hamilton-Jacobi-Bellman (HJB) equations estimate CtNN and non-neural parameters alike bryson. Although the Machine Learning community cites the use of HJB as a critical discovery neuode, HJB applies to continuous dynamics in general and requires no special adaptation for Neural ODEs. In practice, it is essential to consider the discrete adjoint equations, whence a fundamental connection becomes even more plainly visible. For discrete networks, the adjoint equations exactly define backpropagation mlclass; bryson. They emerge as the normal equations of a multi-stage (multi-layer network) two-point boundary value problem bryson for stage parameters (layer weights and biases) using terminal constraints (training data). Simply changing the definition of stages from "layers" to "discrete (time) steps," the same methodology yields forward-backward iterations for training discretized CtNNs.

Whether CtNNs are interesting in their own right or as models of continuous processes, discrete-time simulation is presently central to training and operating them. The standard advice neuode is to use neural computation to calculate derivatives, then resort to classical numerical integration, interleaving the two. Couldn't a purely neural computation generate the discrete solution instead?
In this first paper of a three-part series, we show that a single recurrent Discrete-time Neural Network (DtNN) couples derivative calculation and numerical stepping to generate numerical sequences, which we define as neural integration. We construct DtNNs that implement the explicit fixed-step Runge-Kutta method press07 of arbitrary order. The key benefit of our approach is compactness; unlike extant methods RKNN; RKNN2019, the order of approximation does not change the size of our network. We further apply this methodology to semi-implicit methods press07. Specifically, we couple a two-step Adams-Bashforth predictor to a two-step Adams-Moulton corrector. Once initialized, a single DtNN in each case performs neural integration, sequentially issuing outputs.
To demonstrate our approach, we define and employ PolyNets, the class of CtNNs for dynamical systems with polynomial nonlinearities. PolyNets are exact when the polynomial coefficients of the governing equations are known. Exact PolyNets conveniently prove the correctness of the proposed neural integration methods. We further demonstrate correspondence between neural and numerical integration using numerical simulations of the chaotic Lorenz-63 model Lorenz1963. Extensions of the methodology to non-polynomial dynamics are also discussed.
II Related Work
Multiple efforts for implementing neural ordinary differential equations neuode are being developed, including by incorporating physical constraints during learning RackauckasDiffEq. Neural ODEs must be discretized for numerical computation; it is typically suggested that CtNNs produce the derivatives, which are subsequently passed to a traditional numerical integrator before returning to the neural computation as input for the next time step neuode. Our approach instead allows the CtNN to be discretized and executed entirely as a DtNN for neural integration.

Previous works have constructed numerical integrators in the form of a neural circuit RKNN; RKNN2019. However, the size of these networks scales with the order of the Runge-Kutta method. Our approach offers a fixed recurrent architecture invariant to order, made possible through the use of multiplicative nodes, which are now fairly common, e.g., see lstm.
We devise methods for explicit Runge-Kutta (RK) integration with a fixed time step. We also demonstrate a semi-implicit predictor-corrector Adams-Bashforth-Moulton (ABM) method. To the best of our knowledge, the fixed-size RK neural integrator and the ABM neural integrator are new.
Our neural integration approach is easily shown to be correct. Each neural integration method described here is a PolyNet. If the embedded CtNN is certified to be "error-free," then so is neural integration; no algorithmic error beyond the approximation inherent in the numerical method is induced. CtNN certification is generally difficult because CtNNs are typically trained to an error tolerance, except when they are exact PolyNets. When the embedded CtNN is an exact PolyNet, the DtNN in our construction is also an exact PolyNet, and the algorithmic correctness of neural integration follows. This is important because differences between neural and numerical integration can in general arise for many reasons, including training error, numerical approximation, computational implementation, and numerical precision. It would be difficult to partition these errors, particularly when modeling chaotic processes.
CtNNs must in general be trained; even in the special case of PolyNets, this is true when the polynomial coefficients are unknown. Solving the discrete adjoint equations corresponding to HJB for learning requires discrete forward and backward numerical integration, which neural integration can perform accurately and efficiently.
We posit that a key advantage of neural integration is that specialized hardware can be better exploited, particularly for Monte Carlo simulations of dynamical systems. For example, CUDA benefits for parallel simulations, e.g., Monte Carlo, have already been shown Niemeyer2014.
III Exact PolyNets
To demonstrate neural integration, we introduce PolyNets, a class of CtNNs that match polynomial dynamics. Polynomial dynamics informs a broad class of ODEs and PDEs (e.g., Navier-Stokes). Although we do not consider PDEs in this work, the methods described here also inform their numerical integration. In Algorithm 1 below, we show that neural computation of polynomial dynamics with known coefficients is (trivially) exact. A fuller discussion of PolyNet classes is presented in Section V.
Let $x$ be an $n$-dimensional variable and consider a dynamical system $\dot{x} = f(x)$, where $f$ is a polynomial containing monomials of maximum degree $d$. There are at most $\binom{n+d}{d}$ possible monomial terms per output, and there are $n$ outputs. The function $f$ can be of the form:

$\dot{x}_i = f_i(x) = \sum_{j=1}^{M} c_{ij}\, m_j(x), \quad i = 1, \ldots, n$   (1)

where $c_{ij}$ are the polynomial coefficients, and the monomial $m_j$ is defined as

$m_j(x) = \prod_{k=1}^{n} x_k^{p_{jk}}, \quad 0 \le \sum_{k=1}^{n} p_{jk} \le d.$   (2)
The equivalent PolyNet

$\dot{x} = N(x; W)$   (3)

is defined as a three-layer network with inputs $x$, outputs $\dot{x}$, and one hidden layer $h$. $N$ is constructed as follows (Algorithm 1):

1. Set each output node's activation to be linear.

2. Set the coefficient of the degree-zero (constant) monomial as the bias for output node $i$.

3. Connect input node $k$ to output node $i$ with weight $c_{ij}$ for each degree-one monomial $m_j = x_k$, where $1 \le k \le n$ and $1 \le i \le n$.

4. To map the input to the hidden layer, rewrite the single term

$m_j(x) = \prod_{k=1}^{n} \prod_{q=1}^{d} x_k^{\,\delta(q \le p_{jk})}$,   (4)

where $p_{jk}$ is the power of $x_k$ in $m_j$ and $\delta(\cdot)$ is the Kronecker delta function (one when its argument holds, zero otherwise).

5. Each term $m_j$ of degree two or more is a hidden product node $h_j$ with linear activation connecting to the input layer. Product nodes are commonly used in deep networks lstm. There are exactly $\sum_k p_{jk}$ connections between node $h_j$ and the input layer, each of unit weight. Note that logarithms and antilogarithms, among others, are obvious ways to express powers; however, we have avoided them to preclude invoking complex-number representations.

6. A hidden node $h_j$ connects to an output node $i$ additively with weight $c_{ij}$, where $1 \le j \le M$ and $1 \le i \le n$.

There are at most $M$ nodes in the hidden layer. Thus the biases, the input-to-output weights, the hidden-to-output weights $c_{ij}$, and the unit-weight product-node connections together constitute the parameter vector $W$ of $N$.

Thus, by Algorithm 1, the dynamical system $\dot{x} = f(x)$ (Equation 1) and the PolyNet neural network $N(x; W)$ are equal.
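As an illustration, Algorithm 1 can be sketched in plain Python. The function name, the multi-index encoding of monomials, and the coefficient dictionary below are our own conventions, not the paper's:

```python
def build_polynet(n, d, coeffs):
    """Sketch of Algorithm 1: an exact PolyNet for polynomial dynamics.

    coeffs maps (output index i, powers p) -> coefficient c, where p is a
    tuple of n non-negative integer powers with sum(p) <= d.  Degree-0
    terms become output biases, degree-1 terms become direct input-output
    connections, and higher-degree monomials become hidden product nodes
    with unit-weight connections to the inputs.
    """
    biases = [0.0] * n
    direct = {}   # (input k, output i) -> weight
    hidden = {}   # powers p -> list of (output i, weight)
    for (i, p), c in coeffs.items():
        deg = sum(p)
        if deg == 0:
            biases[i] += c
        elif deg == 1:
            k = p.index(1)
            direct[(k, i)] = direct.get((k, i), 0.0) + c
        else:
            hidden.setdefault(p, []).append((i, c))

    def forward(x):
        y = list(biases)
        for (k, i), w in direct.items():
            y[i] += w * x[k]
        for p, edges in hidden.items():
            # product node: multiply input x_k exactly p_k times (unit weights)
            node = 1.0
            for k, pk in enumerate(p):
                for _ in range(pk):
                    node *= x[k]
            for i, w in edges:
                y[i] += w * node
        return y

    return forward
```

For example, the system $\dot{x}_0 = x_0^2 - x_1$, $\dot{x}_1 = 3 x_0 x_1$ is encoded with two hidden product nodes and one direct connection, and the resulting network reproduces the polynomial exactly.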
III.1 Example: Lorenz Dynamics
In Figure 1, a neural architecture derived directly from the governing equations of the Lorenz system computes the time derivatives given an input. The equations describing the system are

$\dot{x} = \sigma(y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \beta z.$

Interestingly, the Lorenz system is a polynomial with $n = 3$ variables and maximum degree $d = 2$, but requires no more than two hidden nodes as implemented by the PolyNet (one each for the products $xz$ and $xy$). Note also that hidden nodes of degree one are technically unnecessary, and we have simply replaced them with the necessary additive connections from input to output nodes.
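The Lorenz PolyNet of Figure 1 can be sketched as follows, using only sums, products, and the two hidden product nodes; the standard parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ are assumed:

```python
def lorenz_polynet(state, sigma=10.0, rho=28.0, beta=8.0/3.0):
    """Exact PolyNet for Lorenz-63: additive input-output connections
    plus two hidden product nodes (x*z and x*y)."""
    x, y, z = state
    xz = x * z                     # hidden product node 1
    xy = x * y                     # hidden product node 2
    dx = sigma * y - sigma * x     # degree-1 terms: direct connections
    dy = rho * x - y - xz          # uses product node xz
    dz = xy - beta * z             # uses product node xy
    return [dx, dy, dz]
```

Because every operation is a weighted sum or a product of inputs, the computation is the PolyNet itself, not merely equivalent to one.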
IV Neural Integration
In this section, we construct neural integrators, first for the Runge-Kutta (RK) method and second for the Adams-Bashforth-Moulton (ABM) method. For the Runge-Kutta method, consider a neural implementation of the common fourth-order explicit Runge-Kutta scheme with fixed step $h$:

$k_1 = f(y_n)$   (5)

$k_2 = f(y_n + \tfrac{h}{2} k_1)$   (6)

$k_3 = f(y_n + \tfrac{h}{2} k_2)$   (7)

$k_4 = f(y_n + h\, k_3)$   (8)

$y_{n+1} = y_n + \tfrac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4)$   (9)
If $f$ is an external function guaranteed to be accurate/correct, then the step update is polynomial in $y_n$ and the stage derivatives $k_1, \ldots, k_4$, and it is an exact PolyNet. A simple design is to follow Algorithm 1; however, the network size would then change with order, which we avoid by using recurrence.
Consider instead the network depicted in Figures 2 and 3. In this circuit, the coefficients cyclically enter the network through the nodes on the left, and the output for the next time step is computed after five iterations of the system. Computation occurs in the order specified by the numbers adjacent to each node; intermediate results are held by delays in the recurrence until the next iteration. Each node takes the sum or product of its inputs, as indicated by the symbol on the node. In the circuit, $f$ represents the derivative computation, achieved via any neural circuit representing a set of polynomial ODEs. In our example, the Lorenz neural circuit of Figure 1 is implemented for $f$, but any neural dynamical system could replace it. Every five iterations, the output node contains the next element of the time series. With this construction, given a single initial state, the neural circuit iteratively outputs the time series. The neural circuit in Figure 2 is an exact recurrent PolyNet of fixed size that implements explicit fixed-step RK4, as shown in Equations 5-9.
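The recurrent, fixed-size step can be sketched as a loop whose body is identical at every round, with only the coefficient sequences changing cyclically. This is our minimal reading of the circuit, assuming an autonomous system; the variable names are ours:

```python
def rk4_neural_step(f, y, h):
    """One RK4 step via a fixed-size recurrent circuit (sketch).

    The coefficient sequences a and b enter cyclically; the loop body
    (sum and product nodes plus one evaluation of the derivative
    network f) is the same at every round, so the circuit size is
    independent of the order of the method.
    """
    a = [0.0, 0.5, 0.5, 1.0]   # scales the previous stage feeding f
    b = [1/6, 1/3, 1/3, 1/6]   # accumulation weights
    k = [0.0] * len(y)         # delay register: previous stage derivative
    acc = list(y)              # delay register: accumulated output
    for r in range(4):
        stage = [yi + a[r] * h * ki for yi, ki in zip(y, k)]
        k = f(stage)           # one pass through the derivative network
        acc = [ai + b[r] * h * ki for ai, ki in zip(acc, k)]
    return acc
```

Unrolling the loop recovers Equations 5-9 exactly, so the sketch is an instance of the fixed-step RK4 update rather than an approximation of it.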
The structure shown in Figure 2 can be modified slightly to apply to a general Runge-Kutta method. A general explicit $s$-stage Runge-Kutta method for an ordinary differential equation $\dot{y} = f(y)$ takes the form

$k_i = f\big(y_n + h \sum_{j=1}^{i-1} a_{ij} k_j\big), \qquad y_{n+1} = y_n + h \sum_{i=1}^{s} b_i k_i.$

To generalize the structure of our system, we modify the stage node and the associated coefficient array to an array of nodes, one for each $k_j$ with $1 \le j < s$, such that the coefficient node passes the appropriate $a_{ij}$ to each stage node with a timed one-hot multiplicative weight. Each stage node passes its $k_j$ to itself and to the derivative node with the appropriate multiplicative coefficient, such that the input to $f$ from the array at round $i$ is always $y_n + h \sum_{j=1}^{i-1} a_{ij} k_j$, for $1 \le i \le s$. In addition, the accumulation coefficient array must change to the weights $(b_1, \ldots, b_s)$. Finally, the output-gating coefficient array changes to the length-$s$ one-hot vector with the single high bit in the final entry. With these changes, the neural architecture for the fourth-order Runge-Kutta method applies to general explicit Runge-Kutta methods. Note that the order of approximation can be varied during neural integration merely by lengthening or shortening the coefficient sequences, without changing the network. This provides a serious advantage in changing stiffness/dynamical regimes.
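The order-invariance claim can be checked with a generic explicit Runge-Kutta driver parameterized by a Butcher tableau $(A, b)$; only the coefficient arrays change between methods. This sketch, with our own function name and signature, assumes an autonomous system (so the abscissae $c_i$ are not needed):

```python
def explicit_rk_step(f, y, h, A, b):
    """One step of a general explicit s-stage Runge-Kutta method.

    A is the strictly lower-triangular stage-coupling matrix and b the
    accumulation weights of the Butcher tableau.  Swapping (A, b)
    changes the method's order without changing this code, mirroring
    the fixed-size circuit in the text.
    """
    s = len(b)
    ks = []
    for i in range(s):
        # assemble y_n + h * sum_{j<i} A[i][j] * k_j, component-wise
        yi = [ym + h * sum(A[i][j] * ks[j][m] for j in range(i))
              for m, ym in enumerate(y)]
        ks.append(f(yi))
    return [ym + h * sum(b[i] * ks[i][m] for i in range(s))
            for m, ym in enumerate(y)]
```

Feeding it the classical RK4 tableau reproduces Equations 5-9; feeding it, say, Heun's two-stage tableau yields a second-order method from the identical driver.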
IV.1 Semi-Implicit Methods
To extend the methodology to other numerical integrators, consider the two-step Adams-Bashforth predictor coupled to a two-step Adams-Moulton corrector, according to

$y^{*}_{n+1} = y_n + \tfrac{h}{2}\big(3 f(y_n) - f(y_{n-1})\big),$

$y_{n+1} = y_n + \tfrac{h}{12}\big(5 f(y^{*}_{n+1}) + 8 f(y_n) - f(y_{n-1})\big).$

The neural architecture for this numerical integrator is shown in Figure 4, where again any equation $\dot{y} = f(y)$ can be integrated by the system.
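A minimal sketch of the predictor-corrector step, assuming the standard AB2/AM2 formulas above; the function name and argument order are ours:

```python
def abm2_step(f, y_prev, y, h):
    """One Adams-Bashforth(2) predictor / Adams-Moulton(2) corrector step.

    y_prev and y are the states at t_{n-1} and t_n; the corrector
    evaluates f at the predicted point rather than solving implicitly.
    """
    fn = f(y)        # f at t_n
    fm = f(y_prev)   # f at t_{n-1}
    # AB2 predictor
    y_pred = [yi + h / 2 * (3 * a - b) for yi, a, b in zip(y, fn, fm)]
    # AM2 corrector, using the predicted derivative
    fp = f(y_pred)
    return [yi + h / 12 * (5 * p + 8 * a - b)
            for yi, p, a, b in zip(y, fp, fn, fm)]
```

Being a two-step method, it must be bootstrapped with two initial states (e.g., one RK step from the initial condition) before the recurrence can run on its own, as noted in the text ("once initialized").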
Figure 5 depicts a comparison between fourth-order explicit Runge-Kutta neural integration, second-order Adams-Bashforth-Moulton neural integration, and the corresponding ordinary numerical integration. The computational difference between the neural-circuit method and the classical numerical computation is that the neural circuit was implemented as a TensorFlow computation graph, whereas the classical numerical computation was done with a simple function^{1}. Unsurprisingly, since the operations are mathematically equivalent, the output time series are exactly equivalent, save computational floating-point discrepancies that magnify as a result of the system's chaotic behavior. To demonstrate that the discrepancies are due to floating-point computation, we have perturbed the initial condition at the level of the precision used for the computation. As a comparison, we have also generated a time series with a larger perturbation, to show that there is some level of randomness in the accuracy of the system following an initial perturbation.

^{1} All programs were written in Python and executed on a laptop.

V Discussion
Although exact PolyNets are sufficient to establish the correctness of neural integration, two issues remain. The first is computational efficiency. We posit that constructing a DtNN allows for faster and more efficient computation than interleaving neural and classical computation. Representing calculations with tensors enables computation on specialized hardware for potentially faster performance. The scaling with independently executing DtNNs, such as for uncertainty quantification and other applications, can be enormous. Indeed, for nominal ODE integration, such benefit has already been shown Niemeyer2014. It remains to be seen whether the benefits apparent for many independent DtNN simulations extend to other problems such as variational Bayes. We hypothesize that the gains could be significant, especially when coupled with the low-cost variable-order neural integration that our circuits allow.
The second issue is training embedded CtNNs. To discuss this further, consider the network-dynamics table shown in Table 1. When the polynomial structure is known but coefficients are unknown, PolyNet Identification^{2} is necessary. This is accomplished using the discrete adjoint method bryson for DtNN parameter estimation. In the forward pass, the DtNN simulates outputs up to a horizon and, in the backward pass, errors are backpropagated for estimation. DtNN learning over a receding horizon relates closely to classical receding-horizon model-predictive optimal control. Of course, DtNNs can be trained using alternate approaches, particularly Bayesian filters, and fixed-point, fixed-interval, and fixed-lag smoothers Ravela2007.

^{2} This term is used in the sense of System Identification in Estimation and Control.
Table 1: Network versus dynamics.

Network  | Poly w/ Coeff. | Poly w/o Coeff. | Non-poly
PolyNet  | Exact          | Estimate        | Approx.
Other    | Poor           | General         | N/A
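The discrete-adjoint (backpropagation) estimation described above can be sketched on a toy one-parameter DtNN. Everything here is illustrative: explicit-Euler stepping of $\dot{y} = \theta y$, a terminal squared-error loss, and plain gradient descent stand in for the general receding-horizon machinery:

```python
def train_theta(theta, y0, target, h, steps, lr, iters):
    """Discrete-adjoint sketch for a one-parameter DtNN.

    Forward pass: Euler steps y_{n+1} = y_n * (1 + h*theta).
    Backward pass: adjoint recursion lam_n = lam_{n+1} * d y_{n+1}/d y_n,
    accumulating dL/dtheta = sum_n lam_{n+1} * h * y_n for the terminal
    loss L = (y_N - target)^2.
    """
    for _ in range(iters):
        # forward pass: store the trajectory
        ys = [y0]
        for _ in range(steps):
            ys.append(ys[-1] * (1.0 + h * theta))
        # backward pass: terminal adjoint, then recurse to n = 0
        lam = 2.0 * (ys[-1] - target)
        grad = 0.0
        for n in range(steps - 1, -1, -1):
            grad += lam * h * ys[n]   # direct dependence of y_{n+1} on theta
            lam *= (1.0 + h * theta)  # adjoint of y_n
        theta -= lr * grad
    return theta
```

The same forward-store/backward-propagate pattern is what backpropagation through a DtNN computes, with the scalar recursion replaced by the network Jacobians.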
When the dynamics have non-polynomial governing equations, PolyNets are a useful approximation in the sense of the Stone-Weierstrass theorem weierstrass, enabling approximations of smooth nonlinearities. However, in this case the PolyNet structure (number of monomials, degree, connectivity) is also in error. The challenging problem of PolyNet Structure Learning^{3} must then be addressed. If parsimony principles are applicable to modeling, then one can gradually increase PolyNet complexity, perhaps by sampling, and apply model-selection criteria. This approach is in stark contrast to structure learning wherein a dense, large network is "made sparse," a problem both in neural learning and in graphical models. The systematization of model complexity may enable more tractable structure learning.

^{3} Used analogously to Structure Learning in Machine Learning and Structural Model Errors in Geophysics.
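As a concrete instance of such polynomial approximation (our example, not the paper's), a low-degree odd polynomial already captures a smooth non-polynomial nonlinearity well on a bounded interval:

```python
import math

def sin_poly7(x):
    # Degree-7 Taylor polynomial of sin: a PolyNet-representable map
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040

# maximum approximation error over a grid on [-pi/2, pi/2]
grid = [-math.pi / 2 + i * math.pi / 200 for i in range(201)]
max_err = max(abs(math.sin(x) - sin_poly7(x)) for x in grid)
```

On this interval the maximum error is below $2 \times 10^{-4}$, illustrating why a modest PolyNet can substitute for a smooth nonlinearity, while the structure (degree, monomial set) remains a modeling choice subject to error.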
PolyNets are also, we posit, promising theory-guided learning machines karpatne19. Using theory, via governing equations, to establish the initial PolyNet makes them "explainable" and avoids the "black-box" nature of typical neural construction. Subsequently, with the arrival of data, PolyNet Identification and PolyNet Structure Learning interleave, following parsimony principles to titrate model complexity. In these steps, to be sure, only PolyNet terms are estimated; the integrator loop remains immutable. Thus, constructing neural circuits for dynamical systems would potentially allow inference of the underlying equations of a system while accounting for known physical behavior.
Theory-guided learning, as discussed, offers an enormous advantage. Contrast it with the general setting wherein a polynomial is learned directly from data. Using a two-layer neural network similar to the PolyNet, bounds derived by Andoni et al. andoni show that the problem is substantially harder. The absence of guidance for PolyNet design implies a combinatorial explosion with degree. This in turn causes the number of nodes, the required training data, and the iterations to convergence to all become prohibitive for most applications.
When the governing equations are polynomial but PolyNets aren't used, the situation also becomes considerably more complicated. Neural networks have been argued to be polynomial machines in which the hidden layers control the effective degree PolyReg. However, while neural architectures offer enormous modeling flexibility through myriad activations and learning rules, they can be very suboptimal with respect to the number of parameters needed to model a polynomial. For example, a single neuron with tanh activation poorly models $y = x^2$^{4}; in fact, the Taylor expansion of tanh has no second-degree term. It may be easier to perform "polynomial regression" instead (as PolyReg argue) provided, as discussed in the previous paragraph, the combinatorial explosion of monomials can be managed. We hypothesize that layered architectures alternating compression (by dimensionality reduction) with nonlinear stretching (by low-degree PolyNets) could be a promising alternative to classical neural circuitry or classical polynomial regression.

^{4} See http://essg.mit.edu/ml/yx2; sensitivity to noise, weak generalization, and no "extrapolation" skill are easily evident.
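For reference, the Maclaurin expansion of the hyperbolic tangent contains only odd-degree terms:

```latex
\tanh x = x - \frac{x^{3}}{3} + \frac{2x^{5}}{15} - \frac{17x^{7}}{315} + \cdots, \qquad |x| < \frac{\pi}{2}.
```

Since tanh is an odd function, every even-degree coefficient, including that of $x^2$, vanishes identically, which is why a single tanh neuron must rely on bias shifts to mimic quadratic curvature at all.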
Regardless, given polynomial equations and an insistence on using a standard neural-network architecture (e.g., feedforward with smooth nonlinear activation), we show elsewhere that bounds on the neural structure can be established ravela19. These bounds have been experimentally validated ziwei, but are naturally higher than PolyNets provide. For example, for the Lorenz system, exactly two PolyNet hidden nodes are needed, whereas for a feedforward network with tanh activation, substantially more nodes appear to be necessary. Andoni et al.'s bounds are prohibitively weaker still.
VI Conclusions
In this paper, we demonstrate neural integration: solving neural ordinary differential equations as neural computation. Neural integration is carried out by Discrete-time Neural Networks (DtNNs). For a given Neural ODE, we construct DtNNs for explicit fixed-step Runge-Kutta methods of any order. The DtNNs are recurrent PolyNets and remain of fixed size irrespective of order. We also construct a DtNN for a second-order Adams-Bashforth predictor coupled to a second-order Adams-Moulton corrector. In both cases, we show that neural integration is correct for any correct embedded Continuous-time Neural Network (CtNN). We further demonstrate this numerically using an exact PolyNet for the Lorenz-63 attractor, showing convergence between neural and numerical integration up to numerical representation. We argue that PolyNets offer a promising foundation for theory-guided modeling.
Our focus in this first of a three-part series is simply to demonstrate neural integration. The next paper focuses on PolyNet identification and the third on PolyNet adaptation. We also seek to apply this methodology to Systems Dynamics and Optimization (SDO) problems in the environment and to the design of new adaptive autopilots for environmental monitoring.
Acknowledgments
Ms. Trautner is an undergraduate researcher at ESSG advised by Dr. Ravela. Support from ONR grant N000141912273, the MIT Environmental Solutions Initiative, the Maryanne and John Montrym Fund, and the MIT Lincoln Laboratory is gratefully acknowledged. Discussions with members of the Earth Signals and Systems group are also gratefully acknowledged.