In some use cases of automatic differentiation it is desired to differentiate a code list which contains calls to an ODE integration routine, such as the ode23 solver in MATLAB or GNU Octave. This function is one of a family of similar builtins which all share the same interface, for example ode15s or ode45.
In ADiMat [1, 9] this currently throws an error, as the derivative of the builtin function ode23 is not yet specified. On the other hand, some users have use cases where this situation occurs, and hence it is desirable to develop a method for handling such cases.
This paper is organised as follows: in Section 2 we describe the mathematical methods for computing the derivatives of the solution of an ODE w.r.t. some parameters. In Section 3 we introduce a simple ODE system that serves as an illustrating example, the well-known Lotka-Volterra equations, and show the analytic derivatives for it. In Section 4 we explain how the derivatives can be computed with ADiMat, in particular when the user code uses the ODE integration function ode23. We show how to construct working substitution functions to propagate the derivative in both forward and reverse mode, and also how support was added to ADiMat to compute the Hessian of such user code in forward-over-reverse mode. Finally, in Section 5 we provide a set of conclusions.
2 Derivatives of ODE solutions
Consider a simple ODE system of the form

$$\dot{y}(t) = f(t, y(t)), \qquad y(t_0) = y_0,$$

which describes the behaviour of time-dependent quantities $y(t) \in \mathbb{R}^n$ given an initial state $y_0$ at initial time $t_0$.
The simplest way to determine $y(t)$ for times $t > t_0$ is by integrating forward from the initial state using the well-known explicit Euler method with small discrete time steps $h$:

$$y_{i+1} = y_i + h\, f(t_i, y_i),$$

with $t_{i+1} = t_i + h$ for $i = 0, 1, 2, \ldots$ In practice such ODE systems are routinely solved with more powerful methods such as the Runge-Kutta schemes of various orders.
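As a minimal illustration, the explicit Euler scheme above can be sketched as follows. This is a Python sketch rather than MATLAB, and the test problem $\dot y = -y$ with exact solution $e^{-t}$ is an assumption chosen purely for illustration:

```python
import numpy as np

def euler(f, t0, te, y0, h):
    """Integrate dy/dt = f(t, y) from t0 to te with the explicit Euler method."""
    t, y = t0, np.asarray(y0, dtype=float)
    ts, ys = [t], [y]
    while t < te - 1e-12:
        y = y + h * f(t, y)   # y_{i+1} = y_i + h * f(t_i, y_i)
        t = t + h             # t_{i+1} = t_i + h
        ts.append(t)
        ys.append(y)
    return np.array(ts), np.array(ys)

# Test problem: dy/dt = -y, y(0) = 1, with exact solution exp(-t).
ts, ys = euler(lambda t, y: -y, 0.0, 1.0, [1.0], 1e-3)
```

The first-order accuracy of the scheme means the error at $t = 1$ is of the order of $h$.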
Now it is often the case that the derivative of the ODE depends on a set of parameters $p \in \mathbb{R}^m$,

$$\dot{y}(t) = f(t, y(t), p), \qquad (1)$$

and thus its solution can also be seen as a function of $p$, that is $y(t, p)$. Then let us further assume that the derivatives of $y$ w.r.t. the parameters are of interest, i.e. the quantity $\partial y / \partial p$. The general method to obtain the derivatives of an ODE solution w.r.t. some parameters or the initial condition is described in [4], which we will restate in the following for our problem setup.
Differentiating (1) w.r.t. $p$ and exchanging the order of differentiation yields

$$\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial y}{\partial p} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial p} + \frac{\partial f}{\partial p},$$

and by setting $S_p = \partial y / \partial p$ we see that we obtain an augmented ODE system

$$\dot{S}_p = \frac{\partial f}{\partial y}\, S_p + \frac{\partial f}{\partial p}, \qquad (2)$$

consisting of (1) and (2), which can be integrated in the same manner as the original system. Assuming that the initial values $y_0$ do not depend on the parameters, the initial values $S_p(t_0)$ are set to zero.
When the derivatives of the solution w.r.t. the initial values are of interest we get, with a very similar reasoning, the additional equation

$$\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial y}{\partial y_0} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial y_0}. \qquad (3)$$

In this case the initial values $\frac{\partial y}{\partial y_0}(t_0)$ are set to the identity matrix.
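To make the augmented-system idea concrete, here is a minimal Python sketch. The scalar test problem $\dot y = -p\,y$ is an assumption for illustration; it has the exact solution $y(t) = y_0 e^{-pt}$ and hence the exact sensitivity $\partial y/\partial p = -t\, y_0 e^{-pt}$, against which the augmented Euler integration can be checked:

```python
import numpy as np

# Scalar test problem (assumed for illustration): dy/dt = f(y, p) = -p*y,
# with exact solution y(t) = y0*exp(-p*t) and dy/dp = -t*y0*exp(-p*t).
p, y0, h, te = 2.0, 1.0, 1e-3, 1.0

# Augmented state (y, S) with S = dy/dp, integrated jointly with Euler:
#   dy/dt = f,   dS/dt = (df/dy)*S + df/dp = -p*S - y
y, S = y0, 0.0                  # S(t0) = 0: y0 does not depend on p
for _ in range(round(te / h)):
    y, S = y + h * (-p * y), S + h * (-p * S - y)

exact = -te * y0 * np.exp(-p * te)   # exact dy/dp at t = te
```

Both `y` and `S` converge to their exact values at the first-order rate of the Euler scheme.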
3 Example: Differentiating the Lotka-Volterra ODE system
As an example we consider the Lotka-Volterra ODE system in Subsection 3.1, derive the analytical derivatives for it in Subsection 3.2, and compute the derivatives with several different approaches, including automatic differentiation and numerical methods, in Subsection 3.3.
3.1 The Lotka-Volterra ODE system
We consider a system of two ODEs describing the population numbers of two interdependent species, for example rabbits $y_1$ and foxes $y_2$ [3, 6, 8]; the well-known Lotka-Volterra equations given by

$$\dot{y}_1 = \alpha y_1 - \beta y_1 y_2, \qquad (4)$$
$$\dot{y}_2 = \delta y_1 y_2 - \gamma y_2. \qquad (5)$$
For our example we choose the parameters as given in the following table
and set the initial conditions as follows:
When we integrate the ODE over the time span $[t_0, t_e]$ with the explicit Euler method with fixed time steps $h$ we obtain the result shown in Figure 1.
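For reference, this forward integration can be sketched in a few lines of Python. The parameter values and initial populations below are illustrative assumptions, not the concrete values of the report's table:

```python
import numpy as np

# Lotka-Volterra right-hand side integrated with explicit Euler; parameter
# values and initial populations are illustrative assumptions.
alpha, beta, gamma, delta = 1.0, 0.1, 1.5, 0.075
y = np.array([20.0, 5.0])       # (prey, predators)
h, nsteps = 1e-3, 10_000        # fixed steps over the time span [0, 10]

for _ in range(nsteps):
    f = np.array([alpha * y[0] - beta * y[0] * y[1],      # prey equation
                  delta * y[0] * y[1] - gamma * y[1]])    # predator equation
    y = y + h * f
```

With these values the populations oscillate around the equilibrium $(\gamma/\delta, \alpha/\beta)$ and remain positive and bounded.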
3.2 The derivatives of the Lotka-Volterra ODE system
With the parameter vector $p = (\alpha, \beta, \gamma, \delta)$ the derivative of the right-hand side $f$ w.r.t. the state is

$$\frac{\partial f}{\partial y} = \begin{pmatrix} \alpha - \beta y_2 & -\beta y_1 \\ \delta y_2 & \delta y_1 - \gamma \end{pmatrix} \qquad (6)$$

and the derivative w.r.t. the parameters is

$$\frac{\partial f}{\partial p} = \begin{pmatrix} y_1 & -y_1 y_2 & 0 & 0 \\ 0 & 0 & -y_2 & y_1 y_2 \end{pmatrix}. \qquad (7)$$

For the derivatives w.r.t. the parameters we set the initial state $\frac{\partial y}{\partial p}(t_0)$ to zero,
and for the derivatives w.r.t. the initial values we set the initial state $\frac{\partial y}{\partial y_0}(t_0)$ to the identity matrix.
Applying the explicit Euler method to the augmented system yields the derivatives of the population numbers w.r.t. the four parameters over time, as shown in Figures 2 and 3, and the derivatives w.r.t. the initial population numbers shown in Figure 4. Note that in order to use the ode23
solver the augmented ODE system has to be cast into a single column vector, that is, the system is then solved by integrating a composite state

$$z = \begin{pmatrix} y \\ \mathrm{vec}(\partial y / \partial p) \\ \mathrm{vec}(\partial y / \partial y_0) \end{pmatrix}$$

from the initial state using the composite derivative

$$\dot{z} = \begin{pmatrix} f \\ \mathrm{vec}\!\left(\frac{\partial f}{\partial y}\frac{\partial y}{\partial p} + \frac{\partial f}{\partial p}\right) \\ \mathrm{vec}\!\left(\frac{\partial f}{\partial y}\frac{\partial y}{\partial y_0}\right) \end{pmatrix}, \qquad (8)$$

where $\mathrm{vec}(\cdot)$ shall denote the cast to a column vector.
We show the function dfode in Listing 1. This function defines the composite ODE system as defined in (8). Note how the argument x, which corresponds to the composite state $z$, is split into the three parts corresponding to $y$, $\partial y/\partial p$, and $\partial y/\partial y_0$, and how the latter two are reshaped to the correct shapes so that the matrix expressions $\frac{\partial f}{\partial y}\frac{\partial y}{\partial p} + \frac{\partial f}{\partial p}$ and $\frac{\partial f}{\partial y}\frac{\partial y}{\partial y_0}$ can be evaluated correctly. The functions implementing $f$ and its partial derivatives (such as fode_p) are not shown; they correspond exactly to $f$ from (4)–(5) and the partial derivatives $\partial f/\partial y$ from (6) and $\partial f/\partial p$ from (7).
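The structure of such a composite right-hand side can be sketched in Python as follows; the actual dfode in Listing 1 is MATLAB, and the names and packing order used here are assumptions mirroring the description above:

```python
import numpy as np

# Python sketch of the composite right-hand side of (8). The state z stacks
# y (2 entries), vec(dy/dp) (2x4) and vec(dy/dy0) (2x2) into one vector.

def dfode(t, z, p):
    a, b, g, d = p
    y  = z[0:2]
    Sp = z[2:10].reshape(2, 4)     # dy/dp, reshaped back to matrix form
    S0 = z[10:14].reshape(2, 2)    # dy/dy0
    f  = np.array([a * y[0] - b * y[0] * y[1],
                   d * y[0] * y[1] - g * y[1]])
    Jy = np.array([[a - b * y[1], -b * y[0]],
                   [d * y[1], d * y[0] - g]])          # df/dy, cf. (6)
    Jp = np.array([[y[0], -y[0] * y[1], 0.0, 0.0],
                   [0.0, 0.0, -y[1], y[0] * y[1]]])    # df/dp, cf. (7)
    return np.concatenate([f, (Jy @ Sp + Jp).ravel(), (Jy @ S0).ravel()])

# Initial composite state: dy/dp = 0, dy/dy0 = identity (assumed values).
z0 = np.concatenate([[20.0, 5.0], np.zeros(8), np.eye(2).ravel()])
dz = dfode(0.0, z0, (1.0, 0.1, 1.5, 0.075))
```

At the initial time the $\partial y/\partial p$ block of the output reduces to $\partial f/\partial p$ and the $\partial y/\partial y_0$ block to $\partial f/\partial y$, which gives a quick sanity check.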
Obviously we could also use automatic differentiation of the MATLAB code implementing $f$ to obtain $\partial f/\partial y$ and $\partial f/\partial p$. This approach yields exactly the same results as the analytic derivatives. Here we could of course use the ADiMat driver admDiffFor, cf. [9], but for such a small derivative being computed many times in a hot loop the overhead of this convenience driver is just too large. The driver can be made a little bit faster by setting the option nochecks. This option has the effect that neither will the source transformation be done nor the transformed file be checked for conformity. Thus, the user must first run the driver once exactly as intended to let ADiMat produce the differentiated code correctly. Then, in the ODE function, the option nochecks is added. However, the overall expense of the driver is still much too high for this approach to be competitive here. Instead, the user should just take the time to code the appropriate manual invocations of the differentiated code. In our tests this results in a reduction of the overall runtime by a factor of 10. We show the function in Listing 2, including a branch to select one of the two derivative variants.
Another completely different approach is to numerically differentiate the entire ODE integration, using either the finite difference (FD) approximation [7] or the complex step (CS) method [5]. The latter method is well-known to yield accurate derivatives of real analytic computations, while the typical accuracy of finite differences is at best half the machine precision. Since both $f$ and the explicit Euler method are real analytic, we get the expected relative errors when comparing the four different ways to evaluate these derivatives, as shown in the all-vs-all comparison in Table 1.
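The accuracy gap between FD and CS can be demonstrated on any small real analytic function; the Python example below is a generic illustration, not the report's code:

```python
import numpy as np

def f(x):
    # a real analytic test function (an illustrative assumption)
    return np.exp(x) * np.sin(x)

x = 0.7
exact = np.exp(x) * (np.sin(x) + np.cos(x))   # analytic derivative

# Central finite differences: accuracy limited by subtractive cancellation,
# at best around the square root of the machine precision.
h_fd = 1e-7
fd = (f(x + h_fd) - f(x - h_fd)) / (2 * h_fd)

# Complex step: no subtractive cancellation, so the step can be made tiny
# and the derivative is accurate to machine precision.
h_cs = 1e-20
cs = np.imag(f(x + 1j * h_cs)) / h_cs
```

The CS result agrees with the analytic derivative to machine precision, while the FD result carries the usual cancellation error.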
| | vs. AD | vs. FD | vs. CS |
| Analytic | 0 | 1.55324e-06 | 3.3805e-15 |
| AD | | 1.55324e-06 | 3.3805e-15 |
Integration of the augmented ODE system with either analytic or AD derivatives yields exactly the same result, while using the CS method to differentiate through the integration of the original ODE system yields very nearly the same result. The relative error in the divided differences is a little more than the square root of the machine precision, as expected.
3.3 Integration of the original and augmented Lotka-Volterra ODE system with the Runge-Kutta method
Things become a little more interesting when we use the more sophisticated Runge-Kutta method with adaptive time steps as it is implemented in MATLAB or GNU Octave by the ode23 builtin function.
The first thing to note is that when integrating the augmented system the ODE integrator will generally choose different time step sizes than for the original system. Hence it will yield result vectors of different lengths, which makes it difficult to compare the different differentiation methods. This can be circumvented by prescribing a vector of time points to the integrator, instead of the tuple of start and end points $(t_0, t_e)$, forcing it to return the solution at exactly the given points. This is in particular also required when using finite differences to approximate the derivative, because otherwise the results are completely spurious. Apparently the perturbations in the parameters are small enough to result in the same number of time steps, but the output time points themselves do change.
The results of the augmented system with either analytic or AD derivatives are visually very similar to those with the explicit Euler method shown in the previous subsection. We show the AD derivative w.r.t. the first two parameters in Figure 5, the FD approximation w.r.t. the first two parameters in Figure 6, and the CS derivative w.r.t. the same parameters in Figure 7.
However, the all-vs-all relative error comparison shown in Table 2 reveals that the FD and CS methods apparently yield results which are substantially different from the analytic and AD derivatives in this case.
| | vs. AD | vs. FD | vs. CS |
The runtimes of all four methods, using either explicit Euler or ode23, are given in Table 3. While using AD with the manually written invocations of the differentiated code is slightly slower than using analytic derivatives, it is about twice as fast as using the FD or CS method.
4 Computing derivatives of ODE integrations with ADiMat
Although ADiMat is a relatively mature AD tool which supports a substantial set of MATLAB builtin functions, due to the sheer number of these it often occurs that a particular user code makes use of a builtin that is not yet supported. An example of this is the ode23 solver, which we use here to solve the Lotka-Volterra equations. In this section we want to show how to insert a manually derived derivative into a larger piece of differentiated code.
Consider as an example the function fmain shown in Listing 3. In order to facilitate this task it is most convenient to place the call to ode23 into a separate function, so as to isolate it as much as possible. For this purpose we define the function calcode, which is shown in Listing 4.
Then we differentiate the function fmain as usual in forward mode. ADiMat will differentiate calcode as well and complain about the unsupported builtin ode23. As a result the generated function g_calcode will be invalid. However, by looking at the signature of g_calcode we know what a manually created replacement function should look like, as shown in Listing 5.
For the reverse mode, the approach is a little bit trickier. We delete calcode.m before launching the differentiation. This will lead ADiMat to create a generic call to an adjoint function. Of course we restore calcode.m afterwards. For this to work we have to declare the identifier calcode to ADiMat using a suitable function declaration.
Next we prepare two functions which compute the correct derivatives when called from the differentiated code of fmain. This requires a little bit of insight into how ADiMat works, so a few comments are in order. The general approach is to obtain, in any conceivable way, the local Jacobian or partial derivative of the function calcode. To this end we can set up the augmented ODE system (1)–(2) using either analytic or AD derivatives. In the latter case we would effectively use ADiMat in a hierarchical, two-tier fashion.
Then, in forward mode, we reshape the incoming derivative to a column vector and multiply it from the left with the Jacobian $J$. Afterwards we reshape it to the same shape as the function result of calcode. This code is placed in a file fm_calcode.m which is shown in Listing 6. Note one particularity: since g_calcode also returns the function results of calcode, we can replace these by extracting the corresponding bits of the augmented ODE system. Remember however that the result vectors have different lengths when solving the augmented ODE. In this case, due to what is actually done with these results in g_fmain, we get away with this substitution. In other cases this may not work. We will see how to handle that when we consider the reverse mode.
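In essence the forward-mode substitution is a single Jacobian-vector product; the Python sketch below mirrors this (the function name fm_push, the toy Jacobian and shapes are assumptions, the actual code being the MATLAB file fm_calcode.m):

```python
import numpy as np

# Forward-mode substitution sketch: push an incoming derivative g_p through
# the local Jacobian J of the solver output w.r.t. p, where J has shape
# (number of output entries) x (number of parameters).

def fm_push(J, g_p, out_shape):
    g_col = J @ np.asarray(g_p).reshape(-1)   # multiply from the left with J
    return g_col.reshape(out_shape)           # shape of calcode's result

J = np.arange(12.0).reshape(6, 2)   # toy Jacobian: 3x2 output, 2 parameters
g_y = fm_push(J, [1.0, 0.0], (3, 2))
```

Pushing the first unit vector through J simply extracts the first Jacobian column, reshaped to the output's shape.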
In reverse mode, we reshape the incoming adjoint to a row vector and multiply it from the right with the Jacobian $J$. Then we reshape the result to the same shape as the function parameter of which we require the adjoint, in this case p. This code is placed in a file rm_calcode.m, which is shown in Listing 7. Here we run into the problem of the changing number of integration steps that we mentioned. The function in rm_calcode.m receives the adjoint of y, and so we are stuck with its size and need to create a Jacobian of conforming size. To this end, we have to run the original ODE system first, simply to get the vector of time steps. Of course this is only necessary when the time argument ts to the calcode function is really a time span, i.e. a vector of length two, and not a vector of time points already. Obviously the second variant is preferable.
The entire process to produce and then run a valid function g_fmain which calls the manually prepared code in Listing 6 can be expressed with the following MATLAB commands:
For the reverse mode, the following MATLAB commands will produce and then run a valid function a_fmain which calls the manually prepared code in Listing 7:
The runtimes of derivative evaluations with AD in FM and RM are compared to those with the FD and CS methods in Table 4.
4.1 Evaluating the Hessian in forward-over-reverse mode
ADiMat can compute the Hessian matrix of a function using
forward-over-reverse mode, through its driver function
admHessian. In this case the code differentiated in reverse mode,
which is exactly the same function
a_fmain that we created in the
previous section, is being run with arguments that are of a special
tseries2 with overloaded operators (OO) which propagate
derivatives – more precisely, truncated Taylor coefficients – in
forward mode. Attempting to use this to differentiate
fail however, with an error being thrown by ode23. It appears it is
not allowed to run the ode23 solver with any other type than single
or double floats in MATLAB. This is the case even when the OO type
enters the picture only via the additional parameter
p used by the
function handle given to the solver as the first argument, that is,
when we differentiate w.r.t. to only, but not w.r.t. .
Hence it is required to add a method ode23 to the OO class and ensure that the call is dispatched to it. This will only happen when one of the immediate arguments to the call is of the OO type. That means one must compute the derivative w.r.t. y0. Then it is possible, albeit by means of some quite advanced MATLAB hacking, to analyse the function handle passed, extract the OO value p from its closure, or workspace, and construct a new one which, firstly, does not have an OO type in the closure and, secondly, represents the augmented ODE system which can be used to calculate the derivatives. This derivative can then be used to propagate the derivatives to the output OO value, just as in the FM example shown before. This approach will only yield the first order derivatives, which is a limitation for tseries2, which in general can propagate Taylor coefficients truncated at a finite but arbitrary order. However, computing the first derivative in tseries2 is enough to evaluate the Hessian, since the second order derivative is obtained by virtue of the reverse mode differentiation.
The runtime of the Hessian evaluation with admHessian is compared to that of the second order FD method in Table 5.
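The forward-over-reverse principle itself can be illustrated in a few lines of Python: a hand-written "reverse-mode" gradient is differentiated once more in a forward sweep, here emulated with the complex step. The test function and all names are illustrative assumptions, not ADiMat code:

```python
import numpy as np

# Forward-over-reverse sketch: differentiate a (reverse-mode) gradient once
# more in forward mode to obtain the Hessian column by column. The gradient
# of the assumed test function f(x) = x1^2*x2 + x2^3 is written by hand,
# standing in for reverse-mode generated code.

def grad(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2])

def hessian(x, h=1e-20):
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n, dtype=complex)
        e[j] = 1j * h
        H[:, j] = np.imag(grad(x + e)) / h   # forward sweep over the gradient
    return H

H = hessian(np.array([1.0, 2.0]))   # exact Hessian here: [[4, 2], [2, 12]]
```

Each forward sweep over the gradient yields one Hessian column, which is exactly the role the tseries2 Taylor propagation plays for a_fmain.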
5 Conclusions
In this report we showed several methods to calculate the derivatives of an ODE integration. In particular, we showed how to use ADiMat to differentiate a larger code that calls the MATLAB ODE solver ode23 in some sub function as part of its overall computations. This works with both the forward and reverse mode of ADiMat, but currently requires some manual intervention. Obviously a continuation of these results would look into devising a method that automates the entire process. To get this done correctly, however, will probably require that ADiMat first supports lambda expressions, which is most likely still a substantial development effort.
Another result is that both the FD and CS methods can be used to calculate the derivatives of a simple explicit Euler scheme; however, both fail when attempting to differentiate the ode23 solver. The exact reason should be investigated further, since it is unusual for AD and FD derivatives to deviate by more than the to-be-expected inaccuracy. However, it is possible to give examples where the FD approximation can fare almost arbitrarily badly. Also, the fact that we apparently get the correct derivatives when using the simple explicit Euler scheme seems to show that we are indeed doing the correct thing when setting up the augmented ODE system.
The Hessian matrix can also be computed by means of the forward-over-reverse mode using the operator overloading Taylor propagation class in ADiMat. The implementation of the ode23 method added to the class for this purpose shows the way to go for full support for the differentiation of ODE integration builtins with ADiMat.
-  Bischof, C. H., Bücker, H. M., Lang, B., Rasch, A., and Vehreschild, A. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002) (Los Alamitos, CA, USA, 2002), IEEE Computer Society, pp. 65–72.
-  Bischof, C. H., Bücker, H. M., and Vehreschild, A. A Macro Language for Derivative Definition. In Automatic Differentiation: Applications, Theory, and Implementations, H. M. Bücker, G. F. Corliss, P. D. Hovland, U. Naumann, and B. Norris, Eds., vol. 50 of Lecture Notes in Computational Science and Engineering. Springer, Berlin, 2005, pp. 181–188.
-  Lotka, A. J. Elements of Physical Biology. Williams & Wilkins Company, Baltimore, 1925. Accessed: 2018-01-29.
-  LutzL. Derivative of an ODE marching algorithm such as Euler or Runge Kutta. https://math.stackexchange.com/questions/2119506/derivative-of-an-ode-marching-algorithm-such-as-euler-or-runge-kutta, 2017. Accessed: 2018-01-28.
-  Lyness, J., and Moler, C. Numerical differentiation of analytic functions. SIAM Journal on Numerical Analysis 4, 2 (1967), 202–210.
-  Volterra, V. Leçons sur la Théorie Mathématique de la Lutte pour la Vie. Gauthier-Villars, 1931.
-  Wikipedia contributors. Finite difference. https://en.wikipedia.org/wiki/Finite_difference, 2018. Accessed: 2018-01-31.
-  Wikipedia contributors. Lotka–Volterra equations. https://en.wikipedia.org/wiki/Lotka-Volterra_equations, 2018. Accessed: 2018-01-29.
-  Willkomm, J., Bischof, C. H., and Bücker, H. M. A new user interface for ADiMat: Toward accurate and efficient derivatives of Matlab programs with ease of use. International Journal of Computational Science and Engineering 9, 5/6 (2014), 408–415.