Automatic differentiation of ODE integration

02/06/2018
by   Johannes Willkomm, et al.
0

We discuss the calculation of the derivatives of ODE systems with the automatic differentiation tool ADiMat. Using the well-known Lotka-Volterra equations and the ode23 ODE solver as examples we show the analytic derivatives and detail how to differentiate a top-level function that calls ode23 somewhere with ADiMat. This involves the manual construction of substitution function to propagate the derivatives in forward and reverse mode. We also show how to use the reverse mode code to evaluate the Hessian in forward-over-reverse mode.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

05/20/2021

Decomposing reverse-mode automatic differentiation

We decompose reverse-mode automatic differentiation into (forward-mode) ...
04/05/2019

On the Equivalence of Forward Mode Automatic Differentiation and Symbolic Differentiation

We show that forward mode automatic differentiation and symbolic differe...
11/23/2020

Automatic differentiation of Sylvester, Lyapunov, and algebraic Riccati equations

Sylvester, Lyapunov, and algebraic Riccati equations are the bread and b...
08/25/2020

Differentiating a Tensor Language

How does one compile derivatives of tensor programs, such that the resul...
06/08/2020

AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs

We propose a universal method for the evaluation of generalized standard...
01/05/2016

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

The performance of deep neural networks is well-known to be sensitive to...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In some use cases of automatic differentiation, is is desired to differentiate a code list which contains calls to an ODE integration routine, as for example the ode23 solver in MATLAB or GNU Octave. This function is one of a family of similar builtins, which all have the same interface, such as for example ode15s or ode45.

In ADiMat [1, 9] this will currently throw an error as the derivative of the builtin function ode23 is not yet specified. On the other hand some users have use cases where this situation occurs and hence it is desirable that a method is developed for handling such cases.

This paper is organised as follows: in Section 2 we describe the mathematical methods for computing the derivatives of the solution of an ODE w.r.t. some parameters. In Section 3 we introduce a simple ODE system that serves as an illustrating example, the well-known Lotka-Volterra equations, and show the analytic derivatives for it. In Section 4 we explain how the derivatives can be computed with ADiMat, in particular when the use code uses the ODE integration function ode23. We show how to construct working substitution functions to propagate the derivative in both forward and reverse mode, and also how support was added to ADiMat to compute the Hessian of such user code in forward-over-reverse mode. Finally in Section 5 we provide a set of conclusions.

2 Derivatives of ODE solutions

Consider a simple ODE system of the form

which describes the behaviour of time-dependent quantities given an initial state at initial time .

The most simple way to determine for times is by integrating forward from the initial state using the well-known explicit Euler method using small discrete time steps :

with for . In practice such ODE systems are routinely solved with more powerful methods such as the Runge-Kutta schemes of various orders.

Now it is often the case that the derivative of the ODE depends on a set of parameters

and thus its solution can also be seen as a function of , that is . Then let us further assume that the derivatives of w.r.t. the parameters are of interest, i.e. the quantity . The general method to obtain the derivatives of an ODE solution w.r.t. some parameters or the initial condition is described in [4], which we will restate in the following for our problem setup.

Since

we get

and by setting see that we obtain an augmented ODE system

(1)
(2)

which can be integrated in the same manner as the original system. Assuming that the initial values do not depend on the parameters, the initial values are set to zero.

When the derivatives of the solution w.r.t. the initial values are of interest we get with a very similar reasoning for the additional equation

(3)

In this case the initial values

are set to the identity matrix.

3 Example: Differentiating the Lotka-Volterra ODE system

As an example we consider the Lotka-Volterra ODE system in Subsection 3.1, derive the analytical derivatives for it in Subsection 3.2 and compute the derivatives with several different approaches, including automatice differentiation and numerial methods in Subsection 3.3.

3.1 The Lotka-Volterra ODE system

We consider a system of two ODEs describing the population numbers of two interdependent species, for example rabbits and foxes ; the well-known Lotka-Volterra equations given by

(4)
(5)

with a set of four positive real parameters [3, 6, 8].

For our example we choose the parameters as given in the following table

0.015 0.0001 0.03 0.0001

and set the initial conditions as follows:

Y(t) Y(t)
1000 20

When we integrate the ODE in the time span with the explicit Euler method with fixed time steps we obtain the result shown in Figure 1.

Figure 1: An example solution of the Lotka-Volterra equations integrated with explicit Euler.

3.2 The derivatives of the Lotka-Volterra ODE system

Now we apply the equations (1)–(3) to our example problem. The derivative is given by

(6)

the derivative is

(7)

we set the initial state to zero

and the initial state to the identity matrix

Applying the explicit Euler method to the augmented system yields the derivatives of the population numbers w.r.t. the four parameters over time, as shown in Figures 2 and 3, and the derivatives w.r.t. the initial population numbers shown in Figure 4. Note that in order to use the ode23

solver the augmented ODE system has to be cast into a single column vector, that is, the system is then solved by integrating a composite state

from the initial state using the composite derivative

(8)

where shall denote the cast to a column vector.

Figure 2: The derivatives of the Lotka-Volterra equations w.r.t. the parameters and
Figure 3: The derivatives of the Lotka-Volterra equations w.r.t. the parameters and
Figure 4: The derivatives of the Lotka-Volterra equations w.r.t. the initial population numbers

We show the function dfode in Listing 1. This function defines the composite ODE system as defined in (8). Note how the vector x, which corresponds to , is split into the three parts corresponding to , , and , and how the latter two are reshaped to the correct shapes so that the matrix expressions and can be evaluated correctly. The functions fodep, fodep_y and fodep_p are not shown, they correspond exactly to from (4)–(5) and the partial derivatives from (6) and from (7), resp.

function dyp = dfode(t, x, p)
  y      =         x(1:2);
  dydp   = reshape(x(3:10),   [2,4]);
  dydy0  = reshape(x(11:end), [2,2]);
  dy     = fodep(t, y(1:2), p);
  ddydp  = fodep_y(t, y, p) * dydp + fodep_p(t, y, p);
  ddydy0 = fodep_y(t, y, p) * dydy0;
  dyp = [
      dy(:)
      ddydp(:)
      ddydy0(:)
  ];
Listing 1: Function defining the augmented ODE system for the Lotka-Volterra equations

Obviously we could also use automatic differentiation of the MATLAB code implementing to obtain and . This approach yields exactly the same results as the analytic derivatives. Here we could of course use the ADiMat driver admDiffFor, cf. [9], but for such a small derivative being computed many times in a hot loop the overhead of this convenience driver is just too large. This driver can be made a little bit faster by setting the option nochecks. This option has the effect that neither will the source transformation be done nor the transformed file be checked for conformity. Thus, the user must first run the admDiffFor once exactly as intended to let ADiMat produce the differentiated code correctly. Then in the ODE function the option nochecks is added. However, the overall expense of the driver is still much too high for this approach to be competitive here. Instead, the user just take the time to code the appropriate manual invocations of the differentiated code. In our test this results in a reduction of the overall runtime by a factor of 10. We show the function adfode in Listing 2, including a branch to select one of the two variants.

function dyp = adfode(t, x, p)
  y      =         x(1:2);
  dydp   = reshape(x(3:10),   [2,4]);
  dydy0  = reshape(x(11:end), [2,2]);
  useDriver = false;
  if useDriver
    [J, dy] = admDiffFor(@fodep, 1, t, y, p, admOptions(’i’,[2,3],’nochecks’,1));
␣␣␣␣dfdy=J(:,1:2);
␣␣␣␣dfdp=J(:,3:end);
␣␣else
␣␣␣␣dfdy=zeros(2,2);
␣␣␣␣dfdp=zeros(2,4);
␣␣␣␣g_y=zeros(2,1);
␣␣␣␣g_p=zeros(4,1);
␣␣␣␣fork=1:2
␣␣␣␣␣␣g_y(k)=1;
␣␣␣␣␣␣[g_dy,dy]=g_fodep(t,g_y,y,g_p,p);
␣␣␣␣␣␣dfdy(:,k)=g_dy;
␣␣␣␣␣␣g_y=zeros(2,1);
␣␣␣␣end
␣␣␣␣fork=1:4
␣␣␣␣␣␣g_p(k)=1;
␣␣␣␣␣␣[g_dy,dy]=g_fodep(t,g_y,y,g_p,p);
␣␣␣␣␣␣dfdp(:,k)=g_dy;
␣␣␣␣␣␣g_p=zeros(4,1);
␣␣␣␣end
␣␣end
␣␣ddydp␣␣=dfdy*dydp+dfdp;
␣␣ddydy0=dfdy*dydy0;
␣␣dyp=[
␣␣␣␣␣␣dy(:)
␣␣␣␣␣␣ddydp(:)
␣␣␣␣␣␣ddydy0(:)
␣␣];
Listing 2: The augmented ODE function using automatic differentiation with ADiMat

Another completely different approach is to numerically differentiate the entire ODE integration, using either the finite difference approximation (FD) method [7] or the complex step (CS) method [5]. The latter method is well-known to yield accurate derivatives of real analytic computations while the typical accuracy of finite differences is at best half the machine precision. Since both and the explicit Euler method are real analytic we get the expected relative errors when comparing the four different ways to evalute these derivatives, as shown in the all-vs-all comparison shown in Table 1.

vs. AD vs. FD vs. CS
Analytic 0 1.55324 (-06) 3.3805 (-15)
AD 1.55324 (-06) 3.3805 (-15)
FD 1.55324 (-06)
Table 1: The relative error of the derivatives, computed with the four different methods, of the ODE integrated with the explicit Euler scheme, each compared against all the others

Integration of the augmented ODE system with either analytic or AD derivatives yields exactly the same result, while using the CS method to differentiate through the integration of the original ODE system yields very nearly the same result. The relative error in the divided differences is a little more than the square root of the machine precision, as expected.

3.3 Integration of the original and augmented Lotka-Volterra ODE system with the Runge-Kutta method

Things become a little more interesting when we use the more sophisticated Runge-Kutta method with adaptive timesteps as it is implemented in MATLAB or GNU Octave by the ode23 builtin function.

The first thing to note is that when integrating the augmented system the ODE integrator will generally choose different time step sizes than for the original system. Hence it will yield result vectors of different lengths, which makes it difficult to compare the different differentiation methods. This can be circumvented by prescibing a vector of time points to the integrator, instead of the tuple of start and end points , forcing it to return the solution at exactly the given points. This is in particular also required when using finite differences to approximate the derivative, because otherwise the results are completely spurious. Appearently the perturbations in the parameters are small enough to result in the same number of time steps, but the output time points themselves do change.

The results of the augmented system with either analytic or AD derivatives are visually very similar to those with the explicit Euler method, shown in the previous subsection. We show the AD derivative w.r.t. the first two parameters in Figure 5, we show the FD approximation w.r.t. the first two parameters in the Figure 6, and the CS method derivative w.r.t. the same parameters in Figure 7.

Figure 5: The results of the complex step method applied to the integration of the Lotka-Volterra equations with the ode23 solver, w.r.t. the parameters and
Figure 6: The finite difference approximation to the derivative w.r.t. the parameters and of integrating the Lotka-Volterra equations with the ode23 solver
Figure 7: The results of the complex step method applied to the integration of the Lotka-Volterra equations with the ode23 solver, w.r.t. the parameters and

However, the relative error all-vs-all relative error comparison shown in Table 2 reveals that the FD and CS methods are appearently yielding results which are substantially different from the analytic and AD derivatives in this case.

vs. AD vs. FD vs. CS
Analytic 0 0.0398952 0.0411807
AD 0.0398952 0.0411807
FD 0.0112255
Table 2: The relative error of the derivatives w.r.t. , computed with the four different methods, of the ODE integrated with the ode23 solver, each compared against all the others

The runtimes with all four methods, using explicit Euler or ode23 are given in Table 3. While using AD with the manually written invocations of the differentiated code is slightly slower than using analytic derivatives, it is about twice as fast as using the FD or CS method.

Analytic AD FD CS
Expl. Euler 0.845753 1.86251 3.04297 2.76348
ode23 0.208587 0.318961 0.635216 0.600115
Table 3: Runtimes of the augmented ODE equations with the four different methods to compute the derivatives, in seconds

4 Computing derivatives of ODE integrations with ADiMat

Although ADiMat is a relatively mature AD tool which supports a substantial set of MATLAB builtin functions, due to the sheer number of these it often occurs that a particular user code makes use of a builtin that is not yet supported. An example for this is the ode23 ODE solver which we use here to solve the Lotka-Volterra equations. In this section we want to show how to insert a manually derived derivative into a larger piece of differentiated code.

Consider as an example the function fmain shown in Listing 3. In order to facilitate this task it is most convinient to place the call to ode23 into a separate function, so as to isolate it as much as possible. For this purpose we define the function calcode, which is shown in Listing 4.

function z = fmain(y0, p, ts)
%ADiMat BMFUNC [$$1,$$2]=calcode($1,$2,$3) DIFFTO [$$@1,$$1,$$@2,$$2]=g_calcode($@1,$1,$@2,$2,$3)
  p2 = p ./ 2;
  [t1, y1] = calcode(y0, p, ts);
  [t2, y2] = calcode(y0, p2, ts);
  z = sum(y1(end,:) + y2(end,:));
Listing 3: Main function to be differentiated with ADiMat, which makes indirect use of the unsupported builtin ode23, via the sub function calcode
function [t, yt] = calcode(y0, p, ts)
  [t, yt] = eeuler(@(t, y) fodep(t, y, p), ts, y0);
Listing 4: Function calcode is used to isolate the call to the unsupported builtin ode23

Then we differentiate the function fmain as normally in forward mode. ADiMat will differentiate calcode as well and complain about the unsupported builtin ode23. As a result the generated function g_calcode will be invalid. However, by looking at the signature of g_calcode we know how a manually created replacement function should look like, as shown in Listing 5.

function [t, g_y, y] = g_calcode(y0, g_p, p, ts)
Listing 5: Signature of the invalid differentiated function for calcode generated by ADiMat

For the reverse mode, the approach is a little bit trickier. We delete the file calcode.m before launching the differentiation. This will lead ADiMat to creating a generic call to an adjoint function. Of course we restore calcode.m afterwards. For this to work we have to declare the identifier calcode to ADiMat using the BMFUNC directive [2].

Next we prepare two functions which compute the correct derivatives when called from the differentiated code of fmain. This requires a little bit of insight into how ADiMat works, so a few comments are in order. The general approach is to obtain in any conceivable way the local Jacobian or partial derivative of the function calcode. To this end we can set up the augmented ODE system (1)–(2) using either analytic or AD derivatives. In the latter case we would effectively use ADiMat in a hierarchical, two-tier fashion.

Then, in forward mode, we reshape in incoming derivative to a column vector and multiply it from the left with . Afterwards we reshape it to the same shape as the function result of calcode. This is code is placed in a file fm_calcode.m which is shown in Listing 6. Note one particularity: since g_calcode also returns the function results of calcode we can replace these by extracting the corresponding bits of the augmented ODE system. Remember however that the result vectors have different lengths when solving the augmented ODE. In this case, due what is actually done with these results in fmain or g_fmain we get way with this substitution. In other cases this may not be the case. We will see how to handle that when we consider the reverse mode.

function [t, g_y, y] = g_calcode(g_y0, y0, g_p, p, ts)
  dydp_0 = zeros(2, 4);
  dydy0_0 = eye(2);
  yp0 = [y0(:)
         dydp_0(:)
         dydy0_0(:)];
  [t, ypt] = ode23(@(t, yp) dfode(t, yp, p), ts, yp0);
  no = length(t);
  y   = ypt(:,1:2);
  Jp  = reshape(ypt(:,3:10), [2.*no, 4]);
  Jy0 = reshape(ypt(:,11:end), [2.*no, 2]);
  g_y = Jy0 * g_y0(:) + Jp * g_p(:);
  g_y = reshape(g_y, size(y));
Listing 6: FM substitution function for calcode, using the augmented ODE system with analytic derivatives to compute the derivative

In reverse mode, we reshape in incoming adjoint to a row vector and multiply it from the right with . Then we reshape the result to the same shape as the function parameter of which we require the adjoint, in this case p. This is code is placed in a file rm_calcode.m which shown in Listing 7. Here we run into the problem of the changing number of integrations steps that we mentioned. The function rm_calcode.m receives the adjoint of y, and so we are stuck with its size and need to create a Jacobian of conforming size. To this end, we have to run the original ODE system first, simply to get the vector of time steps. Of course this is only necessary when the time argument ts to the calcode function is really a time span, i.e. a vector of length two, and not a vector of time points already. Obviously the second variant is preferable.

function [a_y0 a_p] = a_calcode_110(y0, p, ts, a_y)
  if length(ts) == 2
    % timespan given: run ODE integration to get resulting time points
    [t, yt] = calcode(y0, p, ts);
  else
    t = ts;
  end
  dydp_0 = zeros(2, 4);
  dydy0_0 = eye(2);
  yp0 = [y0(:)
         dydp_0(:)
         dydy0_0(:)];
  % run augmented ODE integration with same time points as
  % non-augmented integration
  [t_alt, ypt] = ode23(@(t, yp) dfode(t, yp, p), t, yp0);
  no = length(t);
  assert(no == length(t_alt));
  y   =         ypt(:,1:2);
  Jp  = reshape(ypt(:,3:10), [2.*no, 4]);
  Jy0 = reshape(ypt(:,11:end), [2.*no, 2]);
  a_p  = a_y(:).*Jp;
␣␣a_y0=a_y(:).’*Jy0;
␣␣a_p␣␣=reshape(a_p,size(p));
␣␣a_y0=reshape(a_y0,size(y0));
Listing 7: RM substitution function for calcode, using the augmented ODE system with analytic derivatives to compute the derivative

The entire process to produce and then run a valid function g_fmain which calls the manually prepared code in Listing 6 can be expressed with the following MATLAB commands:

r_f = admTransform(@fmain,admOptions(’i’,[1,2],’mode’,’F’))
copyfile(’fm_calcode.m’,’g_calcode.m’);
J_f=admDiffFor(@fmain,1,y0,p,ts,admOptions(’i’,[1,2],’nochecks’,1))
Listing 8:

For the reverse mode, the following MATLAB commands will produce and then run a valid function a_fmain which calls the manually prepared code in Listing 7:

if exist(’calcode’)
␣␣movefile(’calcode.m’,’_calcode.m’);
␣␣clearcalcode
end
r_r=admTransform(@fmain,admOptions(’i’,[1,2],’mode’,’r’))
movefile(’_calcode.m’,’calcode.m’);
copyfile(’rm_calcode_110.m’,’a_calcode_110.m’);
J_r=admDiffRev(@fmain,1,y0,p,ts,admOptions(’i’,[1,2],’nochecks’,1))
Listing 9:

The runtimes of derivative evaluations with AD in FM and RM are compared to those with the FD and CS methods in Table 4.

FM RM FD CS
Expl. Euler 1.85121 2.39589 5.76037 5.35389
ode23 2.06182 0.571279 1.11894 0.734798
Table 4: Runtimes of the example function differentiated with ADiMat in FM, RM and with the FD and CS methods, in seconds

4.1 Evaluating the Hessian in forward-over-reverse mode

ADiMat can compute the Hessian matrix of a function using forward-over-reverse mode, through its driver function admHessian. In this case the code differentiated in reverse mode, which is exactly the same function a_fmain that we created in the previous section, is being run with arguments that are of a special class tseries2 with overloaded operators (OO) which propagate derivatives – more precisely, truncated Taylor coefficients – in forward mode. Attempting to use this to differentiate fmain will fail however, with an error being thrown by ode23. It appears it is not allowed to run the ode23 solver with any other type than single or double floats in MATLAB. This is the case even when the OO type enters the picture only via the additional parameter p used by the function handle given to the solver as the first argument, that is, when we differentiate w.r.t. to only, but not w.r.t. .

Hence it is required to add a method ode23 to the OO class and ensure that the call is dispatched to it. This would will only happen when one of the immediate arguments to the call is of the OO type. That means, one must compute the derivative w.r.t. y0. Then it is possible, albeit by means of some quite advanced MATLAB hacking, to analyse the function handle passed, extract the OO value p from its closure, or workspace, and construct a new one which, firstly, does not have an OO type in the closure and, secondly, represents the augmented ODE system which can be used to calculate the derivatives.

This derivative can then be used to propagate the derivatives to the output OO value, just as in the FM example shown before. This approach will only yield the first order derivatives, which is a limitation for the class tseries2, which in general can propagate Taylor coeffients truncated at a finite but arbitrary order. However, computing the first derivative in tseries2 is enough to evaluate the Hessian since the second order derivative is obtained by virtue of the reverse mode code.

The runtime of the Hessian evaluation with admHessian is compared to those with the second order FD method in Table 5.

FM-over-RM FD
Expl. Euler 21.398 3.89455
ode23 2.80302 7.81168
Table 5: Runtimes of the evaluation of the Hessian of example function with ADiMat and with the method, in seconds

5 Conclusion

In this report we showed several methods to calculate the derivatives of an ODE integration. In particular we showed how to use ADiMat to differentiate a larger code that calls the MATLAB ODE solver ode23 in some sub function as part of its overall computations. This works with both the forward and reverse mode of ADiMat but currently requires some manual intervention. Obviously a continuation of these results would look into discerning a method that automates the entire process. To get this done correctly, however, will propably require that ADiMat first supports lambda expressions, which is most likely still a substantial development effort.

Another result that we obtain is that both the FD and CS methods can be used to calculate the derivatives of a simple explicit Euler scheme, however both fail when attempting to differentiate the ode23 solver. The exact reason should be investigated further, since it is not usual that AD and FD derivatives deviate by more than the to-be-expected inaccuracy. However, it is possible to give examples where FD approximation can fare almost arbitrarily bad. Also, the fact that we appearently get the correct derivatives when using the simple explicit Euler scheme seems to show that we are indeed doing the correct thing when setting up the augmented ODE system.

The Hessian matrix can also be computed by means of the forward-over-reverse mode using the operated overloading Taylor propagation class in ADiMat. The implementation of the ode23 method added to the class for this purpose shows the way to go for a full support for the differentiation of ODE integration builtins with ADiMat.

References

  • [1] Bischof, C. H., Bücker, H. M., Lang, B., Rasch, A., and Vehreschild, A. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002) (Los Alamitos, CA, USA, 2002), IEEE Computer Society, pp. 65–72.
  • [2] Bischof, C. H., Bücker, H. M., and Vehreschild, A. A Macro Language for Derivative Definition. In Automatic Differentiation: Applications, Theory, and Implementations, H. M. Bücker, G. F. Corliss, P. D. Hovland, U. Naumann, and B. Norris, Eds., vol. 50 of Lecture Notes in Computational Science and Engineering. Springer, Berlin, 2005, pp. 181–188.
  • [3] Lotka, A. J. Elements of Physical Biology. Williams & Wilkins Company, Baltimore, 1925. Accessed: 2018-01-29.
  • [4] LutzL. Derivative of an ODE marching algorithm such as Euler or Runge Kutta. https://math.stackexchange.com/questions/2119506/derivative-of-an-ode-marching-algorithm-such-as-euler-or-runge-kutta, 2017. Accessed: 2018-01-28.
  • [5] Lyness, J., and Moler, C. Numerical differentiation of analytic functions. SIAM Journal on Numerical Analysis 4, 2 (1967), 202–210.
  • [6] Volterra, V. Leçons sur la Théorie Mathématique de la Lutte pour la Vie. Gauthier-Villars, 1931.
  • [7] Wikipedia contributors. Finite difference. https://en.wikipedia.org/wiki/Finite_difference, 2018. Accessed: 2018-01-31.
  • [8] Wikipedia contributors. Lotka–volterra equations. https://en.wikipedia.org/wiki/Lotka-Volterra_equations, 2018. Accessed: 2018-01-29.
  • [9] Willkomm, J., Bischof, C. H., and Bücker, H. M. A new user interface for ADiMat: Toward accurate and efficient derivatives of Matlab programs with ease of use. International Journal of Computational Science and Engineering 9, 5/6 (2014), 408–415.