In dynamical system modeling, various systems from different application domains can be represented by an autonomous system of first-order ODEs:
is a vector offixed parameters. One instance of the model is based on the values of the parameters, and also the initial conditions:
For example, in case of an N-body dynamical system the parameters are masses, and the initial conditions are positions and velocities at a certain moment of time. In practice, e.g. in planetary ephemerides, the precise values of the initial conditions are unknown, while some approximate values are determined from observations.
The task is to solve the initial value problem (IVP) (1, 2) for a range of covering all the and to minimize the discrepancy between observed and computed values. The IVP is most often solved numerically.
Let be the full set of free variables to be fit to observations by (as usually done with dynamical systems) nonlinear least squares method. The first-order derivatives are required for the method.
One way to obtain is to include it into our system of ODEs, together with itself. Accordingly, the initial conditions and the time derivative are needed to solve the IVP for the new system. While the initial conditions are trivial, the time derivative must be obtained by substituting (1):
Thus, in order to estimate the free variables, one needs to compute the derivative of the ODE’s right-hand side w.r.t.. There are three ways to perform such a computation:
Full symbolic differentiation, which requires a computer algebra system and can be quite computationally costly.
Numeric differentiation using the the finite difference technique, which is prone to truncation errors.
Automatic Differentiation (AD) is a technique of obtaining numerical values of derivatives of a given function (listing 1
). As opposite to the symbolic differentiation, AD not only reduces the computation time by using memoization techniques but also provides more flexibility as it can deal with complicated structures from programming languages, such as conditions and loops. Because of the chain rule associativity there are at least two ways (modes) of memoization: forward and reverse.
The forward mode of AD is based on the concept of dual numbers and on traversing the computational graph (fig. 1) in natural forward order. Each variable of the original program is associated with its derivative counterpart(s), which is(are) computed along with the original variable value (see listing 2). The computational complexity of forward mode is proportional to the number of independent input variables , thus it is most effective when . In our practice the number of the function output values is far greater then the number of the input ones, therefore we used forward accumulation.
In the case of reverse mode values of the derivatives are accumulated from the root(s) of the computational graph, each assignation is augmented with accumulations (see listing 3). Hence, the computation complexity of reverse mode is proportional to the number of the function outputs ; thus, it is most effective when
, which is often the case in computation of gradients of many-to-one function so widely used in the neural networks.
2 Related work and motivation
There exist a large number of forward-mode AD software tools for differentiating functions that are written in general-purpose programming languages, like Fortran (ADIFOR) bischof1992adifor , C (ADIC) bischof1997adic or C++ (ADOL-C) griewank1996algorithm . Rich features of the “host” languages, like arrays, loops, conditions, and recursion, often make it difficult to implement a practically usable AD system without imposing limitations on the language and/or extra technical work when specifying the function, especially in presence of multi-dimensional functions with many independent variables.
On the other hand, there exist a number of languages developed specially for AD tasks, like Taylor JorbaZou and VLAD Siskind2008 ; Siskind2016 . Taylor syntax, while very simple and natural, is very limited (no conditionals, loops, arrays, or subprocedures). VLAD, a functional Scheme-like language, has conditionals, loops, recursion, and subprocedures, but does not have arrays or mutability.
Finally, there are tools for differentiating functions specified as mathematical expressions in mathematical computing systems, like MATLAB (ADMAT) coleman1998admat or Mathematica (TIDES) TIDES ; TIDES2 . Such tools often require a bigger effort (as compared to a general-purpose languages) to input a practical dynamical system of large dimension with a lot of free variables.
In this work, a new language, Landau, is proposed, designed specially for dynamical systems. Other examples of such design are TIDES and Taylor. However, TIDES and Taylor obtain high-precision solutions using Taylor method and high-order derivatives, while Landau provides only first-order derivatives and is supposed to be used with numerical integrators that obtain an acceptable approximate solution (like Runge-Kutta or Adams methods), with better performance than high-precision methods.
Like VLAD, Landau is a domain-specific language designed with automatic differentiation in mind. Like TIDES and Taylor, Landau offers C code generation. Like general-purpose languages, Landau has common control flow constructs, arrays, and mutability; but unlike general-purpose languages, Landau embraces Turing incompleteness to perform static source analysis (see section 4) and generate efficient code.
Landau also has the ability to not only derive derivative dependencies from source (e.g. if , then ), but also to fix values of derivatives to other variables belonging to the dynamical system (e.g. ).
The language syntax offers functions, mutable real and integer variables, mutable real arrays, constants, if/else statements and for loops. Special type parameter is used to express Jacobian denominator variables which are not used in expressions (the right-hand sides of the assignments) itself. In case of dynamical equations differentiation such parameters could express initial conditions vectors. Special derivative operator ’ is used to annotate or assign the value of the derivative. Even with branching constructions (if/else statements) the function is guaranteed to be continuously differentiable thanks to the prohibition of the real arguments inside the condition body. Moreover, it is allowed to manually omit negligibly small derivatives using the discard keyword (e.g. if and command discard y ’ a is typed, then ).
Listing 4 demonstrates a Landau program for a dynamical system describing the motion of a spacecraft. The state of the system, i.e. the 3-dimensional position and velocity of the spacecraft, obeys Newtonian laws. The derivatives of the state w.r.t. 6 initial conditions (position and velocity) and one parameter (the gravitational parameter of the central body) are calculated using AD.
Automatic differentiation can be implemented in one of two ways: the operator overloading and the source code transformation. The first approach is based on describing the dual number data structure and overloading arithmetic operators and functions to operate on them. The second one involves analysis of function source and generation of the differentiation code. It was found tadjouddine2002ad that the latter approach generally produces more efficient derivative code. Landau is written in Racket racket and it uses source code transformation approach to produce Racket or ANSI C differentiation code.
Let lvalue be the variable in the left-hand side of assignation and rvalues be the variables in the right side. The differentiation is performed in the following way: each real lvalue is associated with an array carrying derivatives’ values. The right part of the assignment is differentiated symbolically111Even though the reverse mode is truly preferable if , which is the case in term level assignment, because there is only one output in each assignation (e.g. ), the computation overhead is negligibly small in case of small expressions., the result is carried in the accumulator array.
Each real variable assignation in a forward scheme is augmented with assignations to the derivatives’ accumulators, but in practice there is no need to compute and store all of them because some derivatives’ values are never used afterwards. That means that the computed Jacobians are often sparse.
To illustrate the sparsity problem and keep things simple let us consider an artificial migration problem over areas with a simplified diffusion model of migration:
where the initial condition vector is supposed to be determined from observational data. Say that there are regions with strongly interconnected areas whose population at an arbitrary moment of time depends from the initial conditions of other areas within that region. Following the logic from the introduction, we need to find the solution derivatives with respect to the initial conditions. The Landau program for solving that problem is presented in listing 5.
The fact that the population depends on the initial conditions only within the region makes the Jacobian sparse:
In the following simple example the sparsity pattern is presented with square blocks on the main diagonal but it could be randomly sparse in general. Accumulating the ’s derivatives in a straightforward manner will require one to compute and store values while only are needed.
There are at least two approaches to deal with sparsity. The first approach is to generate the code where each useful222We are not using term nonzero here, because it can happen that the useful Jacobian matrix element is equal to zero. Jacobian matrix element is stored in a separate variable. That involves unrolling loops to the assignation sequences and, as a result, facing the performance penalty due to the CPU cache misses. Another approach is to store useful Jacobian values in arrays and preserve the ability to use loops for traversing. The sparsity is handled by packing useful Jacobian elements to smaller arrays and generating mappings from the packed derivative indexes to the original ones and inverse mappings, which map the original indexes to the packed ones. The listing 6 demonstrates the differentiation of the loops from the lines 16–20 of the listing 5.
The compilation is performed in two stages. During the first stage the information about dependencies, used variables and derivatives is gathered for each variable or array cell. The Turing incompleteness guarantees that all loops and conditions can be unrolled and computed at compile time, thus the initial Landau function can be transformed to a list of actions (listing 7): derivative annotation, variable assignation and storage of the derivative in the output value. The list is then traversed to gather the dependency graph of the derivatives, which is used in the second compilation stage to generate mappings, inverse mappings and differentiation code.
Let be the length of parameter vector and be the number of derivatives (e.g. Jacobian row’s elements) needed for the variable. When , most Jacobian values are not used and thus should not be computed. Using the mappings technique described above we store only derivative values and use mappings and inverse mappings , where to set and get derivative values. Mappings can be easily implemented as arrays with length by storing the original indexes of the parameter vector. But it is challenging to implement effective inverse mappings, because storing them in array directly will result to the memory consumption, where is the maximum used parameter vector index. For example, even if one needs to compute derivative with respect to the last parameter index, the resulting mapping is array of size 1, but inverse mapping’s length is .
More sophisticated way to implement inverse mappings is to use minimal perfect hash functions (MPHF). A perfect hash function maps a static set of keys into a set of integer numbers without collisions, where . If , the function is called minimal. Various asymptotically effective algorithms for generating such functions exist botelho2007simple , but it is not clear if the constant factors are small enough to make the generation of many MPHFs during the single compilation practically effective. In the current version of Landau the inverse mappings are implemented as integer arrays and thus are not quite memory-efficient.
A new language called Landau has been invented to fill the niche of a domain-specific language designed for practically usable forward-mode AD for estimating the values of free parameters of a complex dynamical system.
A compiler that translates Landau code into either Racket or high-performance C code, has been implemented, making the overall procedure of estimating free variables fast and fluent.
Further work is required for more effective implementation of the inverse mappings. Such an implementation clearly should be possible thanks to Turing-incompleteness of Landau code that allows for complete static analysis.
Authors are thankful to their colleague Dan Aksim for reading drafts of this paper and to Matthew Flatt and Matthias Felleisen of the PLT Racket team for their help with the Racket programming platform.
- (1) Abad, A., Barrio, R., Marco-Buzunariz, M., Rodríguez, M.: Automatic implementation of the numerical taylor series method. Appl. Math. Comput. 268(C), 227–245 (2015). DOI 10.1016/j.amc.2015.06.042. URL https://doi.org/10.1016/j.amc.2015.06.042
- (2) Abad, A., Barrio, R., Marco-Buzunariz, M., Rodríguez, M.: Automatic implementation of the numerical taylor series method: A mathematica and sage approach. Applied Mathematics and Computation 268, 227 – 245 (2015). DOI https://doi.org/10.1016/j.amc.2015.06.042. URL http://www.sciencedirect.com/science/article/pii/S0096300315008231
- (3) Bischof, C., Carle, A., Corliss, G., Griewank, A., Hovland, P.: ADIFOR–generating derivative codes from Fortran programs. Scientific Programming 1(1), 11–29 (1992)
- (4) Bischof, C.H., Roh, L., Mauer-Oats, A.J.: ADIC: an extensible automatic differentiation tool for ANSI-C. Software: Practice and Experience 27(12), 1427–1456 (1997)
- (5) Botelho, F.C., Pagh, R., Ziviani, N.: Simple and space-efficient minimal perfect hash functions. In: Workshop on Algorithms and Data Structures, pp. 139–150. Springer (2007)
- (6) Coleman, T.F., Verma, A.: ADMAT: An automatic differentiation toolbox for MATLAB. In: Proceedings of the SIAM Workshop on Object Oriented Methods for Inter-Operable Scientific and Engineering Computing, SIAM, Philadelphia, PA, vol. 2 (1998)
- (7) Felleisen, M., Findler, R.B., Flatt, M., Krishnamurthi, S., Barzilay, E., McCarthy, J., Tobin-Hochstadt, S.: A programmable programming language. Commun. ACM 61(3), 62–71 (2018). DOI 10.1145/3127323
- (8) Griewank, A., Juedes, D., Utke, J.: Algorithm 755: ADOL-C: a package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software (TOMS) 22(2), 131–167 (1996)
- (9) Jorba, À., Zou, M.: A software package for the numerical integration of odes by means of high-order taylor methods. Experimental Mathematics 14(1), 99–117 (2005). DOI 10.1080/10586458.2005.10128904. URL https://doi.org/10.1080/10586458.2005.10128904
- (10) Siskind, J.M., Pearlmutter, B.A.: Nesting forward-mode ad in a functional framework. Higher-Order and Symbolic Computation 21(4), 361–376 (2008). DOI 10.1007/s10990-008-9037-1. URL https://doi.org/10.1007/s10990-008-9037-1
- (11) Siskind, J.M., Pearlmutter, B.A.: Efficient implementation of a higher-order language with built-in AD. In: AD2016 – 7th International Conference on Algorithmic Differentiation. Oxford, UK (2016)
- (12) Tadjouddine, M., Forth, S.A., Pryce, J.D.: AD tools and prospects for optimal AD in CFD flux Jacobian calculations. In: Automatic differentiation of algorithms, pp. 255–261. Springer (2002)