Compile-Time Symbolic Differentiation Using C++ Expression Templates

05/04/2017 ∙ by Drosos Kourounis, et al. ∙ 0

Template metaprogramming is a popular technique for implementing compile time mechanisms for numerical computing. We demonstrate how expression templates can be used for compile time symbolic differentiation of algebraic expressions in C++ computer programs. Given a positive integer N and an algebraic function of multiple variables, the compiler generates executable code for the Nth partial derivatives of the function. Compile-time simplification of the derivative expressions is achieved using recursive templates. A detailed analysis indicates that current C++ compiler technology is already sufficient for practical use of our results, and highlights a number of issues where further improvements may be desirable.


1 Introduction

Methods employed for the solution of scientific and engineering problems often require the evaluation of first or higher-order derivatives of algebraic functions. Gradient methods for optimization, Newton’s method for the solution of nonlinear systems, numerical solution of stiff ordinary differential equations, stability analysis: these are examples of the major importance of derivative evaluation. Computing derivatives quickly and accurately improves both the efficiency and robustness of such numerical algorithms. Automatic differentiation tools are therefore increasingly available for the important programming languages; see www.autodiff.org.

There are three well established ways to compute derivatives:

Numerical derivatives. These use finite-difference approximations [22]. They avoid the difficulty of very long exact expressions, but they introduce truncation errors, which usually degrade the accuracy of further computations. An intriguing alternative is the complex-step approach [19]: in contrast to finite-difference approximations, it obtains derivatives that are exact up to machine precision, and it is easy to implement provided that the function under consideration is analytic.

Automatic differentiation (AD). This is a way to find the derivative of an expression without finding an expression for the derivative. Specifically, in a “computing environment” using AD tools, one can obtain a numerical value for $f'(x)$ by providing an expression for $f(x)$. The derivative computation is accurate to machine precision. A good introduction to the methods for implementing AD and the concepts underlying the method can be found in [12, 13, 10, 11, 8, 23]. Several software packages implement AD approaches. Given a set of Fortran subroutines for evaluating a function $f$, ADIFOR [8, 3] produces Fortran 77 subroutines for computing the first derivatives of the function. Upgrades and extensions in other high-level programming languages such as C and C++ now exist [2]. FADBAD++ and ADOL-C are C++ libraries that combine the two basic ways (forward/backward) of applying the chain rule [31, 14, 15, 25, 7]. Aubert et al. implement automatic differentiation of C++ computer programs in forward mode using operator overloading and expression templates [5]. These libraries have demonstrated the ability to perform sensitivity analysis by marginally modifying the source of the computer program, replacing the double type with the type implemented in the provided library, and simply linking with the library. The implementation of the reverse mode using expression templates is a different task because the program flow has to be reversed in this case. The implementation of AD is straightforward in the environment of the object-oriented high-level language C++ with operator overloading and expression templates [26, 6, 20].

Symbolic derivatives. These are obtained by hand or from one of the symbolic differentiation packages such as Maple, Mathematica, or Matlab. Hand-coding is increasingly difficult and error-prone as the function complexity increases. Symbolic differentiation packages can obtain expressions for the derivatives using the rules of calculus in a more or less mechanical way. Given a string describing a function, they provide exact derivatives by expressing them in terms of intermediate variables. This method provides a formula for the first derivative, which can be further differentiated if derivatives of higher order are desired. Since the formulae for the derivatives are exact, the approach does not introduce any truncation errors, unlike the other differentiation methods.
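The difference between the finite-difference and complex-step approaches mentioned under numerical derivatives can be illustrated with a minimal sketch. The test function $f(x) = e^x \sin x$, the namespace sketch, and all function names below are assumptions chosen for illustration; they are not taken from the cited references.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

namespace sketch {

// Test function f(x) = exp(x) * sin(x), generic over real and complex arguments.
template <typename T>
T f(const T& x) {
    using std::exp;
    using std::sin;
    return exp(x) * sin(x);
}

// Reference value: f'(x) = exp(x) * (sin(x) + cos(x)).
inline double exact_derivative(double x) {
    return std::exp(x) * (std::sin(x) + std::cos(x));
}

// Forward difference: truncation error O(h) plus cancellation in the subtraction.
inline double forward_difference(double x, double h) {
    assert(h > 0.0);
    return (f(x + h) - f(x)) / h;
}

// Complex step: f'(x) is approximated by Im f(x + i h) / h. No subtraction
// occurs, so h can be made extremely small without cancellation.
inline double complex_step(double x, double h) {
    assert(h > 0.0);
    return f(std::complex<double>(x, h)).imag() / h;
}

} // namespace sketch
```

Calling sketch::complex_step(1.0, 1e-20) should agree with sketch::exact_derivative(1.0) to roughly machine precision, whereas sketch::forward_difference(1.0, 1e-6) carries a truncation error of the order of the step size.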

In this work we present a new way of obtaining partial derivatives of arbitrary order for multivariate functions, in a way that exhibits optimal runtime performance. This is achieved by exploiting the C++ Expression Templates mechanism described next.

2 C++ templates

Templates were introduced in C++ to allow type-safe containers. In the early days, templates served mostly as a means of generalizing software components so they could be easily reused in a variety of situations. Templates’ ability to allow generalization without sacrificing efficiency made them an integral tool of generic programming. Eventually it was discovered by Unruh [27], almost by accident, that the C++ template mechanism provides a rich facility for native language metaprogramming: the creation of programs that execute inside C++ compilers and that stop running when compilation is complete. Today, the power of templates is fully unleashed [4, 28, 6], and template metaprogramming has been extensively investigated by several authors [29, 1, 9].

Moreover, the combination of classical C++ operator overloading with template metaprogramming ideas has resulted in a very promising technique, expression templates, that has found numerous applications in scientific computing. In [17, 28] the authors explain how expression templates can be used to construct an efficient library for matrix algebra that avoids introducing runtime temporary matrix objects, which have an adverse effect on performance and memory management. The combination of expression templates with the sophisticated optimization techniques built into the C++ compilers that generate the executable code can eliminate temporary objects in many situations, and thus avoids the performance and memory issues inherent in the creation and destruction of temporaries. The object-oriented interface can be preserved without sacrificing efficiency. Veldhuizen [29] presents a C++ class library for scientific computing that provides performance on a par with Fortran 77/90. Advanced language features are maintained while utilization of highly sophisticated template techniques ensures no performance penalty at all. If templates are used appropriately, optimizations such as loop fusion, unrolling, tiling, and algorithm specialization can be performed automatically at compile time.

In [5], the authors use expression templates to handle automatic differentiation of multivariate function objects and apply this technique to a control flow problem. Their approach, being the first application of expression templates in the area of automatic differentiation, lacks some important features. In particular, the partial derivative of a multivariate function object provided by the user is not constructed at compile time. Instead, its value is calculated at runtime from the derivatives of all sub-expressions that comprise the main expression of the function to be differentiated. This approach is suboptimal because trivial calculations (like multiplications by one or zero) are not eliminated and performance penalties may occur, especially if the derivative has to be evaluated at a large number of points. Applications of expression templates for the efficient calculation of derivatives and Jacobians have been reported by Younis [32]. These techniques were adopted by Kourounis et al. [18] for the evaluation of the individual derivatives needed by the discrete adjoint formulation in applications involving the control and optimization of compositional flow in porous media. A recent application of expression templates can be found in [16], where a new operator-overloading method is presented that provides a compile-time representation of mathematical expressions as a computational graph that can be efficiently traversed in each direction. However, the expressions obtained this way cannot be further differentiated and the user can only expect first order derivatives.

In this paper we try to improve the ideas in [5] and [21] and to extend them in a number of ways. We show that partial derivatives of any order can be constructed upon request at compile time, as function objects themselves. Unlike the approach presented by Nehmeier [21], our approach introduces simplification rules that are applied at compile time. Without claiming completeness, we demonstrate how template metaprogramming techniques can be employed to simplify the resulting expression for the partial derivatives during compilation. Trivial calculations are thus eliminated. Further algebraic simplifications, such as cancellation of common terms, are also performed where possible.

We refer to our approach by the name CoDET (Compile-time Differentiation using Expression Templates). After introducing the key concepts, we describe experiments with a number of different C++ compilers to benchmark the compile time and scalability of CoDET over large expressions, while assessing the quality of the generated code. Several examples demonstrate that the execution cost of the partial derivative constructed by CoDET is identical to that of a hand-coded version. Along the way, we identify a number of compiler-related issues and optimizations that affect these costs and suggest compiler features and further enhancements beneficial for our approach.

3 Multivariate expression definition

The purpose of this section is to expose and analyze all the classes that take part in the implementation of CoDET’s multivariate expressions.

The CoDET framework is inspired by the Expression Templates of [30]. Each expression is modeled by an expression syntax tree (EST), whose leaves are either numeric constants or independent variables, and whose internal nodes correspond to functions (unary, binary or $n$-ary) or operators (arithmetic, logical, etc.) on the subexpressions of the corresponding subtrees. Figure 1 shows the EST of an example expression.

Figure 1: Expression syntax tree (EST) of an example expression.

In order to perform compile time symbolic differentiation, we use the C++ type system to encode algebraic expressions in a manner isomorphic to ESTs. A C++ class type corresponds to each node of an expression syntax tree. Leaf nodes of the EST are either variables or constants. Different variables correspond to different classes, as do different constants. Unary internal nodes of the tree correspond to analytical functions (such as the exponential) or to the negation operator. Binary internal nodes correspond to binary arithmetic operations.

3.1 Encoding Multivariate Expressions as Types

C++ template class instantiations are not a notationally appropriate form for denoting algebraic expressions. Thus, our framework strives to support the implicit declaration of these template instantiations by employing function and operator overloading.

For instance, assume that we are working in $\mathbb{R}^3$ and we want to declare a function of three independent variables: $f(x_0, x_1, x_2) = 2x_2 + e^{x_0 x_1}$. Using the classes Variable<int N> and Real<int L,int R,int Ex> introduced in §3.3 and §3.4 below, our framework allows us to write

Variable<0> x0;
Variable<1> x1;
Variable<2> x2;
Real<2,0,1> _2_;   // stands for 2 = 0.20e1
typedef decltype( _2_ * x2 + exp(x0 * x1) ) fType;

in which we declare the class type fType to correspond to the expression $2x_2 + e^{x_0 x_1}$; the code inside decltype closely matches the algebraic form. Once an expression type is defined, it can be instantiated and its instances can be used to evaluate the expression:

fType f;
double x[] = { 1.0, 2.5, 3.14 };
cout << f(x) << "\n"; // outputs the value of 2*3.14 + exp(1.0*2.5)

The auto keyword, included in the C++11 standard, specifies that the type of the variable being declared is deduced automatically from its initializer. Using auto allows simpler code; for example:

Variable<0> x0;
Variable<1> x1;
Variable<2> x2;
Real<2,0,1> _2_;
auto f =  _2_ * x2 + exp(x0 * x1);
double x[] = { 1.0, 2.5, 3.14 };
cout << f(x) << "\n"; // outputs the value of 2*3.14 + exp(1.0*2.5)

3.2 Arithmetic operators and analytical expressions

Arithmetic operators and analytical expressions are supported by class templates that are parameterized by the types of their subexpressions. Let us examine binary arithmetic operators first. Following [29], we start by introducing the descriptor (non-templatized) classes Add, Sub, Mul, Div. We list the source code only for the definition of class Add. The definitions of the remaining classes follow in a similar manner.

class Add
{
public:
    inline static double apply(double a, double b)
    { return a+b; }
};

The above classes are used to parameterize class template BinaryOp, the class that implements arithmetic binary operations between types:

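template<typename L,typename R,typename Op>
class BinaryOp
{
public:
    L left_;
    R right_;
    // default construction is used when a derivative type is instantiated
    BinaryOp() : left_(), right_() {}
    // used by the overloaded operators to build a node from its operands
    BinaryOp(const L& rleft, const R& rright) : left_(rleft), right_(rright) {}
    inline double operator()(const double* x) const
    { return Op::apply(left_(x), right_(x)); }
};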

To construct expressions easily, we provide templatized versions of C++ arithmetic operators. We show only the definition for operator+. Overloaded versions of the remaining operators are defined similarly.

template<typename L,typename R> inline
BinaryOp<L,R,Add> operator+(const L& rleft, const R& rright)
{ return BinaryOp<L,R,Add>(rleft, rright); }
L + R → BinaryOp<L,R,Add>

Analytical functions are unary operators, and they can be defined in a manner similar to the definition of binary operators. However, we have implemented them in a more direct manner in order to simplify coding. The definition of the node for function exp follows, together with an overloaded function template for easy expression construction:

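template<typename T>
class MathExp
{
public:
    T expr_;
    // default construction is used when a derivative type is instantiated
    MathExp() : expr_() {}
    // used by the overloaded function exp() below
    MathExp(const T& rfexpr) : expr_(rfexpr) {}
    inline double operator()(const double* x) const
    { return std::exp( expr_(x) ); }   // needs <cmath>
};
template<typename T>
inline MathExp<T> exp(const T& rfexpr)
{ return MathExp<T>(rfexpr); }

exp(T) → MathExp<T>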

3.3 Variables

The CoDET framework supports an arbitrary number of independent variables. Each of these variables corresponds to an instance of class template Variable<int>.

The definition of the class template Variable is quite straightforward:

template<int varID>
class Variable
{
public:
    double operator()(const double* x) const
    { return x[varID]; }
};
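The pieces introduced in §3.1 to §3.3 can be assembled into a minimal, self-contained sketch that builds and evaluates $2x_2 + e^{x_0 x_1}$. This is a condensed restatement under stated assumptions, not the CoDET source: the namespace sketch, the class Two (a stand-in for the constant, since Real is only introduced in §3.4), and the function evaluate_example are hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Descriptor classes for two binary operations.
struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };

// Node for a binary operation; stores copies of both operands.
template <typename L, typename R, typename Op>
class BinaryOp {
public:
    BinaryOp() : left_(), right_() {}
    BinaryOp(const L& l, const R& r) : left_(l), right_(r) {}
    double operator()(const double* x) const { return Op::apply(left_(x), right_(x)); }
private:
    L left_;
    R right_;
};

// Node for exp(expr).
template <typename T>
class MathExp {
public:
    MathExp() : expr_() {}
    explicit MathExp(const T& e) : expr_(e) {}
    double operator()(const double* x) const { return std::exp(expr_(x)); }
private:
    T expr_;
};

// Leaf: the independent variable x[varID].
template <int varID>
class Variable {
public:
    double operator()(const double* x) const { return x[varID]; }
};

// Hypothetical leaf standing in for the constant 2.
class Two {
public:
    double operator()(const double*) const { return 2.0; }
};

// Overloads that build the expression type implicitly.
template <typename L, typename R>
BinaryOp<L, R, Add> operator+(const L& l, const R& r) { return BinaryOp<L, R, Add>(l, r); }
template <typename L, typename R>
BinaryOp<L, R, Mul> operator*(const L& l, const R& r) { return BinaryOp<L, R, Mul>(l, r); }
template <typename T>
MathExp<T> exp(const T& e) { return MathExp<T>(e); }

// Builds f = 2*x2 + exp(x0*x1) and evaluates it at the point x.
inline double evaluate_example(const double* x) {
    assert(x != nullptr);
    Variable<0> x0;
    Variable<1> x1;
    Variable<2> x2;
    Two two;
    auto f = two * x2 + exp(x0 * x1);
    return f(x);
}

} // namespace sketch
```

Placing the unconstrained operator templates in a namespace is a design choice of this sketch: argument-dependent lookup still finds them for the node types, while unrelated types elsewhere in a program are not affected.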

3.4 Integer and real constants

CoDET provides two approaches for implementing constants, again using template instances:

The Integer approach. This exploits the class Integer<int Value>. Although the class is restricted to integer constants, its adoption leads to remarkable compilation time savings. (We would prefer to use the template class Real<double Value>, which is supported by the D language but not by C++. We strongly believe that it would share the same features as the currently available integer-only version.)

The Real approach. This combines class Real<int L,int R,int Ex>, which can represent any real number, with class Constant<typename T>, which wraps every arithmetic operation between constants and thereby allows specific floating-point optimizations. Both approaches are used in §5.1 and discussed further in §5.2. Three new classes are needed, as now described.

3.4.1 The class Integer

To support the frequent case of integer-valued constants (of type double), we provide the following class for integer constants:

template<int Value>
class Integer
{
public:
    double operator()(const double* x) const
    { return double(Value); }
};

The most important feature of this class is that it allows arithmetic operations between integers to be performed during the compilation process. To be more precise, let us consider the function . Its fourth derivative will be . With appropriate simplification rules, all intermediate multiplications can be performed during compilation to give the formula . However, the range of integers that can be represented is limited to that of the integer type provided by the C++ language. (Again, we would like a class where the template parameter is of type double and not int.) We proceed by introducing a class that does not suffer from such limitations.
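How integer products can be folded during compilation may be sketched with a simplifier template of the kind introduced in §3.4.3. The two rewrite rules below and the example product $3 \cdot 4 \cdot 5 \cdot 6$ are assumptions made for illustration; the rules actually used by CoDET may differ.

```cpp
#include <type_traits>

namespace sketch {

struct Mul { static double apply(double a, double b) { return a * b; } };

template <typename L, typename R, typename Op>
struct BinaryOp {
    L left_;
    R right_;
    double operator()(const double* x) const { return Op::apply(left_(x), right_(x)); }
};

template <int Value>
struct Integer {
    double operator()(const double*) const { return double(Value); }
};

// Primary template: by default a type is already in its simplest form.
template <typename T>
struct Squeezer { typedef T squeezedType; };

// Assumed rule 1: Integer<A> * Integer<B>  ->  Integer<A*B>
template <int A, int B>
struct Squeezer<BinaryOp<Integer<A>, Integer<B>, Mul> > {
    typedef Integer<A * B> squeezedType;
};

// Assumed rule 2: Integer<A> * (Integer<B> * C)  ->  squeeze(Integer<A*B> * C)
template <int A, int B, typename C>
struct Squeezer<BinaryOp<Integer<A>, BinaryOp<Integer<B>, C, Mul>, Mul> > {
    typedef typename Squeezer<BinaryOp<Integer<A * B>, C, Mul> >::squeezedType squeezedType;
};

// The nested product 3 * (4 * (5 * 6)) written as a type.
typedef BinaryOp<Integer<3>,
        BinaryOp<Integer<4>,
        BinaryOp<Integer<5>, Integer<6>, Mul>, Mul>, Mul> Product;
typedef Squeezer<Product>::squeezedType Folded;

// The compiler has carried out all three multiplications.
static_assert(std::is_same<Folded, Integer<360> >::value,
              "3*4*5*6 folds to the single constant 360");

} // namespace sketch
```

Evaluating an object of type sketch::Folded returns the constant directly, whereas an object of type sketch::Product would perform three multiplications on every call.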

3.4.2 The class Real

This class uses three integer parameters to compose the value of the double precision constant represented by the class:

template<int L,int R,int Ex>
class Real
{
private:
    double value_;
public:
    Real()
    {
      std::ostringstream strout;                     // needs <sstream>
      strout << "0." << L << R << "e" << Ex;
      value_ = strtod(strout.str().c_str(), NULL);   // needs <cstdlib>
    }
    inline double operator()(const double* x) const
    { return value_; }
};

The first two template parameters are integers that represent when written sequentially (LR) the decimal digits of our constant. The third template parameter Ex stands for the exponent. For example, we can write the constant 1234.56789 as Real<1234,56789,4>. Inside the default constructor this will be converted to the literal value “0.123456789e4”, and later this literal will be converted to an arithmetic value of type double and stored in the private member value_. The operator() always returns the same value, independently of the double pointer passed to it. Unfortunately, this class does not allow simplification of arithmetic operations between real constants. A work-around is presented next.
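The encoding can be checked with a small self-contained sketch of the class. The accessor text() and the namespace sketch are hypothetical additions for inspection only; the rest follows the description above.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

namespace sketch {

template <int L, int R, int Ex>
class Real {
public:
    Real() {
        std::ostringstream strout;
        strout << "0." << L << R << "e" << Ex;   // e.g. "0.123456789e4"
        text_ = strout.str();
        char* end = nullptr;
        value_ = std::strtod(text_.c_str(), &end);
        assert(end != text_.c_str());            // the literal must parse as a number
    }
    double operator()(const double*) const { return value_; }
    // Hypothetical accessor, for inspection only.
    const std::string& text() const { return text_; }
private:
    std::string text_;
    double value_;
};

} // namespace sketch
```

One consequence of concatenating the two integers is that leading zeros of the second parameter cannot be expressed: Real<1,5,1> yields the literal 0.15e1, so a value such as 1.05 has to be written with the zero inside the first parameter, for example Real<105,0,1>.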

3.4.3 The class Constant

Suppose we would like to differentiate repeatedly a function such as $f(x) = e^{2.3x}$. If the constant $2.3$ is implemented using the class Real as Real<2,3,1>, then the $n$th derivative would require $n$ multiplications every time it is called. It would be very convenient if constants like this could be evaluated once and for all during the construction of the function object, as their values are independent of $x$. For this purpose we introduce the following class, which is a wrapper of all such operations between constants:

template<typename T>
class Constant
{
private:
    T expr_;
    double value_;
public:
    Constant()
        : expr_(T())
        { value_ = expr_(0); }
    double operator()(const double* x) const
        { return value_; }
};

Appropriate simplification rules, performed at compile time, ensure that this class wraps every arithmetic operation between constants. In this way, during construction of the function object, the default constructor of this class calculates the constant expression represented by the type of the object expr_ and assigns its value to the private member value_ of type double. Subsequent calls to the operator() of this class will not involve any intermediate calculations, because the result has been computed once and for all in the default constructor.

For example, to be able to wrap successive multiplications of real constants, we need to provide a set of simplification rules using the following class Squeezer:

template<typename T>
class Squeezer
{
public:
    typedef T squeezedType;
};

This class operates recursively on its type argument in order to simplify it as much as possible, guided by appropriate simplification rules. The resulting simplified version of type T can be obtained as the nested type name squeezedType. The simplification rules are provided as template specializations of the class Squeezer. In explaining the role of the rules needed for our case, we represent the class BinaryOp<A,B,Op> by $A \circ B$, where $\circ$ may be one of the four common binary arithmetic operators $+$, $-$, $\times$, $/$.

The following rule ensures that we will always have one Constant wrapping every arithmetic operation between objects of type Constant:

template<typename A,typename B,typename Op>
class Squeezer<BinaryOp<Constant<A>,Constant<B>,Op> >
{
public:
    typedef Constant<BinaryOp<A,B,Op> > squeezedType;
};
Constant<A> ∘ Constant<B> → Constant<A ∘ B>

Then we need to ensure that objects of type Constant enclosed in class BinaryOp will be visible from constants operating on that class. This is achieved for multiplication by the following rule:

template<typename A,typename B,typename C>
class Squeezer<BinaryOp<Constant<A>,BinaryOp<Constant<B>,C,Mul>,Mul> >
{
public:
    typedef BinaryOp<Constant<BinaryOp<A,B,Mul> >,
                     typename Squeezer<C>::squeezedType,
                     Mul> squeezedType;
};
Constant<A> * (Constant<B> * C) → Constant<A * B> * C

We only need to specify the previous rule for constants appearing as left operands, because appropriate overloading of the binary operator ensures that this is always the case. The binary operator of multiplication has to be overloaded as follows:

template<typename T,int L,int R,int Ex> inline
BinaryOp<Constant<Real<L,R,Ex> >,T,Mul> operator*(const T& rleft,
                                                        const Real<L,R,Ex>& rc)
{
    typedef BinaryOp<Constant<Real<L,R,Ex> >,T,Mul> exprT;
    return exprT(Constant<Real<L,R,Ex> >(), rleft);
}
T * Constant<Real<L,R,Ex>> → BinaryOp<Constant<Real<L,R,Ex>>,T,Mul>

template<typename T,int L,int R,int Ex> inline
BinaryOp<Constant<Real<L,R,Ex> >,T,Mul> operator*(const Real<L,R,Ex>& rc,
                                                        const T& rright)
{
    typedef BinaryOp<Constant<Real<L,R,Ex> >,T,Mul> exprT;
    return exprT(Constant<Real<L,R,Ex> >(), rright);
}
Constant<Real<L,R,Ex>> * T → BinaryOp<Constant<Real<L,R,Ex>>,T,Mul>
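The effect of wrapping constant arithmetic in Constant can be observed with a minimal sketch. The leaf TwoPointThree is a hypothetical stand-in for Real<2,3,1> that counts its own evaluations; the namespace and the helper evaluations_during_calls are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

struct Mul { static double apply(double a, double b) { return a * b; } };

template <typename L, typename R, typename Op>
struct BinaryOp {
    L left_;
    R right_;
    double operator()(const double* x) const { return Op::apply(left_(x), right_(x)); }
};

// Hypothetical leaf standing in for Real<2,3,1>; it counts how often it is evaluated.
struct TwoPointThree {
    static int& evaluations() { static int n = 0; return n; }
    double operator()(const double*) const { ++evaluations(); return 2.3; }
};

// Wrapper: evaluates the wrapped constant expression once, at construction.
template <typename T>
class Constant {
public:
    Constant() : expr_(), value_(expr_(nullptr)) {}
    double operator()(const double*) const { return value_; }
private:
    T expr_;        // declared first, so it is initialized before value_
    double value_;
};

template <typename T>
struct Squeezer { typedef T squeezedType; };

// Constant<A> Op Constant<B>  ->  Constant<A Op B>
template <typename A, typename B, typename Op>
struct Squeezer<BinaryOp<Constant<A>, Constant<B>, Op> > {
    typedef Constant<BinaryOp<A, B, Op> > squeezedType;
};

typedef BinaryOp<Constant<TwoPointThree>, Constant<TwoPointThree>, Mul> Unsqueezed;
typedef Squeezer<Unsqueezed>::squeezedType Squeezed;

static_assert(std::is_same<Squeezed,
                           Constant<BinaryOp<TwoPointThree, TwoPointThree, Mul> > >::value,
              "two wrapped constants collapse into one wrapped product");

// Number of leaf evaluations triggered by `calls` invocations of an
// already constructed squeezed constant.
inline int evaluations_during_calls(int calls) {
    assert(calls >= 0);
    Squeezed c;                                   // 2.3 * 2.3 is computed here, once
    const int before = TwoPointThree::evaluations();
    double sum = 0.0;
    for (int i = 0; i < calls; ++i) sum += c(nullptr);
    (void)sum;
    return TwoPointThree::evaluations() - before;
}

} // namespace sketch
```

In this sketch sketch::evaluations_during_calls(1000) returns 0: once the object exists, calling it only reads the stored value.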

4 Compile-time partial derivatives

We now turn our attention to partial differentiation of expressions represented as ESTs, where the partial derivatives are also represented as ESTs. For the example of §3.1, our framework computes the partial derivatives of fType as follows:

Der<0, fType>::derivType df_dx0;
Der<1, fType>::derivType df_dx1;
Der<2, fType>::derivType df_dx2;

Here, class template Der is parameterized by the index of the differentiation variable and by the type of the expression to be differentiated. The derivative of the expression is then obtained as the nested type name derivType. The basic technique is extensive use of specializations of class template Der, where each specialization corresponds to a particular node type of the ESTs. Differentiation then proceeds recursively down the input EST, generating the EST of the derivative.

However, a naive implementation of differentiation as above is inefficient and non-scalable because the resulting ESTs would grow prohibitively large (in the worst case exponentially) compared to the original EST, mostly because of a large number of trivial operations (addition of zero, multiplication by zero or one, etc.). If not handled well, this explosion in size of the expression trees will affect both compilation (which will become unacceptably slow and require excessive memory) and runtime, because evaluation of ESTs takes time directly proportional to their size.

Thus, it is imperative that intermediate expressions produced during differentiation be simplified algebraically. Conceptually, such simplification could be carried out as a postprocessing step on the ESTs of the derivatives. This approach would yield efficient runtime expression evaluation of the generated ESTs, but would increase the effort at compile time, reducing the scalability of our technique to expressions of only modest size.

Our approach is to perform simplifications interleaved with the differentiation steps, and to limit our simplification patterns to a carefully selected set of rules. Simplification patterns are defined by appropriate specializations of class template Squeezer, which was introduced in §3.4.3. We now outline our overall approach in more detail.

4.1 Differentiating constants and variables

We introduce two convenient constant definitions:

typedef Real<0,0,0> Zero;
typedef Real<1,0,1> One;

The differentiation rules for the leaves of ESTs, namely variables and constants, are non-recursive. The rule for differentiating constants is the simplest:

template <int N,int L,int R,int Ex>
class Der<N, Real<L,R,Ex> >
{
public:
    typedef Zero derivType;
};

A similar rule exists for classes of type Integer.

For variables, we use two specializations. The first one corresponds to the rule $\partial x_N / \partial x_N = 1$, whereas the second corresponds to $\partial x_M / \partial x_N = 0$ for $M \neq N$.

template<int N>
class Der<N,Variable<N> >
{
public:
    typedef One derivType;
};
template<int N,int M>
class Der<N,Variable<M> >
{
public:
    typedef Zero derivType;
};
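The leaf rules can be verified entirely at compile time. The sketch below assumes a primary template Der that is declared but never defined, so that differentiating an unsupported node type is a compile error; this choice, like the namespace, is an assumption and is not stated in the text above.

```cpp
#include <type_traits>

namespace sketch {

// Only declarations are needed: the rules below are pure type computations.
template <int N> class Variable;
template <int L, int R, int Ex> class Real;

typedef Real<0, 0, 0> Zero;
typedef Real<1, 0, 1> One;

// Primary template, intentionally left undefined (assumption of this sketch).
template <int N, typename F> class Der;

// d(constant)/dx_N = 0
template <int N, int L, int R, int Ex>
class Der<N, Real<L, R, Ex> > { public: typedef Zero derivType; };

// dx_M/dx_N = 0 for M != N
template <int N, int M>
class Der<N, Variable<M> > { public: typedef Zero derivType; };

// dx_N/dx_N = 1; more specialized than the rule above, so it wins when M == N
template <int N>
class Der<N, Variable<N> > { public: typedef One derivType; };

static_assert(std::is_same<Der<0, Variable<0> >::derivType, One>::value, "dx0/dx0 = 1");
static_assert(std::is_same<Der<0, Variable<1> >::derivType, Zero>::value, "dx1/dx0 = 0");
static_assert(std::is_same<Der<2, Real<2, 0, 1> >::derivType, Zero>::value, "d(2)/dx2 = 0");

} // namespace sketch
```

No disambiguating specialization is needed here, because partial ordering of class template specializations selects the single-parameter rule whenever both indices coincide.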

4.2 Differentiation of arithmetic and analytical expressions

Differentiation rules for internal nodes of ESTs are recursive. Let us consider the basic product rule and its implementation:

template<int N,typename L,typename R>
class Der<N,BinaryOp<L,R,Mul> >
{
    typedef typename Der<N,L>::derivType _dL;
    typedef typename Der<N,R>::derivType _dR;
public:
    typedef BinaryOp<BinaryOp<_dL,R,Mul>,
                     BinaryOp<L,_dR,Mul>,
                     Add> derivType;
};

To achieve simplification, differentiation rules are interlaced with the Squeezer simplification pattern. For example, when each intermediate result is replaced by its simplified form, the product rule becomes the following:

template<int N,typename L,typename R>
class Der<N,BinaryOp<L,R,Mul> >
{
    typedef typename
        Squeezer<typename Der<N,L>::derivType>::squeezedType _dL;
    typedef typename
        Squeezer<typename Der<N,R>::derivType>::squeezedType _dR;
public:
    typedef typename
        Squeezer<BinaryOp<BinaryOp<_dL,R,Mul>,
                            BinaryOp<L,_dR,Mul>,
                            Add> >::squeezedType  derivType;
};

Similarly for the exponential, the rule and code without simplifications are as follows:

template<int N,typename F>
class Der<N,MathExp<F> >
{
    typedef typename Der<N,F>::derivType _dF;
public:
    typedef BinaryOp<MathExp<F>,_dF,Mul> derivType;
};

However, when we wish the compiler to perform simplifications during compilation, the above rule and code have to be modified:

template<int N,typename F>
class Der<N,MathExp<F> >
{
    typedef typename
        Squeezer<typename Der<N,F>::derivType>::squeezedType _dF;
public:
    typedef typename
        Squeezer<BinaryOp<MathExp<F>,_dF,Mul> >::squeezedType derivType;
};

Other arithmetic operators and analytic functions are handled in the same spirit.
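The interplay of the product rule, the exponential rule, and the simplifier can be sketched at the type level as follows. Two points are assumptions of this sketch rather than statements about CoDET: each partial product is squeezed before the sum is squeezed, and the rule set is the small one listed in the code, including explicit specializations for the cases where both operands are Zero or One.

```cpp
#include <type_traits>

namespace sketch {

// Node types are only declared: everything below is pure type computation.
struct Add;
struct Mul;
template <typename L, typename R, typename Op> class BinaryOp;
template <typename T> class MathExp;
template <int N> class Variable;
template <int L, int R, int Ex> class Real;
typedef Real<0, 0, 0> Zero;
typedef Real<1, 0, 1> One;

// Simplifier: T+0 -> T, 0+T -> T, T*0 -> 0, 0*T -> 0, T*1 -> T, 1*T -> T.
template <typename T> struct Squeezer { typedef T squeezedType; };
template <typename T> struct Squeezer<BinaryOp<T, Zero, Add> > { typedef T squeezedType; };
template <typename T> struct Squeezer<BinaryOp<Zero, T, Add> > { typedef T squeezedType; };
template <typename T> struct Squeezer<BinaryOp<T, Zero, Mul> > { typedef Zero squeezedType; };
template <typename T> struct Squeezer<BinaryOp<Zero, T, Mul> > { typedef Zero squeezedType; };
template <typename T> struct Squeezer<BinaryOp<T, One, Mul> > { typedef T squeezedType; };
template <typename T> struct Squeezer<BinaryOp<One, T, Mul> > { typedef T squeezedType; };
// Explicit specializations resolving the ambiguity between the rules above.
template <> struct Squeezer<BinaryOp<Zero, Zero, Add> > { typedef Zero squeezedType; };
template <> struct Squeezer<BinaryOp<Zero, Zero, Mul> > { typedef Zero squeezedType; };
template <> struct Squeezer<BinaryOp<Zero, One, Mul> > { typedef Zero squeezedType; };
template <> struct Squeezer<BinaryOp<One, Zero, Mul> > { typedef Zero squeezedType; };
template <> struct Squeezer<BinaryOp<One, One, Mul> > { typedef One squeezedType; };

// Differentiation rules for leaves.
template <int N, typename F> class Der;
template <int N, int M> class Der<N, Variable<M> > { public: typedef Zero derivType; };
template <int N> class Der<N, Variable<N> > { public: typedef One derivType; };

// Product rule: (L*R)' = L'*R + L*R', squeezed after every step.
template <int N, typename L, typename R>
class Der<N, BinaryOp<L, R, Mul> > {
    typedef typename Der<N, L>::derivType dL;
    typedef typename Der<N, R>::derivType dR;
    typedef typename Squeezer<BinaryOp<dL, R, Mul> >::squeezedType left;
    typedef typename Squeezer<BinaryOp<L, dR, Mul> >::squeezedType right;
public:
    typedef typename Squeezer<BinaryOp<left, right, Add> >::squeezedType derivType;
};

// Exponential rule: exp(F)' = exp(F) * F'.
template <int N, typename F>
class Der<N, MathExp<F> > {
    typedef typename Der<N, F>::derivType dF;
public:
    typedef typename Squeezer<BinaryOp<MathExp<F>, dF, Mul> >::squeezedType derivType;
};

typedef BinaryOp<Variable<0>, Variable<1>, Mul> X0X1;

static_assert(std::is_same<Der<0, X0X1>::derivType, Variable<1> >::value,
              "d(x0*x1)/dx0 simplifies to x1");
static_assert(std::is_same<Der<0, MathExp<X0X1> >::derivType,
                           BinaryOp<MathExp<X0X1>, Variable<1>, Mul> >::value,
              "d exp(x0*x1)/dx0 simplifies to exp(x0*x1)*x1");
static_assert(std::is_same<Der<2, MathExp<X0X1> >::derivType, Zero>::value,
              "d exp(x0*x1)/dx2 simplifies to 0");

} // namespace sketch
```

Without the Squeezer calls, the first derivative above would be the type of $1 \cdot x_1 + x_0 \cdot 0$ rather than $x_1$, which is the kind of growth the interleaved simplification is meant to prevent.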

4.3 Expression simplifications

To appreciate the effect of simplification on the size of the derivative EST of an expression, consider the partial derivative with respect to of (computed by the product and exponential rules). Without simplifications, one would get

instead of the relatively simple expression (an EST of 21 nodes instead of just 4). Evaluation of the unsimplified formula could have much higher runtime than for the simplified one. Additionally, the compile time differentiation of the original formula would flood the compiler’s symbol tables with a plethora of trivial types, increasing compilation time and memory use.

Algebraic simplification is an old and broadly studied subject of symbolic computation. Simplification rewrites a given EST as a new EST that is in some sense simpler. Several projects on template metaprogramming include expression simplifiers; e.g., Schupp et al. [24] describe a user extensible simplification framework for expression templates over abstract data types.

The rules of a simplifier must be chosen carefully; a limited set of rules might miss significant simplification opportunities, but a very extensive set might introduce significant compilation overhead and result in dubious simplicity—for example, which expression is “simpler”: or ? We have tested and propose the rules shown in Table 1.

Table 1: Simplifier rewrite rules. represent arbitrary formulae; represent integer-valued constants.

Note that a single rewrite rule may need to be implemented by several template specializations. The following rules simplify addition of zero to some variable or expression. The first template specialization implements the rule $T + 0 \to T$, while the second implements $0 + T \to T$. The first two specializations cater to the commutativity of addition. The last specialization is needed by the compiler to resolve the ambiguity between the first two when both template parameters are objects of type Zero. Simplification rules for multiplication by zero or one are defined in a similar way.

template<typename T>
class Squeezer<BinaryOp<T,Zero,Add> >
{
public:
    typedef T squeezedType;
};
template<typename T>
class Squeezer<BinaryOp<Zero,T,Add> >
{
public:
    typedef T squeezedType;
};
template<>
class Squeezer<BinaryOp<Zero,Zero,Add> >
{
public:
    typedef Zero squeezedType;
};

In studying Table 1, one might notice certain discrepancies. For example, there is a rule but not the equivalent addition rule . The reason is that while the first rule applies to a number of expressions the user is likely to write (e.g., differentiating the expression tree with respect to ), the second would apply in unlikely formulae only. In our design we have chosen to keep a rather minimal set of simplifiers, as our purpose is not to develop a complete compile time symbolic differentiation package but rather to illustrate the idea and motivate further developments.

4.4 Higher-order derivatives

A straightforward way to obtain higher-order partial derivatives is by sequential differentiation. As a form of programming convenience, we provide a recursive class template, DerN, for obtaining the $N$th derivative of an expression with respect to variable $x_M$:

template<int N,int M,typename F>
class DerN
{
public:
    typedef typename
      DerN<N-1,M,typename Der<M,F>::derivType>::derivType derivType;
};

The following specialization is needed to end the recursion:

template<int M,typename F>
class DerN<1,M,F>
{
public:
    typedef typename Der<M,F>::derivType derivType;
};
// example:  f(x):=  d^2(exp(x*x)) / dx^2
Variable<0> x;
typedef DerN<2,0,decltype(exp(x*x))>::derivType f;

Remark 1.

Our methodology allows evaluation of any partial derivative of arbitrarily high order. For expressions with derivative formulas that remain bounded independently of the differentiation order, and provided that the set of simplification rules implemented can handle every possible case, our approach generates a template expression with number of terms also bounded independently of the differentiation order. In the general case, however, the highest order of differentiation may be restricted by several factors. The time needed for the differentiation grows linearly with the number of terms that are inlined. For problems where the formulas of the partial derivatives grow exponentially in size with respect to the differentiation order, the same is observed with compilation time and memory.
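The recursion performed by DerN, and the bounded-size case described in Remark 1, can be followed on a deliberately simple node type. Monomial<C,K>, standing for $C x_0^K$, is a hypothetical class introduced only for this sketch; its derivative is again a single monomial, so the type never grows with the differentiation order.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical node type: the monomial C * x_0^K with integer C and K >= 0.
template <int C, int K>
struct Monomial {
    double operator()(const double* x) const {
        assert(x != nullptr);
        double power = 1.0;
        for (int i = 0; i < K; ++i) power *= x[0];
        return C * power;
    }
};

// First derivative with respect to variable M (a Monomial only depends on x_0).
template <int M, typename F> class Der;

// d(C x^K)/dx = (C*K) x^(K-1); the product C*K is computed by the compiler.
template <int C, int K>
class Der<0, Monomial<C, K> > { public: typedef Monomial<C * K, K - 1> derivType; };

// d(C)/dx = 0
template <int C>
class Der<0, Monomial<C, 0> > { public: typedef Monomial<0, 0> derivType; };

// N-th derivative with respect to variable M, by repeated differentiation.
template <int N, int M, typename F>
class DerN {
public:
    typedef typename DerN<N - 1, M, typename Der<M, F>::derivType>::derivType derivType;
};

// End of the recursion: the first derivative.
template <int M, typename F>
class DerN<1, M, F> {
public:
    typedef typename Der<M, F>::derivType derivType;
};

typedef DerN<4, 0, Monomial<1, 6> >::derivType D4;

static_assert(std::is_same<D4, Monomial<360, 2> >::value,
              "the fourth derivative of x^6 is 360 x^2");

} // namespace sketch
```

An object of type sketch::D4 evaluates $360 x_0^2$ with the coefficient already folded, so no trace of the four differentiation steps remains in the generated function object.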

5 Empirical evaluation

The effectiveness of the CoDET approach is investigated in this section through several test cases. To verify that our results are independent of CPU architectures we used the following platforms running the same 64-bit Linux distribution:

  1. Intel Xeon CPU E5-2670
    2.60 GHz, 20480 KB L3 cache

All test cases benchmarked four different C++ compilers. The compilers and their compilation flags follow:

  1. GNU (GCC) 4.8.2
    g++-4.8.2 -static -O3

  2. Intel icpc (ICC) 14.0.0 20130728
    icpc -O3

  3. Sun 5.9 Linux_i386 Patch 124865-01 2007/07/30
    CC -xO5 -features=extensions -m64

  4. Portland Group pgCC 14.4-0 64-bit
    pgCC -O3 --gnu

For each test we present both compilation time (for the compilers to generate the executable file) and runtime performance of the benchmarked function f(x) for executing the following loop:

double sum,  x[1];
x[0] = 0.0;
sum = 0.0;
for (int i = 0; i < numberOfIterations; i++)
{
    x[0] -= 0.1;
    sum += f(x);
}

In all examples we set . The loop overhead varies from to milliseconds at most and is subtracted from the running time of the above loop. Thus, the runtimes measured here essentially correspond to the total time needed for the function calls.
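The overhead subtraction described above can be sketched as follows. This is only an illustration under our own assumptions: the stand-in function f, the LoopResult struct and the helpers run_loop() and net_milliseconds() are hypothetical names, and std::chrono::steady_clock is our choice of timer rather than the one used for the reported measurements.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical stand-in for the benchmarked function f(x).
inline double f(const double* x) { return std::exp(x[0]); }

struct LoopResult { double sum; double milliseconds; };

// Times the benchmark loop. With callF == false only the loop overhead
// (index update, decrement of x[0], accumulation) is measured.
template<bool callF>
LoopResult run_loop(long numberOfIterations)
{
    double x[1] = { 0.0 };
    double sum = 0.0;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < numberOfIterations; i++)
    {
        x[0] -= 0.1;
        sum += callF ? f(x) : x[0];
    }
    auto stop = std::chrono::steady_clock::now();
    LoopResult r = { sum, std::chrono::duration<double, std::milli>(stop - start).count() };
    return r;
}

// Time attributed to the function calls: full loop minus loop overhead.
double net_milliseconds(long numberOfIterations)
{
    return run_loop<true>(numberOfIterations).milliseconds
         - run_loop<false>(numberOfIterations).milliseconds;
}
```

Returning the accumulated sum together with the timing keeps the result observable, which makes it harder for an optimising compiler to remove the loop entirely.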

The code that instructs the compiler to generate the Nth derivative of the simple univariate function f(x) = e^x + e^(2x) + e^(3x) is as follows:

Variable<0> X;
typedef decltype(
           exp( Constant<1>()*X )
         + exp( Constant<2>()*X )
         + exp( Constant<3>()*X ) ) fType;
// our functional f = e^x + e^(2x) + e^(3x)
fType f = exp( Constant<1>()*X )
         + exp( Constant<2>()*X )
         + exp( Constant<3>()*X );
// and its Nth derivative
const int N = 1;
typedef DerN<N,0,fType>::derivType dfType;
dfType dNf_dxN;
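Since d/dx e^(ax) = a e^(ax), the Nth derivative of the functional in the listing above has the closed form e^x + 2^N e^(2x) + 3^N e^(3x). A hand-coded reference against which a compiler-generated derivative could be checked might look like the sketch below; the function name hand_coded_dNf_dxN is our own and does not come from the benchmarked code.

```cpp
#include <cmath>

// Hand-coded Nth derivative of f(x) = e^x + e^(2x) + e^(3x).
// Each exponential e^(a x) contributes a factor a per differentiation.
double hand_coded_dNf_dxN(double x, int N)
{
    return std::exp(x)
         + std::pow(2.0, N) * std::exp(2.0 * x)
         + std::pow(3.0, N) * std::exp(3.0 * x);
}
```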

By changing the value of the constant N we obtain the desired order of the derivative. The formula for f changes according to the benchmarked case. Note that the auto keyword, supported since the C++11 standard, allows simpler code:

Variable<0> X;
auto f = exp( Constant<1>()*X )
        + exp( Constant<2>()*X )
        + exp( Constant<3>()*X );
const int N = 1;
auto dNf_dxN = derivative<N>(f);

where the definition for the function derivative() follows:

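// The trailing return type keeps this valid in C++11; deducing the
// return type from a plain auto requires C++14.
template<int N, typename T>
auto derivative(T f) -> typename DerN<N, 0, T>::derivType
{
  typedef typename DerN<N, 0, T>::derivType dfType;
  return dfType();
}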

In the same spirit one can introduce a similar function taking two integer template arguments to allow simpler code for partial derivatives.

5.1 The need for compile-time simplifications

Simplifications are an essential ingredient of every symbolic package. In the present study, compilers without them would not be able to cope with higher-order derivatives, as intermediate expressions would grow exponentially and the generated code would perform poorly. This is demonstrated in what follows.

5.1.1 Simplifications disabled

Consider again the function . With simplifications disabled in our code, we examine the compile time for each compiler and the runtime of the compiler-generated derivatives up to order six. Constants were implemented by the Integer approach of §3.4.1. The results obtained using the Real approach are similar and do not provide further insights.

Figure 2: Compilation time needed by each compiler to generate the Nth derivative of the function when no simplifications are performed during compilation (left). Runtime needed by each compiler-generated Nth derivative of the function for function calls (right).

At the left of Figure 2, we plot the compile time for each compiler, and at the right the runtime for calls to the compiler-generated derivatives. We see that all compilers except GNU show rapidly increasing compilation times and were not able to compile code for a derivative of order five in reasonable time. More precisely, the Sun compiler required 922 seconds to compile the same code for which GNU required only 3 seconds. On the other hand, we observe at the right of Figure 2 that the corresponding runtimes increase exponentially beyond the fourth-order derivative.
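The growth of unsimplified derivative expressions can be reproduced at small scale by counting expression-tree nodes at compile time. The sketch below uses hypothetical toy nodes that only record their size, and it differentiates exp(2x) rather than the benchmarked function; the names F and unsimplified_size() are our own.

```cpp
// Toy nodes that only record the number of nodes in their expression tree.
template<int I> struct Variable { static constexpr int size = 1; };
template<int V> struct Integer  { static constexpr int size = 1; };
template<class L, class R> struct Add { static constexpr int size = 1 + L::size + R::size; };
template<class L, class R> struct Mul { static constexpr int size = 1 + L::size + R::size; };
template<class A> struct Exp { static constexpr int size = 1 + A::size; };

// Differentiation rules applied literally, with no simplification at all.
template<int M, class F> struct Der;
template<int M, int I> struct Der<M, Variable<I> > { typedef Integer<(I == M) ? 1 : 0> derivType; };
template<int M, int V> struct Der<M, Integer<V> >  { typedef Integer<0> derivType; };
template<int M, class L, class R> struct Der<M, Add<L, R> > {
    typedef Add<typename Der<M, L>::derivType, typename Der<M, R>::derivType> derivType;
};
template<int M, class L, class R> struct Der<M, Mul<L, R> > {   // product rule
    typedef Add<Mul<typename Der<M, L>::derivType, R>,
                Mul<L, typename Der<M, R>::derivType> > derivType;
};
template<int M, class A> struct Der<M, Exp<A> > {               // chain rule
    typedef Mul<Exp<A>, typename Der<M, A>::derivType> derivType;
};

template<int N, int M, class F> struct DerN {
    typedef typename DerN<N - 1, M, typename Der<M, F>::derivType>::derivType derivType;
};
template<int M, class F> struct DerN<1, M, F> {
    typedef typename Der<M, F>::derivType derivType;
};

typedef Exp<Mul<Integer<2>, Variable<0> > > F;   // exp(2*x), four nodes

// Number of nodes in the unsimplified Nth derivative of F.
template<int N> constexpr int unsimplified_size()
{
    return DerN<N, 0, F>::derivType::size;
}

static_assert(unsimplified_size<1>() == 12, "first derivative has 12 nodes");
static_assert(unsimplified_size<2>() == 41, "second derivative has 41 nodes");
```

For comparison, a fully simplified form such as 2^N * exp(2*x) needs only six nodes at every order, so in this toy setting all of the growth comes from terms like 0*x and 1*1 that a simplifier would remove.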

5.1.2 Simplifications enabled

We ran the same benchmark with simplifications enabled in our code. The simplification code produces the same expression for the Nth derivative as a hand-coded version of the derivative. More precisely, the derivative expression computed during compilation obtains the simplified form . The results are depicted in Figure 3. Compilation time and runtime of the generated derivatives remain practically constant and independent of the differentiation order. We were able to obtain derivatives up to order 15 before the integer constant coefficient () of some of the exponential terms overflowed. The runtime here as well corresponds to function calls.
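How simplification can be interleaved with differentiation is illustrated by the following sketch. It implements only a handful of rules (multiplication by 0 or 1, addition of 0, folding of two integer constants) and is our own illustration, not the rule set of the library: the names IsZero, IsOne, SimplifyMul, SimplifyAdd and SDer are hypothetical. Selecting the rule with std::conditional, instead of with several overlapping partial specializations, avoids ambiguous matches for operand pairs such as Integer<0> and Integer<1>.

```cpp
#include <type_traits>

template<int I> struct Variable {};
template<int V> struct Integer {};
template<class L, class R> struct Add {};
template<class L, class R> struct Mul {};

template<class E> struct IsZero : std::false_type {};
template<> struct IsZero<Integer<0> > : std::true_type {};
template<class E> struct IsOne : std::false_type {};
template<> struct IsOne<Integer<1> > : std::true_type {};

// L * R with the rules 0*R = L*0 = 0, 1*R = R, L*1 = L.
template<class L, class R> struct SimplifyMul {
    typedef typename std::conditional<IsZero<L>::value || IsZero<R>::value, Integer<0>,
            typename std::conditional<IsOne<L>::value, R,
            typename std::conditional<IsOne<R>::value, L, Mul<L, R> >::type>::type>::type type;
};
// Two integer constants fold into a single constant.
template<int A, int B> struct SimplifyMul<Integer<A>, Integer<B> > { typedef Integer<A * B> type; };

// L + R with the rules 0 + R = R, L + 0 = L.
template<class L, class R> struct SimplifyAdd {
    typedef typename std::conditional<IsZero<L>::value, R,
            typename std::conditional<IsZero<R>::value, L, Add<L, R> >::type>::type type;
};
template<int A, int B> struct SimplifyAdd<Integer<A>, Integer<B> > { typedef Integer<A + B> type; };

// Differentiation with a simplification step applied after every rule.
template<int M, class F> struct SDer;
template<int M, int I> struct SDer<M, Variable<I> > { typedef Integer<(I == M) ? 1 : 0> type; };
template<int M, int V> struct SDer<M, Integer<V> >  { typedef Integer<0> type; };
template<int M, class L, class R> struct SDer<M, Add<L, R> > {
    typedef typename SimplifyAdd<typename SDer<M, L>::type,
                                 typename SDer<M, R>::type>::type type;
};
template<int M, class L, class R> struct SDer<M, Mul<L, R> > {
    typedef typename SimplifyAdd<
        typename SimplifyMul<typename SDer<M, L>::type, R>::type,
        typename SimplifyMul<L, typename SDer<M, R>::type>::type>::type type;
};

typedef Variable<0> X;

// d(x*x)/dx collapses to x + x, and d(3*x)/dx to the constant 3.
static_assert(std::is_same<SDer<0, Mul<X, X> >::type, Add<X, X> >::value, "x*x");
static_assert(std::is_same<SDer<0, Mul<Integer<3>, X> >::type, Integer<3> >::value, "3*x");
```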

Figure 3: Compilation time (left) needed by each compiler to generate the Nth derivatives of the function when simplifications are performed during compilation. The runtime for function calls of the compiler-generated Nth derivative of the function is shown at the right.

5.2 Beyond template integer constants

As we discussed in §3.4.1, the Integer approach is not the optimal way to implement constants, as it can only represent integer numbers. We used it to illustrate the convenience and the benefits for compile-time simplifications that result from its adoption. The class Real<double Value>, currently not supported by the C++ language standard, would share the flexibility of the class Integer<int Value> and would provide the ideal way for implementing constants in our framework.

Figure 4: Compilation time (left) for obtaining the Nth derivative of the function when simplifications are performed during compilation with real constants implemented by the Real approach. Runtime for function calls (right) of the compiler-generated Nth derivatives.

Here we benchmark the only alternative way of implementing constants, introduced in §3.4.2 and §3.4.3. Using the Real approach and with simplifications enabled we run the same benchmark as before, but this time we can evaluate derivatives of much higher order than 15 because integer overflow is not an issue. Since the number of objects that are generated with each new derivative increases linearly, we expect the compilation time to increase linearly as well. Furthermore, the runtime should remain constant because all the intermediate arithmetic operations between real constants are wrapped by the class Constant<typename T> and calculated upon construction of the object representing the derivative. Thus, no redundant arithmetic operations are performed when the derivative is evaluated. This is exactly what we observe in Figure 4.
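The idea that constant arithmetic is carried out once, when the derivative object is constructed, can be illustrated with a much reduced sketch. It is not the Constant<typename T> class of the library: the operator*, the ScaledExp term and the NthDerivative recursion below are hypothetical, and unlike the real expression templates the type of the term does not change from one derivative to the next.

```cpp
#include <cmath>

// Hypothetical runtime-valued constant; its value is fixed at construction.
template<typename T> struct Constant {
    T value;
    explicit Constant(T v) : value(v) {}
};

// The product of two constants is folded into a single constant immediately.
template<typename T>
Constant<T> operator*(const Constant<T>& a, const Constant<T>& b)
{
    return Constant<T>(a.value * b.value);
}

// The term c * exp(a * x); it stores the already folded constants c and a.
template<typename T> struct ScaledExp {
    Constant<T> c, a;
    ScaledExp(Constant<T> c_, Constant<T> a_) : c(c_), a(a_) {}
    // Evaluation costs one multiplication by c, whatever the derivative order.
    T operator()(T x) const { return c.value * std::exp(a.value * x); }
    // d/dx [c exp(a x)] = (c a) exp(a x); c*a is evaluated once, right here.
    ScaledExp derivative() const { return ScaledExp(c * a, a); }
};

// Nth derivative by compile-time recursion over N.
template<int N, typename T> struct NthDerivative {
    static ScaledExp<T> of(const ScaledExp<T>& e)
    {
        return NthDerivative<N - 1, T>::of(e.derivative());
    }
};
template<typename T> struct NthDerivative<0, T> {
    static ScaledExp<T> of(const ScaledExp<T>& e) { return e; }
};
```

With double coefficients the 20th derivative of exp(3x) carries the coefficient 3^20 = 3486784401, a value that no longer fits into a 32-bit int.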

5.3 Scalability for long formulae

Having established the benefits as well as the convenience and flexibility that accompany compile time simplifications in both code quality and runtime performance, we now study the scalability of our approach for formulae consisting of many terms, with simplifications turned on in our code. For this purpose we consider the following function of one variable:

(1)

for which our limited set of simplification rules works as intended. By increasing the upper limit of the sum in (1), we obtain longer and longer expressions. As before, we examine both approaches of implementing constants. The compilation time needed for obtaining the first derivative of (1) with both approaches is depicted in Figure 5.

Figure 5: Compilation time for generating the 1st derivative of the function in (1), with simplifications performed during compilation using the Integer approach for constants (left) and the Real approach (right).
Figure 6: Runtime of the generated 1st derivative of the function in (1), with simplifications performed during compilation using the Integer approach for constants (left) and the Real approach (right), for function calls.

The compilation time is depicted in Figure 5 with constants implemented by the Integer approach at the left and by the Real approach at the right. In Figure 6 we show the corresponding runtimes of the compiler-generated derivatives. We observe that the compilation time scales linearly for all compilers with the exception of Sun, for which it increases exponentially. Regarding runtime performance, we see that the runtime also increases linearly for all compilers. Moreover, it is the same as that of the hand-coded derivative. This striking feature of our approach suggests that once the simplification rules are mature enough to handle all possible cases, partial derivatives of any order obtained at compile time will perform as fast as hand-coded ones.

6 Comparison with other AD approaches

It is of interest to compare the performance of the CoDET approach against other popular AD packages such as FADBAD++ [25, 7] and both the tape-based and tapeless methods provided by ADOL-C [31, 15, 14]. FADBAD++ uses C++ expression templates, while ADOL-C provides a library to which users link after appropriately modifying their code. The tapeless approach provided by ADOL-C is much more efficient than the tape-based one, but it can only obtain first partial derivatives. The GNU compiler was used throughout.

6.1 Univariate functions

Our first benchmark considers the first derivative of in (1) for . The runtime for function calls as a function of is plotted in Figure 7 (left). We clearly see that our approach and the tapeless ADOL-C approach are as fast as the hand-coded derivative, and even a bit faster for large . The remaining approaches are significantly slower.

At the right of Figure 7 we compare the runtime of the th derivative of . The CoDET runtime is constant and independent of the differentiation order , while the tape-based ADOL-C needs increasingly more time as grows. Thus our approach performs from 200 to 800 times faster for to 100.

Figure 7: Runtime of the compiler-generated first derivative of the function obtained by our CoDET Real approach versus the one obtained by FADBAD++ and both tape based and tapeless ADOL-C (left). Runtime of the th derivatives of obtained by the CoDET Real approach versus tape-based ADOL-C (right). All runtimes correspond to function calls.

6.2 Multivariate functions

We now test the performance of CoDET with the two multivariate functions

(2)

In both cases we evaluate all partial derivatives of and . The runtimes for function calls are listed in Table 2. Once again we see that our approach is as efficient as derivatives coded by hand as ordinary inline C++ functions. All other approaches lag behind in performance. The fastest alternative (tapeless ADOL-C) is about two times slower in the first case; in the second case it performs similarly to the hand-coded derivatives for all except the first partial derivative, for which it is 23 times slower than CoDET.

Hand-coded   CoDET   FADBAD++   Tape ADOL-C   Tapeless ADOL-C
0.42         0.42    3.54       154.85        0.78
0.45         0.45    11.45      166.41        0.83
0.45         0.45    11.64      159.73        0.8
0.41         0.41    3.43       159.97        0.8
0.01         0.01    1.12       156.62        0.23
0.51         0.39    5.52       156.51        0.58
0.59         0.46    3.97       159.71        0.61
0.59         0.46    3.97       159.08        0.61

Table 2: Runtime of the generated partial derivatives of the multivariate functions and in equation (2) using several AD approaches: CoDET, FADBAD++, tape-based ADOL-C, and tapeless ADOL-C. The runtimes correspond to function calls.

We understand that the performance depends on the specific function being benchmarked. However, provided our library is augmented with sophisticated compile-time simplification rules to handle every possible case (which is not our purpose here), our approach should always produce derivatives as efficient as hand-coded ones. The C++ code needed for encoding a multivariate function as a template expression and generating its first derivative using the auto keyword follows:

Variable<0> X0; Variable<1> X1;
Variable<2> X2; Variable<3> X3;
auto f = ( X0*tan(X1*X2) )/( tan(X1*X2)-X3 );
const int n = 1;
const int m = 0;
// two-argument variant of derivative(): order n, variable index m
auto dnf_dx_mn = derivative<n, m>(f);

Our final benchmark repeats the first computational experiment presented by Nehmeier [21]. Since the results presented in [21] were obtained on a different system with an older compiler, we normalise the runtimes of the gradients computed by the different libraries by the runtime of the hand-coded gradient. We report the actual running time, in milliseconds and corresponding to function calls, only for the hand-coded gradient. The results are presented in Table 3. The final row of the table shows the ratio of the running times of the compiler-generated gradient using the approach introduced by Nehmeier [21] to those of the hand-coded versions reported there. As expected, the CoDET approach has running times identical to those of the optimised hand-coded derivatives, unlike its competitors. Although the approach introduced in [21] is similar to CoDET, it lacks the simplification mechanism that CoDET exploits to avoid redundant computations.

Run
Hand-coded (ms) 118 27 27
CoDET 1 1 1
FADBAD++ 65.7 231.9 284.4
ADOL-C 311 1300 1466.7
ADOL-C reuse tape 168.6 692.6 744.4
Sacado DFad 8.3 27.7 39.2
Sacado SFad 3 1.8 2.4
Nehmeier 1.1 1.3 1.8
Table 3: Performance comparison of the gradient computation. Numbers are normalised with the runtime of hand-coded gradients, measured in milliseconds, corresponding to function calls.

7 Conclusions

In our development of CoDET we have demonstrated how C++ expression templates and template metaprogramming techniques can be employed to allow C++ compilers to generate partial derivatives of multivariate functions of any order during the compilation process. We verified that compile time simplifications of the resulting formulas for the derivatives must be interleaved with the differentiation steps in order to speed up compilation. For some cases, the implementation of compile time simplification rules resulted in a reduction of compilation time by up to three orders of magnitude for specific compilers, while the runtime of the generated derivatives was reduced by up to two orders of magnitude.

A striking feature of our approach, apart from the arbitrarily high order of derivatives that can be obtained, is that the compiler-generated derivatives are as efficient as hand-coded ones, provided a complete set of simplification rules is implemented.

The template metaprogramming techniques presented and benchmarked here revealed that several C++ compilers are already mature enough for compile time symbolic differentiation. The same techniques could also be used to implement symbolic integration at compile time. We hope that this work will motivate further developments by compiler vendors with the aim of supporting complete symbolic compile time differentiation in C++ and other high-level languages.

References

  • [1] D. Abrahams and A. Gurtovoy. C++ Template Metaprogramming (Concepts, Tools, and Techniques from Boost and Beyond). Addison-Wesley, 2005.
  • [2] ADIC. http://www-fp.mcs.anl.gov/ADIC.
  • [3] ADIFOR. http://www-unix.mcs.anl.gov/autodiff/ADIFOR.
  • [4] A. Alexandrescu. Modern C++ Design (Generic programming and Design Patterns Applied). Addison-Wesley, 2001.
  • [5] P. Aubert, N. Di Césaré, and O. Pironneau. Automatic differentiation in C++ using expression templates and application to a flow control problem. Computing and Visualization in Sciences, 3:197–208, 2001.
  • [6] J. Barton and L. Nackman. Scientific and Engineering C++. Addison-Wesley, 1994.
  • [7] C. Bendtsen and O. Stauning. FADBAD++, A flexible C++ package for automatic differentiation. Technical Report IMM-REP-1996-17, Department of Mathematical Modelling, Technical University of Denmark, 1996.
  • [8] C. Bischof, A. Carle, G. Corliss, A. Griewank, and P. Hovland. ADIFOR – Generating derivative codes from Fortran programs. Scientific Programming, 1:1–29, 1992.
  • [9] Boost. C++ Libraries, http://boost.org/.
  • [10] A. Griewank. On automatic differentiation. In Mathematical Programming: Recent Developments and Applications, pages 83–108. Kluwer Academic Publishers, 1989.
  • [11] A. Griewank. Some bounds on the complexity of gradients, Jacobians, and Hessians. In Complexity in Nonlinear Optimization, pages 128–161. World Scientific Publishers, 1993.
  • [12] A. Griewank. Evaluating Derivatives and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, Philadelphia, 2000.
  • [13] A. Griewank and G. F. Corliss. Automatic Differentiation of Algorithms: Theory, Implementation and Application. SIAM, Philadelphia, 1991.
  • [14] A. Griewank and A. Walther. Evaluating Derivatives, Principles and Techniques of Algorithmic Differentiation. SIAM, second edition, 2008.
  • [15] Andreas Griewank, Jean Utke, and Andrea Walther. Evaluating higher derivative tensors by forward propagation of univariate Taylor series. Math. Comput., 69(231):1117–1130, 2000.
  • [16] R. J. Hogan. Fast reverse-mode automatic differentiation using expression templates in C++. ACM Trans. Math. Softw., 40(4):1–16, 2014.
  • [17] Klaus Iglberger, Georg Hager, Jan Treibig, and Ulrich Rüde. Expression templates revisited: A performance analysis of current methodologies. SIAM Journal on Scientific Computing, 34(2):C42–C69, 2012.
  • [18] Drosos Kourounis, Louis J. Durlofsky, Jan Dirk Jansen, and Khalid Aziz. Adjoint formulation and constraint handling for gradient-based optimization of compositional reservoir flow. Computational Geosciences, 18(2):117–137, 2014.
  • [19] Joaquim R. R. A. Martins, Peter Sturdza, and Juan J. Alonso. The complex-step derivative approximation. ACM Transactions on Mathematical Software, 29(3):245–262, 2003.
  • [20] S. Meyers. Effective C++. Addison-Wesley, third edition, 2005.
  • [21] Marco Nehmeier. Generative programming for automatic differentiation. In Shaun Forth, Paul Hovland, Eric Phipps, Jean Utke, and Andrea Walther, editors, Recent Advances in Algorithmic Differentiation, volume 87 of Lecture Notes in Computational Science and Engineering, pages 261–271. Springer Berlin Heidelberg, 2012.
  • [22] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C++. Cambridge University Press, second edition, 2002.
  • [23] L. Rall. Automatic Differentiation: Techniques and Applications, volume 120 of Lecture Notes in Computer Science. Springer Verlag, Berlin, 1981.
  • [24] S. Schupp, D. Gregor, D. Musser, and S.-M. Liu. User-extensible simplification – Type-based optimizer generators, volume 2027 of Lecture Notes in Computer Science. Springer Verlag, Berlin, 2001.
  • [25] Ole Stauning and Claus Bendtsen. FADBAD++. Web page, 2007. (Last modified Nov. 2007).
  • [26] B. Stroustrup. The C++ Programming Language. Addison-Wesley, third edition, 1997.
  • [27] E. Unruh. Prime number computation. ANSI X3J16-95-0075/ISO WG21-462, 1994.
  • [28] D. Vandevoorde and N. M. Josuttis. C++ Templates: The Complete Guide. Addison-Wesley, 2002.
  • [29] T. Veldhuizen. Blitz++. http://www.oonumerics.org/blitz/.
  • [30] T. Veldhuizen. Expression templates. C++ Report, SIGS Publications Inc., ISSN 1040-6042, 7(5):26–31, 1995. Reprinted in C++ Gems, ed. Stanley Lippman.
  • [31] A. Walther and A. Griewank. Getting started with ADOL-C. In U. Naumann and O. Schenk, editors, Combinatorial Scientific Computing, chapter 7, pages 181–202. Chapman-Hall CRC Computational Science, 2012.
  • [32] R.M. Younis and K. Aziz. Parallel automatically differentiable data-types for next-generation simulator development. In SPE Paper 106493 presented at the SPE Reservoir Simulation Symposium, Houston, Texas, 2007.