1 Introduction
Ever since its early days, Dune [2, 1] has had a programmer interface for functions. This interface was based on the class Function, which basically looked like
and is located in the file dune/common/function.hh. This class was to serve as a model for duck typing, i.e., any object with its interface would be called a function. A main feature was that the result of a function evaluation was not returned as a return value. Rather, it was returned using a byreference argument of the evaluate method. The motivation for this design decision was runtime efficiency. It was believed that returning objects by value would, in practice, lead to too many unnecessary temporary objects and copying operations.
Unfortunately, the old interface lead to user code that was difficult to read in practice. For example, to evaluate the th Chebycheff polynomial using user methods my_cos and my_arccos would take several lines of code:
Additionally, C++ compilers have implemented returnvalue optimization (which can return values byvalue without any copying) for a long time, and these implementations have continuously improved in quality. Today, the speed gain obtained by returning result values in a byreference argument is therefore not worth the clumsy syntax anymore (we demonstrate this in Section 4). We therefore propose a new interface based on operator() and returning objects by value. With this new syntax, the Chebycheff example takes the much more readable form
To implement dynamic polymorphism, the old functions interface uses virtual inheritance. There is an abstract base class
in the file dune/common/function.hh. User functions that want to make use of runtime polymorphism have to derive from this base class. Calling such a function would incur a small performance penalty [3], which may possibly be avoided by compiler devirtualization [5].
The C++ standard library, however, has opted for a different approach. Instead of deriving different function objects from a common base class, no inheritance is used at all. Instead, any object that implements operator() not returning void qualifies as a function by duck typing. If a function is passed to another class, the C++ type of the function is a template parameter of the receiving class. If the type is not known at compile time, then the dedicated wrapper class
is used. This class uses type erasure to allow to handle function objects of different C++ types through a common interface. There is a performance penalty for calling functions hidden within a std::function, comparable to the penalty for calling virtual functions.
The dunefunctions module [4] picks up these ideas and extends them. The class template std::function works nicely for functions that support only pointwise evaluation. However, in a finite element context, a few additional features are needed. First of all, functions frequently need to supply derivatives as well. Also, for functions defined on a finite element grid, there is typically no single coordinate system of the domain space. Function evaluation in global coordinates is possible, but expensive; such functions are usually evaluated in local coordinates of a given element. dunefunctions solves this by introducing LocalFunction objects, which can be bound to grid elements. When bound to a particular element, they behave just like a std::function, but using the local coordinate system of that element.
The paper is structured as follows. The main building blocks for the proposed function interfaces, viz. callables, concept checks, and type erasure, are introduced in Section 2. Those techniques are then applied to implement the extended interfaces for differentiable functions and grid functions in Section 3. Finally, we present comparative measurements for the current and the proposed interface in Section 4. The results demonstrate that the increased simplicity, flexibility, and expressiveness do not have a negative performance impact.
2 Building blocks for function interfaces
The programmer interface for global functions is given as a set of model classes. These form a conceptual hierarchy (shown in Figure 1), even though no inheritance is involved at all. Implementors of the interface need to write classes that have all the methods and behavior specified by the model classes. This approach results in flexible code. Concept checking is used to produce readable error messages. The following sections explain the individual ideas in detail.
2.1 Callables and functions
The C++ language proposes a standard way to implement functions. A language construct is called a callable if it provides an operator(). In the following we will denote a callable as function if that operator() does not return void. In other words, a function foo is anything that can appear in an expression of the form
for an argument x of suitable type. Examples of such constructs are free functions
lambda expressions
function objects
and other things like pointers to member functions, and bind expressions.
All three examples are callable, i.e., they can be called:
Argument and return value do not have to be double
at all, any type is possible. They can be scalar or vector types, floating point or integer, and even more exotic data like matrices, tensors, and strings.
To pass a function as an argument to a C++ method, the type of that argument must be a template parameter.
Any of the example functions from above can be used as an argument of the method foo:
2.2 Concept checks
The calls to F are fast, because the function calls can be inlined. On the other hand, it is not clear from the interface of foo what the signature of F should be. What’s worse is that if a callable type with the wrong signature is passed, the compiler error will not occur at the call to foo but only where F is used, which may be less helpful. To prevent this, dunefunctions provides lightweight concept checks for the concepts it introduces. For example, the following alternative implementation of foo checks whether F is indeed a function in the sense given above
If foo is instantiated with a type that is not a function, say, an int, then a readable error message is produced. For example, for the code
GCC4.9.2 prints the error message (file paths and line numbers removed)
The provided concept checking facility is based on a list of expressions that a type must support to model a concept. The implementation is based on the techniques proposed by E. Niebler [8] and implemented in the rangev3 library [7]. While the definitions of the function concepts are contained in the dunefunctions module, the models() function that allows to check if a type models the concept is provided by the module dunecommon.
2.3 Type erasure and std::function
Sometimes, the precise type of a function is not known at compiletime but selected depending on runtime information. This behavior is commonly referred to as dynamic dispatch. The classic way to implement this is virtual inheritance: All functions must inherit from a virtual base class, and a pointer to this class is then passed around instead of the function itself.
This approach has a few disadvantages. For example, all function objects must live on the heap, and a heap allocation is needed for each function construction. Secondly, in a derived class, the return value of operator() must match the return value used in the base class. However, it is frequently convenient to also allow return values that are convertible to the return value of the base class. This is not possible in C++. As a third disadvantage, interfaces can only be implemented intrusively, and having one class implement more than a single interface is quite complicated.
The C++ standard library has therefore chosen type erasure over virtual inheritance to implement runtime polymorphism. Starting with C++11, the standard library contains a class [6, 20.8.11]
that wraps all functions that map a type Domain to a type (convertible to) Range behind a single C++ type. A much simplified implementation looks like the following:
The classes FunctionWrapper and FunctionWrapperBase look like this:
Given two types Domain and Range, any function object whose operator() accepts a Domain and returns something convertible to Range can be stored in a std::function<Range(Domain)>. For example, reusing the three implementations of from Section 2.1, one can write
Note how different C++ constructs are all assigned to the same object. One can even use
but not
Looking at the implementation of std::function, one can see that virtual inheritance is used internally, but it is completely hidden to the outside. The copy constructor accepts any type as argument. In a full implementation the same is true for the move constructor, and the copy and move assignment operators. For each function type F, an object of type FunctionWrapper<Range(Domain),F> is constructed, which inherits from the abstract base class FunctionWrapperBase<Range(Domain)>.
Considering the implementation of std::function as described here, one may not expect any runtime gains for type erasure over virtual inheritance. While std::function itself does not have any virtual methods, each call to operator() does get routed through a virtual function. Additionally, each call to the copy constructor or assignment operator invokes a heap allocation. The virtual function call is the price for runtime polymorphism. It can only be avoided in some cases using smart compiler devirtualization.
To alleviate the cost of the heap allocation, std::function implements a technique called small object optimization. In addition to the pointer to FunctionWrapper, a std::function stores a small amount of raw memory. If the function is small enough to fit into this memory, it is stored there. Only in the case that more memory is needed, a heap allocation is performed. Small object optimization is therefore a tradeoff between runtime and space requirements. std::function needs more memory with it, but is faster for small objects.
Small object optimization is not restricted to type erasure, and can in principle be used wherever heap allocations are involved. However, with a virtual inheritance approach this nontrivial optimization would have to be exposed to the user code, while all its details are hidden from the user in a type erasure context.
While we rely on std::function as a type erasure class for global functions, this is not sufficient to represent extended function interfaces as discussed below. To this end dunefunctions provides utility functionality to implement new type erasure classes with minimal effort, hiding, e.g., the details of small objects optimization. This can be used to implement type erasure for extended function interfaces that go beyond the ones provided by dunefunctions. Since these utilities are not functionspecific, they can also support the implementation of typeerased interfaces in other contexts. Similar functionality is, e.g., provided by the poly library (which is part of the Adobe Source Libraries [9]), and the boost type erasure library [10].
3 Extended function interfaces
The techniques discussed until now allow to model functions
(1) 
between a domain and a range by interfaces using either static or dynamic dispatch. In addition to this, numerical applications often require to model further properties of a function like differentiability or the fact that it is naturally defined locally on grid elements. In this section we describe how this is achieved in the dunefunctions module using the techniques described above.
3.1 Differentiable functions
The extension of the concept for a function (1) to a differentiable function requires to also provide access to its derivative
(2) 
where, in the simplest case, is the set of linear maps from the affine hull of to . To do this, the dunefunctions module extends the ideas from the previous section in a natural way. A C++ construct is a differentiable function if, in addition to having operator() as described above, there is a free method derivative that returns a function that implements the derivative. The typical way to do this will be a friend member method as illustrated in the class template Polynomial:
To use this class, write
Note, however, that the derivative method may be expensive, because it needs to compute the entire derivative function. It is therefore usually preferable to call it only once and to store the derivative function separately.
Functions supporting these operations are described by Concept::DifferentiableFunction, provided in the dunefunctions module.
When combining differentiable functions and dynamic polymorphism, the std::function class cannot be used as is, because it does not provide access to the derivative method. However, it can serve as inspiration for more general type erasure wrappers. The dunefunctions module provides the class
in the file dune/functions/common/differentiablefunction.hh. Partially, it is a reimplementation of std::function. The first template argument of DifferentiableFunction is equivalent to the template argument Range(Args...) of std::function, and DifferentiableFunction implements a method
This method works essentially like the one in std::function, despite the fact that its argument type is fixed to const Domain& instead of Domain, because arguments of a “mathematical function” are always immutable. Besides this, it also implements a free method
that wraps the corresponding method of the function implementation. It allows to call the derivative method for objects whose precise type is determined only at runtime:
While the domain of a derivative is , the same as the one of the original function, its range is . Unfortunately, it is not feasible to always infer the best C++ type for objects from . To deal with this, dunefunctions offers the DerivativeTraits mechanism that maps the signature of a function to the range type of its derivative. The line
shows how to access the type that should be used to represent elements of . The template DefaultDerivativeTraits is specialized for common combinations of Dune matrix and vector types, and provides reasonable defaults for the derivative ranges. However, it is also possible to change this by passing a custom DerivativeTraits template to the interface classes, e.g., to allow optimized applicationspecific matrix and vector types or use suitable representations for other or generalized derivative concepts.
Currently the design of DifferentiableFunction differs from std:function in that it only considers a single argument, but this can be vector valued.
3.2 GridView functions and local functions
A very important class of functions in any finite element application are discrete functions, i.e., functions that are defined piecewisely with respect to a given grid. Such functions are typically too expensive to evaluate in global coordinates. Luckily this is hardly ever necessary. Instead, the arguments for such functions are a grid element together with local coordinates of that element. Formally, this means that we have localized versions
of , where is a parametrization of a grid element over a reference element .
To support this kind of function evaluation, Dune has provided interfaces in the style of
Given an element element and a local coordinate x, such a method would evaluate the function at the given position, and return the result in the third argument y. This approach is currently used, e.g., in the grid function interfaces of the discretization modules dunepdelab and dunefufem.
There are several disadvantages to this approach. First, we have argued earlier that returnbyvalue is preferable to returnbyreference. Hence, an obvious improvement would be to use
instead of the evaluateLocal method. However, there is a second disadvantage. In a typical access pattern in a finite element implementation, a function evaluation on a given element is likely to be followed by evaluations on the same element. For example, think of a quadrature loop that evaluates a coefficient function at all quadrature points of a given element. Function evaluation in local coordinates of an element can involve some setup code that depends on the element but not on the local coordinate x. This could be, e.g., prefetching of those coefficient vector entries that are needed to evaluate a finite element function on the given element, or retrieving the associated shape functions.
In the approaches described so far in this section, this setup code is executed again and again for each evaluation on the same element. To avoid this we propose the following usage pattern instead:
Here we first obtain a local function, which represents the restriction of f to a single element. This function is then bound to a specific element using the method
This is the place for the function to perform any required setup procedures. Afterwards the local function can be evaluated using the interface described above, but now using local coordinates with respect to the element that the local function is bound to. The same localized function object can be used for other elements by calling bind with a different argument. Functions supporting these operations are called grid view functions, and described by Concept::GridViewFunction The local functions are described by Concept::LocalFunction. Both concepts are provided in the dunefunctions module.
Since functions in a finite element context are usually at least piecewise differentiable, grid view functions as well as local functions provid the full interface of differentiable functions as outlined in Section 3.1. To completely grasp the semantics of the interface, observe that strictly speaking localization does not commute with taking the derivative. Formally, a localized version of the derivative is given by
(3) 
In contrast, the derivative of a localized function is given by
However, in the dunefunctions implementation, the derivative of a local function does by convention always return values in global coordinates. Hence, the functions dfe1 and dfe2 obtained by
both behave the same, implementing as in (3). This is motivated by the fact that is hardly ever used in applications, whereas is needed frequently. To express this mild inconsistency in the interface, a local function uses a special DerivativeTraits implementation that forwards the derivative range to the one of the corresponding global function.
Again, type erasure classes allow to use grid view and local functions in a polymorphic way. The class
stores any function that models the GridViewFunction concept with given signature and grid view type. Similarly, functions modeling the LocalFunction concept can be stored in the class
These type erasure classes can be used in combination:
Notice that, as described above, the DerivativeTraits used in polymorphicLocalF are not the same as the ones used by polymorphicF. Instead, they are a special implementation forwarding to the global derivative range even for the domain type LocalCoordinate.
4 Performance measurements
In this last chapter we investigate how the interface design for functions in Dune influence the runtime efficiency. Two particular design choices are expected to be critical regarding execution speed: (i) returning the results of function evaluations by value involves temporary objects and copying unless the compiler is smart enough to remove those using returnvalueoptimization. In the old interface, such copying could not occur by construction, (ii) using type erasure instead of virtual inheritance for dynamic polymorphism. While there are fewer reasons to believe that this may cause changes in execution time, it is still worthwhile to check empirically.
As a benchmark we have implemented a small C++ program that computes the integral
for different integrands, using a standard composite midpoint rule. We chose this problem because it is very simple, but still an actual numerical algorithm. More importantly, most of the time is spent evaluating the integrand function. Finally, hardly any main memory is needed, and hence memory bandwidth limitations will not influence the measurements.
The example code is a pure C++11 implementation with no reference to Dune. The relevant interfaces from Dune are so short that it was considered preferable to copy them into the benchmark code to allow easier building. The code is available in a single file attached to this pdf document, via the icon in the margin.^{†}^{†}margin: 📌
To check the influence of return types with different size we used integrands of the form
for various sizes . This special choice was made to keep the computational work done inside of the function to a minimum while avoiding compiler optimizations that replace the function call by a compiletime expression. The test was performed with subintervals for the composite midpoint rule leading to function evaluations, such that the timings are directly comparable for different values of .
For the test we implemented four variants of function evaluation:

Returnbyvalue with static dispatch using plain operator(),

Return via reference with static dispatch using evaluate(),

Returnbyvalue with dynamic dispatch using std::function::operator(),

Return via reference with dynamic dispatch using VirtualFunction::evaluate(), as in the introduction.
The test was performed with components for the function range, using double
to implement the components. We used GCC4.9.2 and Clang3.6 as compilers, as provided by the Linux distribution Ubuntu 15.04. To avoid cache effects and to eliminate outliers we did a warmup run before each measured test run and selected the minimum of four subsequent runs for all presented values.
Figure 2.A shows the execution time in milliseconds over when compiling with GCC4.9 and the compiler options std=c++11 O3 funrollloops. One can observe that the execution time is the same for variants (a) and (b) and all values of . We conclude that for static dispatch there is no runtime overhead when using returnbyvalue, or, more precisely, that the compiler is able to optimize away any overhead. Comparing the dynamic dispatch variants (c) and (d) we see that for small values of there is an overhead for returnbyvalue with type erasure compared to classic virtual inheritance. This is somewhat surprising since pure returnbyvalue does not impose an overhead, and dynamic dispatch happens for both variants.
Guessing that the compiler is not able to optimize the nested function calls in the type erasure interface class std::function to full extent, we repeated the tests using profile guided optimization. To this end the code was first compiled using the additional option fprofilegenerate. When running the obtained program once, it generates statistics on method calls that are used by subsequent compilations with the additional option fprofileuse to guide the optimizer. The results depicted in Figure 2.B show that the compiler is now able to generate code that performs equally well for variant (c) and (d). In fact variant (c) is sometime even slightly faster.
Finally, Figure 2.C shows results for Clang3.6 and the compiler options std=c++11 O3 funrollloops. Again variants (a) and (b) show identical results. In contrast, variant (c) using std::function is now clearly superior compared to variant (d). Note that we only used generalpurpose optimization options and that this result did not require finetuning with more specialized compiler flags.
5 Conclusion
We have presented a new interface for functions in Dune, which is implemented in the new dunefunctions module. The interface follows the ideas of callables and std::function from the C++ standard library, and generalises these concepts to allow for differentiable functions and discrete grid functions. For runtime polymorphism we offer corresponding type erasure classes similar to std::function. The performance of these new interfaces was compared to existing interfaces in Dune. When using the optimization features of modern compilers, the proposed new interfaces are at least as efficient as the old ones, while being much easier to read and use.
References
 [1] P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, R. Kornhuber, M. Ohlberger, and O. Sander. A generic grid interface for adaptive and parallel scientific computing. Part II: Implementation and tests in DUNE. Computing, 82(2–3):121–138, 2008.
 [2] P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, M. Ohlberger, and O. Sander. A generic grid interface for adaptive and parallel scientific computing. Part I: Abstract framework. Computing, 82(2–3):103–119, 2008.
 [3] K. Driesen and U. Hölzle. The direct cost of virtual function calls in C++. In Proceedings of the 11th ACM SIGPLAN Conference on Objectoriented Programming, Systems, Languages, and Applications, OOPSLA ’96, pages 306–323. ACM, 1996.
 [4] C. Engwer, C. Gräser, S. Müthing, and O. Sander. Dunefunctions module. http://www.duneproject.org/modules/dunefunctions.
 [5] J. Hubička. Devirtualization in C++. online blog, {http://hubicka.blogspot.de/2014/01/devirtualizationincpart1.html}, 2014. (at least) seven parts, last checked on Dec. 8. 2015.
 [6] International Organization for Standardization. ISO/IEC 14882:2011 Programming Language C++, 9 2011.
 [7] E. Niebler. Rangev3 library. https://github.com/ericniebler/rangev3.
 [8] E. Niebler. Concept checking in C++11. online blog, http://ericniebler.com/2013/11/23/conceptcheckinginc11, 2013. last checked on Dec. 8. 2015.
 [9] S. Parent, M. Marcus, and F. Brereton. Adobe source libraries. http://stlab.adobe.com/.
 [10] S. Watanabe. Boost type erasure library. http://www.boost.org/doc/libs/release/libs/type_erasure/.