Automatically exported from code.google.com/p/generic-transformers
We present an approach to lightweight datatype-generic programming in the Objective Caml programming language, aimed at better code reuse. We show that a large class of transformations usually expressed via recursive functions with pattern matching can be implemented using a single per-type traversal function and a set of object-encoded transformations, which we call transformation objects. Object encoding allows transformations to be modified, inherited and extended in a conventional object-oriented manner. However, the data representation is kept untouched, which preserves the ability to construct and pattern-match it in the usual way. Our approach works equally well for regular and polymorphic variant types, which makes it possible to combine data types and their transformations from statically typed and separately compiled components. We also present an implementation which allows us to automatically derive most functionality from slightly augmented type descriptions.
Statically typed functional languages are widely renowned for the means they provide for describing complex data structures and their transformations. Among the most widely used tools are algebraic data types (ADTs), which make it possible to construct graph-shaped data structures and inspect them by matching against patterns. Parametric polymorphism makes it possible to apply transformation functions to various concrete instances of the described data structures, thus facilitating massive code reuse. However, transformations for regular ADTs suffer from a lack of extensibility: there is no easy way to modify or update their behavior without complete reimplementation (at least for recursive ADTs, which are the most important case). In contrast, in the parallel universe of object-oriented programming, specializing or reusing behavior for the cases of interest is common practice; however, a certain price is paid for it in the form of a heavyweight object representation and the mixing of code and data. The winning combination of both approaches (polymorphic ADTs and their extensible transformations) looks quite promising for languages which combine functional and object-oriented constructs.
We present a framework for Objective Caml (https://code.google.com/p/generic-transformers) which is based on an object representation of transformations. Object encoding allows transformations to be inherited, modified and reused; transformation objects contain no data, which provides a clean separation of code and data and preserves all other means of manipulating regular ADTs.
The set of transformations expressible using our approach can easily be characterized in terms of classic cycle-free attribute grammars. If we interpret the data structure as a derivation tree, the result of the transformation as a synthesized attribute and all additional parameters of the transformation as inherited attributes, then any transformation in which no attribute value depends on itself can be implemented using our framework.
Another interesting feature of the approach is that it provides a clear and well-established interface between generic and specific code. Thus the framework itself becomes extensible in turn. We equip it with a plugin system which provides, in a user-defined manner, functionality similar to the “deriving” construct in Haskell.
Our solution is completely type-driven, which allows the implementation to be generated automatically. We implemented our approach as a camlp5 (http://pauillac.inria.fr/~ddr/camlp5) syntax extension and a runtime support library.
All code snippets in the following sections are written in Objective Caml. While we have tried to make all examples as self-contained as possible, some familiarity with Objective Caml fundamentals is still desirable, especially with its object-oriented constructs and types.
Most of the examples in the following sections refer to the transformations and tasks typical for the domain of programming languages implementation and compilers since this is the area of our interest. We believe, however, that the proposed functionality is generic enough to be used in other areas as well.
In this section we present transformation objects as a programming pattern, provide some motivation and describe the proposed solution in a nutshell.
Consider the type of lambda expressions and its visualization function — show:
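The original listing is not reproduced in this text. As a rough sketch, assuming the constructor set Var, App and Lam (with a string binder, as in the reduction examples later on) and an output format of our own choosing, the type and a plain recursive show might look like this:

```ocaml
(* Lambda terms; the constructors are assumed from later examples. *)
type lam =
  | Var of string
  | App of lam * lam
  | Lam of string * lam

(* Plain recursive visualization: every node prints its constructor name. *)
let rec show = function
  | Var s -> "Var (" ^ s ^ ")"
  | App (m, n) -> "App (" ^ show m ^ ", " ^ show n ^ ")"
  | Lam (x, m) -> "Lam (" ^ x ^ ", " ^ show m ^ ")"
```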
Sometimes (actually, more often than one would desire) the behavior of such a function has to be slightly modified; for example, one may need to show “Var s” as just “s”, omitting the constructor name. Since the modification is quite modest, we might expect significant code reuse. The naïve modification
however, does not work as desired: for all other cases it falls back to the “old” show, which was left unmodified and whose recursive calls never return to the new function. Thus our modification works only for top-level Vars. The only remaining option is to rewrite show completely, which is regrettable at best and impossible at worst (for example, when we deal with an external library function with a hidden implementation).
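To see the problem concretely, here is a hedged sketch of such a naïve modification; show' is a hypothetical name, and the type and show are restated (with an output format of our own choosing) so the block stands alone. A Var nested under Lam still prints with its constructor name, because the catch-all case hands the whole subterm to the old show.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

let rec show = function
  | Var s -> "Var (" ^ s ^ ")"
  | App (m, n) -> "App (" ^ show m ^ ", " ^ show n ^ ")"
  | Lam (x, m) -> "Lam (" ^ x ^ ", " ^ show m ^ ")"

(* Naive modification: handles a top-level Var itself and delegates
   everything else to the old show, which never calls show' back. *)
let show' = function
  | Var s -> s
  | e -> show e
```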
The desired extensibility can easily be provided by the conventional object-oriented encoding: we can represent expressions using a hierarchy of classes (one class per constructor) and provide a corresponding show method in each class. However, we would thus lose the ability to match expressions against patterns; moreover, the object data representation would force us to implement virtually every expression-processing function as a set of per-class methods scattered through various class definitions. Apart from being much more verbose, this solution would be more error-prone and less readable.
A better solution would be to apply object-oriented encoding to the transformation itself, not to the data being transformed. This is where transformation objects come into consideration. A transformation object is simply an object which contains the functionality needed to perform the transformation. If we deal with algebraic data types, then the appropriate representation of a transformation object for a given type is a collection of per-constructor transformation methods. Besides that, we need a function which performs pattern matching on the data structure being transformed and dispatches control to the transformation object's methods. Taking these considerations into account, we may implement our show function in the following manner:
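A possible rendering of this pattern, written as a minimal sketch: the class name show_obj and the method names var, app and lam are our own assumptions, not the paper's listing. The methods for recursive constructors receive the traversal as self, so none of them mentions generic_show.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

(* Traversal: matches the term and hands each case to the object t.
   It contains nothing specific to "show". *)
let rec generic_show t = function
  | Var s -> t#var s
  | App (m, n) -> t#app (generic_show t) m n
  | Lam (x, m) -> t#lam (generic_show t) x m

(* Transformation object for show: one method per constructor. *)
class show_obj = object
  method var s = "Var (" ^ s ^ ")"
  method app (self : lam -> string) m n = "App (" ^ self m ^ ", " ^ self n ^ ")"
  method lam (self : lam -> string) x m = "Lam (" ^ x ^ ", " ^ self m ^ ")"
end

let show e = generic_show (new show_obj) e
```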
Here generic_show is a top-level function parameterized by the transformation object t. It exhaustively matches the expression against patterns and dispatches control to the appropriate methods of the transformation object. Note that some of these methods have to be additionally parameterized by the transformation function in question (self), since we do not want them to depend directly on generic_show (this would compromise the very idea of extensibility).
Now the customized version of show can really be implemented in a reusable (albeit a bit verbose) manner:
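Continuing in the same style (restated so it runs on its own; show_obj, short_show_obj and short_show are hypothetical names), the customization overrides a single method. Because generic_show passes itself back in as self, the new var method is also used for nested occurrences, unlike the naïve modification from the beginning of this section.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

let rec generic_show t = function
  | Var s -> t#var s
  | App (m, n) -> t#app (generic_show t) m n
  | Lam (x, m) -> t#lam (generic_show t) x m

class show_obj = object
  method var s = "Var (" ^ s ^ ")"
  method app (self : lam -> string) m n = "App (" ^ self m ^ ", " ^ self n ^ ")"
  method lam (self : lam -> string) x m = "Lam (" ^ x ^ ", " ^ self m ^ ")"
end

(* The customization: only the Var case is redefined. *)
class short_show_obj = object
  inherit show_obj
  method! var s = s
end

let show e = generic_show (new show_obj) e
let short_show e = generic_show (new short_show_obj) e
```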
While this time we managed to provide an appropriate implementation, the generality of the described approach is still unclear. Do we need to implement both the traversal function and the transformation-specific class each time we need an extensible solution? Can this process be automated in a type-driven manner?
The key observation we can make so far is that the traversal function (generic_show) in our example turned out to be more generic than we expected: it contains no show-specific functionality apart from the transformation object it takes as a parameter. In the following section we describe a generalized version of the presented pattern which makes it possible to systematically produce transformations from type descriptions. Each such transformation is expressed using a single per-type traversal function, which is generated by the framework as well.
In this section we describe a per-type abstract attribute transformer, a type-specific traversal function and some auxiliary notions needed for our approach to work. We call our transformers attribute transformers since their design was inspired by the notion of computations defined by attribute grammars. However, we do not use the attribute grammar formalism directly as a sort of declarative description; rather, we borrow some principles and terminology to make the foundations of our approach more conventional.
We provide a camlp5 syntax extension which generates a type-specific traversal function, an abstract transformer and some additional decorations from a type declaration. For example
defines the type itself, its traversal function and its abstract transformer (from now on we will gray out the constructs specific to our framework, including the extended syntax). With this definition we may use the predefined type-indexed traversal function transform(lam) and the abstract transformer class @lam to implement various concrete transformations.
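Suppose we have a polymorphic ADT

$$\mathtt{type}\;[\alpha]\;t = [C_i\;[a_i]]$$

where we use square brackets to denote a vector of something; here $\alpha_i$ denotes the $i$-th type parameter, $C_i$ the $i$-th constructor of the type $t$, and $[a_i]$ the vector of argument types of the $i$-th constructor. The transformation we are looking for has the following type:

$$\iota \to [\alpha]\;t \to \sigma$$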
Here $\iota$ is the type variable which designates the type of the inherited attribute, some auxiliary argument which might be helpful to perform the transformation, and $\sigma$ is the type variable which designates the type of the transformation's result. We call this result the synthesized attribute. Since $t$ is polymorphic, to perform the transformation we might need some transformations of its type arguments. Thus our type evolves into

$$[\iota \to \alpha \to \sigma_\alpha] \to \iota \to [\alpha]\;t \to \sigma$$
Here we denote by $\sigma_\alpha$ the type of the synthesized attribute for the transformation of $\alpha$. Note that we treat type parameter transformations as attribute transformations as well; note also that these transformations can provide different types of synthesized attributes but operate on the same type of inherited attribute. This may look somewhat restrictive; however, we can always “lift” different types of inherited attributes into their sum type.
The reasoning given above lets us provide the type signature of the abstract transformer. The abstract transformer for a given type is a virtual class (in the context of Objective Caml, “virtual” means “purely abstract”); all concrete transformation objects for the type are implemented in our framework as instances of its subclasses. The abstract transformer is polymorphic over the types of the inherited and synthesized attributes, the type parameters of the type being transformed (if any) and the types of the synthesized attributes for those type parameters. Abstract transformers are an important part of our solution. Since we aim at code reuse based on inheritance, we cannot use implicit (immediate) transformation objects, which cannot be inherited in Objective Caml; on the other hand, polymorphic class definitions as a rule require precise annotations for the types of their methods. In our case these type annotations can be quite verbose and tedious to specify. Providing a single supertype for all concrete transformations allows us not only to specify the type of the traversal function but also to automatically instantiate all the types of the methods of concrete transformation objects.
The header of the abstract transformer for the type t looks like this (in Objective Caml classes can be polymorphic; the type parameter list, enclosed in square brackets, precedes the name of the class in its declaration):

$$\mathtt{class\ virtual}\;[\iota, [\alpha, \sigma_\alpha], \sigma]\;@t$$
Here the nested pair of square brackets again indicates the vector of pairs of type variables. Note that we need to know only the vector of type parameters to decide on the signature of the abstract transformer for that type. For example, in the concrete syntax, if we have a type (’a, ’b, ’c) t, then the abstract transformer for t has the following signature:

class virtual [’inh, ’a, ’ta, ’b, ’tb, ’c, ’tc, ’syn] @t
Here ’ta, ’tb and ’tc designate the types of the synthesized attributes for the transformations of the corresponding type parameters, and ’inh and ’syn the types of the inherited and synthesized attributes for the transformation of t. By @t we denote a synthetic name for the abstract transformer's class, since in Objective Caml classes and types share the same namespace.
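The next component of our solution is a per-type traversal function which performs pattern matching and passes control to the transformation object. Taking into account all the previous type considerations, its type should look like

$$\mathtt{transform}(t) : [\iota \to \alpha \to \sigma_\alpha] \to [\iota, [\alpha, \sigma_\alpha], \sigma]\,\#@t \to \iota \to [\alpha]\;t \to \sigma$$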
Here $[\iota \to \alpha \to \sigma_\alpha]$ outlines the vector of transformations for the type parameters, and $[\iota, [\alpha, \sigma_\alpha], \sigma]\,\#@t$ is the type of the concrete transformation object, which can be any properly instantiated subtype of the abstract transformer @t (in Objective Caml, #t denotes an arbitrary subtype of a class type t). The remaining components are the inherited attribute type, the type of the data structure to transform and the type of the synthesized attribute.
The last thing we need to define is the set of parameters which are passed to the methods of the transformation object during pattern matching. Since we are dealing with attribute transformations, we must provide each method with an inherited attribute, which comes as a parameter to the traversal function. Besides that, each method may need synthesized attributes for some (sub)values of the matched value. In our framework synthesized attributes can be calculated only by applying some transformation functions. The only functions available are those for the type parameters (supplied as arguments) and the transformation for the type of interest itself.
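The exact implementation of the pattern matching within the traversal function is as follows: for the constructor $C_i$ of $t$ we add the following case to the matching construct:

$$\mid\; C_i\,[x_i]\ \mathtt{as}\ x \;\to\; t\#\mathtt{c\_}C_i\;\, i\;\, \tilde{x}\;\, [\tilde{x}_i]$$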
Here t is the transformation object and i is the inherited attribute (both are passed to the traversal function as parameters). $\tilde{x}$ and $[\tilde{x}_i]$ are augmented versions of $x$ (the original node of the data structure being transformed) and $[x_i]$ (all proper subvalues of $x$). Namely, we augment these values with functions which deliver synthesized attributes when applied to inherited ones. The rules for the augmentation are as follows:
if the type of the value being augmented corresponds to a certain type parameter, we augment it with the corresponding transformation function for that type parameter;
if the type of the value being augmented is $[\alpha]\;t$, where $[\alpha]$ are the type parameters and t is the type we are implementing the current traversal function for, then we augment it with the partial application of the same traversal function to the transformation functions for the corresponding type parameters and the same concrete transformation object;
in all other cases we do not augment the value.
Whenever we perform the augmentation, we also augment the value with the set of transformation functions for all type parameters.
The augmented value is represented as a record type a with the following fields:
x — the value which was augmented;
f — the transformation function for the values of the type of x;
fx — the partial application of f to x;
tp — the set of transformation functions for all type parameters (encoded as an object with corresponding methods).
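The record and the augmentation rules can be sketched by hand for the lam type. In this sketch, which assumes plain Objective Caml without the syntax extension, lam_t stands in for @lam (which is not legal OCaml syntax) and transform_lam for transform(lam); both names are ours, as is the show class used to exercise them. Since lam has no type parameters, tp is simply unit, and the string fields of Var and Lam are passed unaugmented, following the third rule.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

(* Augmented value: the value, its transformation, the transformation
   applied to it, and the transformations for the type parameters. *)
type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x;
  f : 'inh -> 'x -> 'syn;
  fx : 'inh -> 'syn;
  tp : 'tp;
}

let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }

(* lam has no type parameters, so tp is unit. *)
type ('inh, 'syn) alam = ('inh, lam, 'syn, unit) a

(* Stand-in for the abstract transformer @lam. *)
class virtual ['inh, 'syn] lam_t = object
  method virtual c_Var : 'inh -> ('inh, 'syn) alam -> string -> 'syn
  method virtual c_App :
    'inh -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> 'syn
  method virtual c_Lam :
    'inh -> ('inh, 'syn) alam -> string -> ('inh, 'syn) alam -> 'syn
end

(* Stand-in for transform(lam): the node and its lam-typed subvalues are
   augmented with the traversal itself; strings are passed unchanged. *)
let rec transform_lam t inh e =
  let aug v = make (transform_lam t) v () in
  match e with
  | Var s -> t#c_Var inh (aug e) s
  | App (m, n) -> t#c_App inh (aug e) (aug m) (aug n)
  | Lam (x, m) -> t#c_Lam inh (aug e) x (aug m)

(* A concrete transformation written against the abstract transformer. *)
class show = object
  inherit [unit, string] lam_t
  method c_Var _ _ s = "Var (" ^ s ^ ")"
  method c_App i _ m n = "App (" ^ m.fx i ^ ", " ^ n.fx i ^ ")"
  method c_Lam i _ x m = "Lam (" ^ x ^ ", " ^ m.fx i ^ ")"
end

let show_lam e = transform_lam (new show) () e
```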
We demonstrate these constructs with the following example. Suppose we have the following type definition:
The traversal function for this type is
Since we have two type parameters, the traversal function takes two transformation functions, fa and fb, as its parameters. Then, we need one augmenting function for the type itself ((’a, ’b) t); this function is called self. Finally, we need a collection of transformation functions for the type parameters, encoded by an object with corresponding method names; hence tpo. The augmenting primitive here is called make.
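Since the type definition and the generated traversal are not reproduced here, the following hand-written sketch assumes a hypothetical type with constructors A, B and C, chosen so that A carries a nested value of the same type (matching the description of the method for A below) while B and C carry values of the type parameters. It reproduces the roles of fa, fb, self, tpo and make described above; the code actually generated by the framework may differ. An immediate object is used only to exercise the traversal.

```ocaml
(* Hypothetical two-parameter type. *)
type ('a, 'b) t = A of ('a, 'b) t | B of 'a | C of 'b

type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x; f : 'inh -> 'x -> 'syn; fx : 'inh -> 'syn; tp : 'tp }

let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }

(* fa, fb: transformations for 'a and 'b; t: the transformation object. *)
let rec transform_t fa fb t inh v =
  let self = transform_t fa fb t in                   (* for ('a, 'b) t values *)
  let tpo = object method a = fa method b = fb end in (* all parameter transformations *)
  match v with
  | A x -> t#c_A inh (make self v tpo) (make self x tpo)
  | B a -> t#c_B inh (make self v tpo) (make fa a tpo)
  | C b -> t#c_C inh (make self v tpo) (make fb b tpo)

(* Usage: an immediate object suffices to run the traversal, although
   the framework relies on inheritable classes instead. *)
let show fa fb v =
  transform_t fa fb
    (object
       method c_A i _ x = "A (" ^ x.fx i ^ ")"
       method c_B i _ a = "B (" ^ a.fx i ^ ")"
       method c_C i _ b = "C (" ^ b.fx i ^ ")"
     end)
    () v
```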
Now, when we are implementing the concrete transformation class, we may think in terms of inherited and synthesized attributes and attribute transformations. For example, writing the method for the constructor A, say
we know the following:
inh is the inherited attribute;
s.f and x.f are both equal to the transformation function we are currently implementing;
s.tp#a is the transformation function for the type parameter ’a;
s.tp#b is the transformation function for the type parameter ’b;
s.fx is a function which calculates the synthesized attribute for s with respect to some inherited attribute (for example, but not necessarily, inh);
x.fx is a function which calculates the synthesized attribute for x with respect to some inherited attribute (for example, but not necessarily, inh).
Note that due to late binding for objects, the concrete implementations of augmenting functions can be redefined in subclasses. This property is important for code reuse.
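To illustrate both the role of the annotated abstract transformer and the effect of late binding, here is a self-contained sketch for a hypothetical type ('a, 'b) t = A of ('a, 'b) t | B of 'a | C of 'b. The virtual class t_t plays the role of @t (the name is ours); its annotations are what allow the concrete classes to be written without any type annotations. Overriding c_B in show_bare (also a hypothetical name) changes how a B nested under A is shown as well, since x.fx re-enters the traversal with the same object.

```ocaml
type ('a, 'b) t = A of ('a, 'b) t | B of 'a | C of 'b

type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x; f : 'inh -> 'x -> 'syn; fx : 'inh -> 'syn; tp : 'tp }

let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }

(* Object holding the transformations for 'a and 'b. *)
type ('inh, 'a, 'ta, 'b, 'tb) tpo =
  < a : 'inh -> 'a -> 'ta; b : 'inh -> 'b -> 'tb >

(* Augmented ('a, 'b) t node. *)
type ('inh, 'a, 'ta, 'b, 'tb, 'syn) node =
  ('inh, ('a, 'b) t, 'syn, ('inh, 'a, 'ta, 'b, 'tb) tpo) a

(* Stand-in for the abstract transformer @t: all annotations live here. *)
class virtual ['inh, 'a, 'ta, 'b, 'tb, 'syn] t_t = object
  method virtual c_A :
    'inh -> ('inh, 'a, 'ta, 'b, 'tb, 'syn) node
    -> ('inh, 'a, 'ta, 'b, 'tb, 'syn) node -> 'syn
  method virtual c_B :
    'inh -> ('inh, 'a, 'ta, 'b, 'tb, 'syn) node
    -> ('inh, 'a, 'ta, ('inh, 'a, 'ta, 'b, 'tb) tpo) a -> 'syn
  method virtual c_C :
    'inh -> ('inh, 'a, 'ta, 'b, 'tb, 'syn) node
    -> ('inh, 'b, 'tb, ('inh, 'a, 'ta, 'b, 'tb) tpo) a -> 'syn
end

let rec transform_t fa fb t inh v =
  let self = transform_t fa fb t in
  let tpo = object method a = fa method b = fb end in
  match v with
  | A x -> t#c_A inh (make self v tpo) (make self x tpo)
  | B a -> t#c_B inh (make self v tpo) (make fa a tpo)
  | C b -> t#c_C inh (make self v tpo) (make fb b tpo)

class ['a, 'b] show = object
  inherit [unit, 'a, string, 'b, string, string] t_t
  method c_A i _ x = "A (" ^ x.fx i ^ ")"
  (* s.tp#a is the transformation for 'a, so this equals a.fx i *)
  method c_B i s a = "B (" ^ s.tp#a i a.x ^ ")"
  method c_C i _ b = "C (" ^ b.fx i ^ ")"
end

(* Overriding one case; nested B's are affected too (late binding). *)
class ['a, 'b] show_bare = object
  inherit ['a, 'b] show
  method! c_B i _ a = a.fx i
end
```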
Despite its rather simple design, the approach turned out to be tricky to implement as a syntax extension due to the limited information available about externally declared types. Another problem arises for mutually recursive type declarations: the naïve implementation using mutually recursive classes does not provide an extensible solution, since each abstract transformer explicitly references the abstract transformers for co-recursive types. To provide an extensible solution we had to abstract these transformers by a certain parameterization, which in the implementation resulted in dealing with parameterized classes, mutability, class-level let-bindings and initializers. We omit the exact description since it is too technical and specific. Finally, polymorphic classes in Objective Caml are regular, which means that only instances with the same type parameter bindings can be created within their scopes. While this limitation can in principle be worked around using extra parameterization via explicitly polymorphic functions, we have not implemented this option yet.
Nevertheless, apart from the “regularity” limitation (w.r.t. mutual recursion), our syntax extension provides complete support for ADTs, polymorphic variant types, records and tuples, including mutually recursive type declarations.
The diversity of generic type-driven transformations is widely acknowledged. Even for string conversion functions like show there are many options, for example conversion to HTML or XML formats, printer combinators, incremental appending to a string buffer, etc. We admit that all these cases are rather simple and regular; however, the simpler the transformation, the more tedious it is to implement it manually for each type of interest.
Our syntax extension can be customized by the end user via a rather simple plugin interface. For example, in the following fragment
show designates the plugin name (there are no hardcoded transformations in our system at all), and the “with” construct plays the role of the plugin invocation primitive.
Each plugin is dynamically loaded during the syntax extension phase and generates a concrete transformation on a per-type (actually, per-constructor) basis. Since any transformation in our framework is represented by a certain class, each plugin actually generates one class per type. To refer to the plugin-defined transformation p for the type t, the extended construct @p[t] can be used. Most work is performed by the core system; the plugin itself provides a rather simple parameterization plus a concrete function to generate the body of each method. For example, the implementation of show contains fewer than 50 lines, about a third of which is interface boilerplate.
In this section we present some use cases which we believe demonstrate the potential of our framework in terms of code reuse.
For our main example we may implement the following transformation class:
This implementation looks completely vacuous at first glance: fold_lam simply threads the inherited attribute through all the nodes and finally returns it untouched. However, this behavior is just what we need as a basis for various transformations which can be obtained by proper modification. For example,
gives us the set of all variables occurring in a lambda term (here S.t stands for the type of string sets). The set of all free variables can be calculated by inheriting from the class vars:
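The listings for fold_lam, vars and free_vars are not reproduced here; the following self-contained sketch is one way to write them with hand-written stand-ins for the generated code (lam_t for @lam and transform_lam for transform(lam); both names are ours). We assume S = Set.Make (String), which matches the description of S.t as the type of string sets. As the next paragraph describes, vars redefines only the Var case and free_vars only the Lam case.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x; f : 'inh -> 'x -> 'syn; fx : 'inh -> 'syn; tp : 'tp }
let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }

type ('inh, 'syn) alam = ('inh, lam, 'syn, unit) a

(* Hand-written stand-ins for @lam and transform(lam). *)
class virtual ['inh, 'syn] lam_t = object
  method virtual c_Var : 'inh -> ('inh, 'syn) alam -> string -> 'syn
  method virtual c_App :
    'inh -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> 'syn
  method virtual c_Lam :
    'inh -> ('inh, 'syn) alam -> string -> ('inh, 'syn) alam -> 'syn
end

let rec transform_lam t inh e =
  let aug v = make (transform_lam t) v () in
  match e with
  | Var s -> t#c_Var inh (aug e) s
  | App (m, n) -> t#c_App inh (aug e) (aug m) (aug n)
  | Lam (x, m) -> t#c_Lam inh (aug e) x (aug m)

module S = Set.Make (String)

(* Threads the inherited attribute through every node, unchanged. *)
class ['acc] fold_lam = object
  inherit ['acc, 'acc] lam_t
  method c_Var acc _ _ = acc
  method c_App acc _ m n = n.fx (m.fx acc)
  method c_Lam acc _ _ m = m.fx acc
end

(* All variables occurring in a term: only the Var case is redefined. *)
class vars = object
  inherit [S.t] fold_lam
  method! c_Var acc _ x = S.add x acc
end

(* Free variables: only the Lam case is redefined. *)
class free_vars = object
  inherit vars
  method! c_Lam acc _ x m = S.union acc (S.remove x (m.fx S.empty))
end
```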
As we can see, in the implementation of vars we reused two cases from fold_lam, and in the implementation of free_vars we (again) reused two cases from vars. Without late binding we would have to provide two different functions with complete case analysis to fold with.
Similar considerations apply to yet another important transformation, “map”. Indeed, having a “default” implementation in the form of copying, we may then redefine its behavior for the “interesting” cases, obtaining various useful concrete transformations. Predefined plugins for fold and map are included in our framework alongside show.
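A sketch of the map case in the same hand-written style (lam_t and transform_lam stand in for the generated @lam and transform(lam); map_lam and rename are hypothetical class names): the default implementation copies the term, and a subclass redefines only the Var and Lam cases to rename every variable, bound or free, using a function passed as the inherited attribute.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x; f : 'inh -> 'x -> 'syn; fx : 'inh -> 'syn; tp : 'tp }
let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }

type ('inh, 'syn) alam = ('inh, lam, 'syn, unit) a

class virtual ['inh, 'syn] lam_t = object
  method virtual c_Var : 'inh -> ('inh, 'syn) alam -> string -> 'syn
  method virtual c_App :
    'inh -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> 'syn
  method virtual c_Lam :
    'inh -> ('inh, 'syn) alam -> string -> ('inh, 'syn) alam -> 'syn
end

let rec transform_lam t inh e =
  let aug v = make (transform_lam t) v () in
  match e with
  | Var s -> t#c_Var inh (aug e) s
  | App (m, n) -> t#c_App inh (aug e) (aug m) (aug n)
  | Lam (x, m) -> t#c_Lam inh (aug e) x (aug m)

(* "map": the default behaviour rebuilds (copies) the term. *)
class ['inh] map_lam = object
  inherit ['inh, lam] lam_t
  method c_Var _ _ x = Var x
  method c_App i _ m n = App (m.fx i, n.fx i)
  method c_Lam i _ x m = Lam (x, m.fx i)
end

(* Only the "interesting" cases are redefined. *)
class rename = object
  inherit [string -> string] map_lam
  method! c_Var r _ x = Var (r x)
  method! c_Lam r _ x m = Lam (r x, m.fx r)
end
```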
The “Expression problem” is a widely recognized reference task in the area of component-based software development. The task is to implement an expression evaluator which can be incrementally extended with new cases without modifying existing code.
The expression problem can easily be solved in Objective Caml using polymorphic variants. Polymorphic variant types are an extended version of regular ADTs, introduced in Objective Caml in version 3. In short, unlike regular ADTs, for which a given constructor can belong to exactly one type (in its scope), different polymorphic variant types can share the same constructors. This, in particular, makes it possible to operate on types in a structural, not nominal, way. For example, polymorphic variants can be subtyped, inherited and defined implicitly.
First, we declared two partial expression types (var and arith) with their evaluators (var_eval and arith_eval); type arith is made polymorphic in advance to be more open (this is a feature of the original solution). Note that these evaluators are polymorphic: in the case of type var we do not know the type of the evaluation result (synthesized attribute, ’v); in the case of type arith we do not know the type of the state (inherited attribute, ’b). This property ensures that we did not introduce any artificial restrictions for these evaluators. We then combine both partial types (via regular inheritance for polymorphic variants) and their evaluators (via regular inheritance for classes). Now we can unambiguously determine the types of the state (string->int) and the evaluation result (int). The type expr is again open, so we have to “tie the knot” in the top-level evaluator (eval).
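The listing itself is not reproduced here. The following is a simplified, hand-written sketch of the same structure, not the framework's generated code: the traversals pass synthesized attributes directly instead of augmented values, the constructor sets of var and arith are assumptions, and the names transform_var, transform_arith, expr_eval and evaluator are our own.

```ocaml
(* Two partial expression types; arith is open in its subterm type 'e. *)
type var = [ `Var of string ]
type 'e arith = [ `Num of int | `Add of 'e * 'e | `Mul of 'e * 'e ]

(* Per-type traversals: match and hand each case to the object t.
   transform_arith also takes fe, the transformation for subterms. *)
let transform_var t inh (v : var) = match v with `Var s -> t#c_Var inh s

let transform_arith fe t inh (e : 'e arith) =
  match e with
  | `Num n -> t#c_Num inh n
  | `Add (a, b) -> t#c_Add inh (fe inh a) (fe inh b)
  | `Mul (a, b) -> t#c_Mul inh (fe inh a) (fe inh b)

(* Evaluator for var: polymorphic in the result type 'v. *)
class ['v] var_eval = object
  method c_Var (env : string -> 'v) (s : string) : 'v = env s
end

(* Evaluator for arith: polymorphic in the state type 'b. *)
class ['b] arith_eval = object
  method c_Num (_ : 'b) (n : int) = n
  method c_Add (_ : 'b) (a : int) (b : int) = a + b
  method c_Mul (_ : 'b) (a : int) (b : int) = a * b
end

(* Combine the types (variant inheritance) and the evaluators (class
   inheritance); now the state is string -> int and the result is int. *)
type 'e expr = [ var | 'e arith ]

class expr_eval = object
  inherit [int] var_eval
  inherit [string -> int] arith_eval
end

let evaluator = new expr_eval

(* 'e expr is still open, so the knot is tied here. *)
let rec eval env (e : ('e expr as 'e)) : int =
  match e with
  | #var as v -> transform_var evaluator env v
  | #arith as a -> transform_arith eval evaluator env a
```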
In our implementation all the “glue code” needed in the original solution to combine the evaluators for the partial types is generated by the framework and completely invisible at the user level.
For the final example we consider the implementation of different reduction strategies for the lambda calculus. As a reference we chose Peter Sestoft’s paper, which provides a nice categorization of reduction order steps. Seven reduction strategies are described using this categorization in terms of big-step operational semantics; for three of them a reference ML implementation is provided. Here we present an implementation which literally follows Sestoft’s reasoning.
The type of lambda expressions we have been considering so far was actually borrowed from the referenced paper; for this type we can declare the following virtual class which abstracts any admissible reduction order according to Sestoft’s categorization (see Fig. 2). Apart from inheriting from the abstract transformer @lam we introduce the following supplementary methods:
Method head reduces a lambda expression in a head position; we provide a default implementation here which uses exactly the same reduction order as the one being defined. However, this is not always the case: some reduction strategies (called “hybrid” in Sestoft’s paper) use different orders for this purpose.
Method arg reduces a lambda expression in an argument position when the expression in the corresponding head position was not reduced to an abstraction.
Method subst_arg reduces a lambda expression in an argument position when the expression in the corresponding head position was reduced to an abstraction.
Type context represents a name-generating function needed to perform alpha-conversions and plays the role of the inherited attribute. Type mtype is used as an abbreviation for the types of the supplementary methods, which transform the inherited attribute (context) and an augmented lambda expression (see Section 2) into a lambda expression. Finally, we provide two generic implementations for reducing variables and applications. Variables are never reduced; in an application the expression in the head position is reduced first with method head, and then the result is inspected: if it is an abstraction, the argument is reduced with subst_arg and a substitution is performed; otherwise the argument is reduced with arg. The implementation of the substitution function subst is not included in the listing.
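(* Trait: reduce inside abstraction bodies *)
class virtual reduce_under_abstractions =
  object inherit reducer
    method c_Lam c _ x l = Lam (x, l.fx c)
  end
(* Trait: reduce arguments whose head did not reduce to an abstraction *)
class virtual reduce_arguments =
  object inherit reducer
    method arg c x = x.fx c
  end
(* Trait: substitute arguments without reducing them *)
class virtual non_strict =
  object inherit reducer
    method subst_arg _ m = m.x
  end
(* Trait: leave abstractions untouched *)
class virtual dont_reduce_under_abstractions =
  object inherit reducer
    method c_Lam _ s _ _ = s.x
  end
(* Trait: leave such arguments unreduced *)
class virtual dont_reduce_arguments =
  object inherit reducer
    method arg _ x = x.x
  end
(* Trait: reduce arguments before substituting them *)
class virtual strict =
  object inherit reducer
    method subst_arg c m = m.fx c
  end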
Having defined all these traits, we can finally implement all reduction orders by simply combining the relevant traits via inheritance. Some interesting cases are presented in Fig. 4. Note that this example demonstrates that the object representation of transformations provides more than just componentization: in the cases of applicative order and hybrid applicative order we did not combine the reduction transformations “from scratch” using the basic traits; rather, we redefined some traits in already completely implemented transformations (in the latter case the order of inheritance clauses is important, since each inheritance (re)binds some methods).
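Because the reducer class and the compositions of Fig. 4 are only partially reproduced, here is a self-contained sketch that actually runs the traits above. It rests on assumptions of our own: the context is unit (no name generation), subst is naive and therefore only correct for terms without variable capture, and after a substitution the result is reduced again with the same transformation (s.f). The inherit clauses of call_by_name and applicative are our reading of Sestoft's definitions, not a copy of Fig. 4.

```ocaml
type lam = Var of string | App of lam * lam | Lam of string * lam

type ('inh, 'x, 'syn, 'tp) a = {
  x : 'x; f : 'inh -> 'x -> 'syn; fx : 'inh -> 'syn; tp : 'tp }
let make f x tp = { x; f; fx = (fun inh -> f inh x); tp }
type ('inh, 'syn) alam = ('inh, lam, 'syn, unit) a

class virtual ['inh, 'syn] lam_t = object
  method virtual c_Var : 'inh -> ('inh, 'syn) alam -> string -> 'syn
  method virtual c_App :
    'inh -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> ('inh, 'syn) alam -> 'syn
  method virtual c_Lam :
    'inh -> ('inh, 'syn) alam -> string -> ('inh, 'syn) alam -> 'syn
end

let rec transform_lam t inh e =
  let aug v = make (transform_lam t) v () in
  match e with
  | Var s -> t#c_Var inh (aug e) s
  | App (m, n) -> t#c_App inh (aug e) (aug m) (aug n)
  | Lam (x, m) -> t#c_Lam inh (aug e) x (aug m)

(* Simplification: unit context and capture-unaware substitution m[n/x]. *)
type context = unit
type mtype = context -> (context, lam) alam -> lam

let rec subst m x n =
  match m with
  | Var y -> if y = x then n else m
  | App (p, q) -> App (subst p x n, subst q x n)
  | Lam (y, b) -> if y = x then m else Lam (y, subst b x n)

class virtual reducer = object (self)
  inherit [context, lam] lam_t
  method virtual arg : mtype
  method virtual subst_arg : mtype
  method head : mtype = fun c x -> x.fx c
  method c_Var _ s _ = s.x
  method c_App c s m n =
    match self#head c m with
    | Lam (x, body) -> s.f c (subst body x (self#subst_arg c n))
    | m' -> App (m', self#arg c n)
end

class virtual reduce_under_abstractions =
  object inherit reducer method c_Lam c _ x l = Lam (x, l.fx c) end
class virtual reduce_arguments =
  object inherit reducer method arg c x = x.fx c end
class virtual non_strict =
  object inherit reducer method subst_arg _ m = m.x end
class virtual dont_reduce_under_abstractions =
  object inherit reducer method c_Lam _ s _ _ = s.x end
class virtual dont_reduce_arguments =
  object inherit reducer method arg _ x = x.x end
class virtual strict =
  object inherit reducer method subst_arg c m = m.fx c end

class call_by_name = object
  inherit dont_reduce_under_abstractions
  inherit dont_reduce_arguments
  inherit non_strict
end

class applicative = object
  inherit reduce_under_abstractions
  inherit reduce_arguments
  inherit strict
end

let bn = transform_lam (new call_by_name) ()
let ao = transform_lam (new applicative) ()
```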
In order to trace individual reductions during normalization, a second implementation is sketched in the original paper. In that implementation the lambda term under reduction is represented as a redex and its context, a lambda term with a hole. Contexts are represented by functions of type lam -> lam. A small set of combinators is provided to manipulate the context, which is passed as an auxiliary argument and modified as the reduction point advances into the original term. Since this modification is global, it affects the implementation of each reduction strategy.
In our case, however, it is sufficient to modify only the implementations of the base class and one reduction order trait. The definitions of all other traits and individual reduction strategies can be left completely intact. The complete self-contained implementation for both cases is included in the Appendix.
class call_by_name = object
let bn = transform(lam) (new call_by_name)
class applicative = object
let ao = transform(lam) (new applicative)
class call_by_value = object
let bv = transform(lam) (new call_by_value)
class hybrid_applicative = object
method head c x = bv c x.x
let ha = transform(lam) (new hybrid_applicative)
Our approach was specifically designed for Objective Caml; it is interesting to discuss other languages it could be implemented in. Such languages must at least combine first-class functions, objects with inheritance and late binding, and algebraic data types. Scala (http://www.scala-lang.org) and Kotlin (http://kotlin.jetbrains.org) can be considered relevant examples. However, in these languages object-oriented and functional features are not orthogonal: first-class functions and ADTs are mimicked and projected into the object-oriented layer, which leads to an internal object-oriented data representation and makes our approach superfluous (though possible). As another candidate we can mention Haskell, in which an object-oriented extension can be implemented using typeclass-level metaprogramming.
The code reuse problem we address can be dealt with in many different ways. Datatype-generic programming is aimed at reusing type-driven transformations; we can mention “Scrap Your Boilerplate” [14, 15, 16] or “instant generics” as examples. Some of their functionality (e.g., serialization, generic maps and queries) can definitely be reproduced in our framework as plugins. A general constructive approach has also been proposed for various classes of type-driven transformations. From that perspective, all transformations we are able to implement using our framework can be characterized as catamorphisms. Recursive types are considered as fixed points of (poly)functors; any catamorphism can be represented as a combination of a type-specific traversal function (“fold”) and a transformation-specific function (“algebra”). There is a strong analogy between this approach and ours: “fold” corresponds to our type-indexed “transform”, and “algebra” to the object-encoded transformer. The difference is that our “transform” is non-recursive: concrete transformation functions are captured and passed to the object-encoded transformer as parameters. Thus, in our framework the traversal order of a data structure can be made specific not to its type but to a concrete transformation, so in general we can implement a superset of the class of catamorphisms described in that approach.
Another relevant approach, which in fact motivated our work, is an attribute grammar-based domain-specific language for describing catamorphisms in a declarative form. This domain-specific language is implemented as a preprocessor for Haskell. Specifications of different attributes and their evaluation rules can be introduced incrementally, modified and combined, yielding a transformation which performs all these evaluations, provided the descriptions are consistent with each other. The consistency property is statically checked using an advanced type-system encoding, including type arithmetic.
The direct reuse of the aforementioned approach is impossible due to the differences between the type systems of Haskell and Objective Caml (the Haskell implementation utilizes heterogeneous collections and hence requires type arithmetic). Moreover, the explicit encoding of attributes and their evaluation rules can compromise the idea of code reuse, since we cannot have different evaluation rules for the same attribute (as we actually had two different show functions for the same type in Section 1).
Another approach to code reuse focuses on composing software from separately developed reusable components. We have already mentioned the “Expression problem” as a reference task in this area. The “Expression problem” can be solved in a number of languages, including Haskell, Java and Objective Caml. As we demonstrated, our framework is compatible with the solution for Objective Caml; moreover, it allows even more code reuse, since with our framework it is possible to modify the behavior of a completely assembled transformation (which is impossible, for example, in a solution where class membership can be instantiated only once for a given type).
From the technical point of view our approach resembles MapGenerator (http://brion.inria.fr/gallium/index.php/Camlp4MapGenerator) and FoldGenerator (http://brion.inria.fr/gallium/index.php/Camlp4FoldGenerator), which are shipped with camlp4. These tools provide syntax extensions which generate “map” and “fold” transformations in the form of a single class per cluster of mutually recursive type definitions. Each method of this class represents the transformation for one type of the cluster. This representation indeed allows one to modify the behavior of a transformation by inheritance. There are, however, some differences which make that approach less general than ours:
Method-per-type representation does not eliminate the need for pattern matching. When extending a transformation method, one still needs to match manually against the “interesting” constructors, which we consider boilerplate.
Similarly, the transformation for a union of polymorphic variant types cannot be constructed by inheritance from the transformations for its counterparts; some “glue” code is needed.
There are many other interesting transformations which are left out (e.g. “show”, “compare” etc.). While some of them can technically be seen as specializations of fold or map, in fact no code reuse can be achieved by utilizing Fold/MapGenerator, since the body of each generated method has to be completely reimplemented. In our framework these transformations can be implemented as plugins with short and simple implementations.
Another “close relative” of our framework is the “deriving” syntax extension and library (https://code.google.com/p/deriving). Like our plugin system, “deriving” allows one to generate specific functionality from type definitions, and the assortment of traits can potentially be extended by the end user. However, this framework does not utilize object encoding, which makes the generated traits less flexible.
We presented a generic programming framework for Objective Caml which is based on the notion of object-encoded transformations. The proposed approach facilitates even more code reuse than conventional tools such as classes/objects and polymorphic variants, since it makes it possible to combine best practices from both functional and object-oriented programming. The implementation consists of a syntax extension which operates on slightly augmented descriptions of Objective Caml types and a very small runtime library. Additionally, the framework itself can be parameterized by means of plugins which provide functionality to generate custom type-driven transformations conforming to the generic framework interface. We believe that a wide range of transformations for regular ADTs and polymorphic variant types can be expressed using this approach. Finally, our approach gives a good example of object-oriented programming in Objective Caml.
Recent versions of Objective Caml introduced some new features such as GADTs, open types and extensible functions. Extending our implementation to support these features, as well as eliminating the “regularity” restriction, can be considered future work. Another important problem is the performance evaluation of generic vs. hand-written transformations and the reduction of genericity-imposed overhead.