Free Theorems Simply, via Dinaturality

Free theorems are a popular tool in reasoning about parametrically polymorphic code. They are also of instructive use in teaching. Their derivation, though, can be tedious, as it involves unfolding a lot of definitions, then hoping to be able to simplify the resulting logical formula to something nice and short. Even in a mechanised generator it is not easy to get the right heuristics in place to achieve good outcomes. Dinaturality is a categorical abstraction that captures many instances of free theorems. Arguably, its origins are more conceptually involved to explain, though, and generating useful statements from it also has its pitfalls. We present a simple approach for obtaining dinaturality-related free theorems from the standard formulation of relational parametricity in a rather direct way. It is conceptually appealing and easy to control and implement, as the provided Haskell code shows.

1 Introduction

Free theorems [14] are an attractive means of reasoning about programs in a polymorphically typed language, predominantly used in a pure functional setting, but also available to functional-logic programmers [10]. They have been employed for compiler optimisations [7] and other applications, and can also be used (when generated for deliberately arbitrary polymorphic types) to provide insight into the declarative nature of types and semantics of programs while teaching. Free theorems are derived from relational parametricity [12], and the actual process of deriving them can be tedious. We discuss an approach that side-steps the need to explicitly unfold definitions of relational actions and subsequently manipulate higher-order logic formulae. That there is a relationship between relational parametricity and categorical dinaturality is not news at all [5], and has been used to impressive effect lately [8], but we show that one can do without explicitly involving any category theory concepts, instead discovering all we need along the way. Together with deterministic simplification rules, we obtain a compact and predictable free theorems generator. We provide a neat implementation using the higher-order abstract syntax [11] and normalisation by evaluation [6] principles.

In the remainder of the paper, we are going to explain and discuss the standard approach of deriving free theorems via relational parametricity, first very informally (Section 2), then by somewhat superficially invoking its usual formal presentation (Section 3.1), after which we “discover” our bridge to the simpler approach (Sections 3.2 and 3.3), and conclude with pragmatics and implementation (rest of Section 3 and Section 4).

2 How free theorems are usually derived

For the sake of simplicity, we consider only the case of declarations polymorphic in exactly one type variable, i.e., types like but not like . Extension to cases like the latter would be possible.

2.1 Constructing relations

The key to deriving free theorems is to interpret types as relations [12, 14]. For example, given the type signature , we replace the type variable by a relation variable , thus obtaining . Eventually, we will allow (nearly) arbitrary relations between closed types and , denoted , as interpretations for relation variables. Also, there is a systematic way of reading expressions over relations as relations themselves. In particular,

• base types like and are read as identity relations,

• for relations and , we have

 R1 → R2 = {(f, g) ∣ ∀(a, b) ∈ R1. (f a, g b) ∈ R2}

and

• every type constructor is read as an appropriate construction on relations; for example, the list type constructor maps every relation to the relation defined by (the least fixpoint of)

while the type constructor maps to defined by

and similarly for other datatypes.
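For intuition only (this is not part of the paper's generator, and all names are our own), these relational actions can be prototyped in Haskell over finite relations, with the quantification in the function-space lifting made finite by passing an explicit enumeration:

```haskell
-- A relation between closed types a and b, given as a membership test.
type Rel a b = (a, b) -> Bool

-- Base types are read as identity relations.
idRel :: Eq a => Rel a a
idRel (x, y) = x == y

-- Lifting to lists: related lists have equal length and are pointwise related.
listRel :: Rel a b -> Rel [a] [b]
listRel r (xs, ys) = length xs == length ys && and (map r (zip xs ys))

-- R1 -> R2, with the quantification over R1 made finite by passing
-- an explicit enumeration of R1's pairs.
arrRel :: [(a, b)] -> Rel c d -> Rel (a -> c) (b -> d)
arrRel r1 r2 (f, g) = and [ r2 (f a, g b) | (a, b) <- r1 ]
```

For example, `(+ 1)` is related to itself by the lifting of the identity relation on a finite fragment of `Int`, as the definition of R1 → R2 demands.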

The central statement of relational parametricity now is that for every choice of , , and , the instantiations of the polymorphic to types and are related by the relational interpretation of ’s type. For the above example, this means that . From now on, type subscripts will often be omitted since they can be easily inferred.

2.2 Unfolding definitions

To continue with the derivation of a free theorem in the standard way, one has to unfold the definitions of the various actions on relations described above. For the example:

Now it is useful to specialise the relation to the “graph” of a function , i.e., setting , and to realise that then and , so that we can continue as follows:

It remains to find out what means. We can do so as follows:

Finally, we obtain, for every , , , and ,

 fmap g (f (b ∘ g) c) = f b (map g c)

or, if we prefer this statement pointfree as well, . The power of such statements is that is only restricted by its type – its behaviour can vary considerably within these confines, and still results obtained as free theorems will be guaranteed to hold.

2.3 Typical complications

So what is there not to like about the above procedure? First of all, always unfolding the definitions of the relational actions – specifically, the definition – is tedious, though mechanical. It typically brings us to something like or above. Then, specifically if our has a higher-order type, we will have to deal with preconditions like or . Here we have seen, again by unfolding definitions, that the latter is equivalent to , which enabled simplification of statement by eliminating the variable completely. But in general this can become arbitrarily complicated. If, for example, our of interest had the type , we would have to deal with a precondition instead. By similar steps as above, one can show that this is equivalent to or or something even more cryptic if one insists on complete pointfreeness (to express the condition in the form “” in order to eliminate the explicit precondition by inlining). One might prepare and keep in mind the simplifications of some common cases like those above, but in general, since the type of , and thus of course also the types of higher-order arguments it may have, can be arbitrary and more “exotic” than above (in particular, possibly involving further nesting of function arrows – consider, e.g., we had started with as the target type), we are eventually down to unfolding the definitions of relational actions. We can only hope then to ultimately be able to also fold back into some compact form of precondition like was the case above.

Moreover, the picture is complicated by the fact that the procedure, exactly as described so far, applies only to the most basic language setting, namely a functional language in which there are no undefined values and all functions are total. As soon as we consider a more realistic or interesting setting, some changes become required. Typically that involves restricting the choice of relations over which one can quantify, but also changes to the relational actions that may or may not have a larger impact on the procedure of deriving free theorems. Specifically, already when taking possible undefinedness and partiality of functions into account, one may only use relations that are strict (i.e., ) and additionally has to use versions of datatype liftings that relate partial structures (e.g., ). This is not very severe yet, since strictness of relations simply translates into strictness of functions and connections like for remain intact, so there is no considerable impact on the derivation steps. But if one additionally takes Haskell’s -primitive into account, more changes become required [9]. Now relations must also be total (i.e., implies ) and additionally the relational action for function types must be changed to

The latter does have an impact on the derivation steps, since these typically (like in the examples above) use the definition of a lot, and now must manage the extra conditions concerning undefinedness. Also, some simplifications become invalid in this setting. Note that in the first example above we used that the precondition is equivalent to . But not in a language including , since in such a language eta-reduction is not generally valid (e.g., but not )! We might still be safe, since the condition is at least implied by , so depending on where that explicitly quantifying statement appeared in the overall statement we may obtain a weakening or a strengthening of that overall statement by replacing one condition by the other. But such considerations require careful management of the preconditions and their positions in nested implication statements. All this can still be done automatically [1], but it is no pleasure. There is not as much reuse as one might want, different simplification heuristics have to be used for different language settings, there is no really deterministic algorithm but instead some search involved, and sometimes the only “simplification” that seems to work is to unfold all definitions and leave it at that. Moreover, if one were to move on and consider automatic generation of free theorems for further language settings, like imprecise error semantics [13], then the story would repeat itself. There would be yet another set of changes to the basic definitions for relations and relational actions, new things to take care of during simplification of candidate free theorems, etc.
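The failure of eta-reduction discussed here can be observed directly; the following small experiment (our own, assuming the elided primitive above is Haskell's seq, and best run without optimisations) distinguishes an undefined function from its eta-expansion:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- An undefined function and its eta-expansion: without seq they are
-- indistinguishable, but seq tells them apart.
bottomFun :: Int -> Int
bottomFun = undefined

etaExpanded :: Int -> Int
etaExpanded = \x -> bottomFun x

-- Does forcing the argument to weak head normal form succeed?
whnfDefined :: a -> IO Bool
whnfDefined x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const False) (const True) r)
```

Forcing `etaExpanded` succeeds (a lambda is already in weak head normal form), while forcing `bottomFun` raises an exception, so the two are semantically different in a language with seq.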

2.4 Some problematic examples, and outlook at a remedy

Let us substantiate the above observations with some additional examples. First we consider the declaration . Our existing free theorems generator library mentioned above [1] produces the statement that for every , , and , it holds:¹ [¹ The new web UI for the library created by Joachim Breitner [2] actually appears not to apply all possible simplifications, so the statement remains even a bit more complicated there.]

Arguably, it would have been more useful to be given the equivalent statement that for every , , with types as above,

 g (f p) = f (λs → g (p (λx → s (map g x))))   (1)

There is another free theorems generator as part of another tool, by Andrew Bromage [3], and it does quite okay here, generating this: . But if we make the input type a bit more nasty by more nesting of function arrows, , then the existing generators differ only slightly from each other, and both yield something like the following:

It would have been nicer to be given the following:

 g (f p) = f (λs → g (p (λt → s (λw → t (λx → w (map g x))))))   (2)

which is exactly what the approach to be presented here will yield (modulo variable names). Of course, one could invest into further post-processing steps in the existing generators to get from the scary form of the statement to the more readable, equivalent one. But at some point, this will always be only partially successful. Going from a compact relational expression to a quantifier-rich formula in higher-order logic through unfolding of definitions, and then trying to recover a more readable form via generic HOL formula manipulations, will generally be beaten by an approach better exploiting the structure present in the original type expression – which is what we will do. We will always generate a simple equation between two lambda-expressions, without precondition statements, as in (1) and (2) above.
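As a sanity check of the shape of (1), here is one hypothetical f whose free theorem has exactly that form; the type (([a] -> Int) -> a) -> a and all instantiations below are our own illustrative choices, since the declaration itself is not fixed here:

```haskell
-- A hypothetical polymorphic function whose free theorem has the shape of (1).
f :: (([a] -> Int) -> a) -> a
f p = p length

g :: Int -> String
g = show

p0 :: ([Int] -> Int) -> Int
p0 k = k [1, 2, 3]

-- Equation (1): g (f p) = f (\s -> g (p (\x -> s (map g x))))
lhs, rhs :: String
lhs = g (f p0)
rhs = f (\s -> g (p0 (\x -> s (map g x))))
```

Both sides evaluate to `"3"`, since `length . map g = length`, illustrating that the nested lambdas in (1) merely reroute g around the higher-order argument.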

Moreover, there is still the issue of the variability of free theorems between different language settings. The generator inside Lambdabot does not account for such impact of language features, and thus the theorems it outputs are not safe in the presence of . Our own previous generator does, and thus adds the proper extra conditions concerning undefinedness. For example, for the more complicated of the two types considered above, the output then is (besides a strictness and totality condition imposed on ):² [² Something we will not mention again and again is that is also itself non-. Disregarding types that contain only , this follows from totality of anyway.]

In contrast, with the approach to be presented we will get:

which …

1. …is almost as strong as the more complicated formula above it. The only thing that makes it weaker is that it does not express that the corner cases and with any of , , , …, also hold.

2. …simply reduces to (2) in any functional language setting in which eta-reduction is valid. So we will not perform different derivations for different language settings. (Rather, eta-reduction, when applicable, can be applied as an afterthought – which is exactly what our implementation will do.)

To top the mentioned benefits, the approach to free theorems derivation we will discuss is much simpler than the previous one – simpler both conceptually (and thus also when one wants to obtain free theorems by hand) and when implementing it. In fact, the generator code takes up only half a page in the appendix (without counting the code for implementing the eta-reduction functionality) – a small fraction of the size of the corresponding code in the existing free theorems generators.³ [³ Additional code for parsing input strings into type expressions and pretty-printing generated theorem expressions back into pleasant-looking strings is of comparable complexity between the different generators.]

There is one gotcha. It is not always possible to express a free theorem simply as an equation without preconditions. A typical example is the type . Its general free theorem is:

Since even for fixed , neither of and uniquely determines the other here, the precondition cannot be avoided by some way of inlining or other strategy. The dinaturality-based approach will instead generate the unconditional statement

i.e., setting to and to for some , thus certainly satisfying , but losing some generality. However, we believe we can say for what sort of types this will happen (see Section 3.5).

3 Free theorems simply, via dinaturality

So, what is the magic sauce we are going to use? We start from the simple observation that with the standard approach, once one has done the unfolding of definitions and subsequent simplifications/compactifications, one usually ends up with an equation (possibly with preconditions) between two expressions that look somewhat similar to each other. For example, for type one gets the equation , for type one gets the equation , etc. There is certainly some regularity present: on one side happens “before ”, on the other side it happens “after ”; maybe needs to be brought in at some other place in one or both of the two sides as well; but the expression structure is essentially the same on both sides. In fact, given some experience with free theorems, one is often able to guess up front what the equation for a given type of will look like. But at present, to confirm it, one is still forced to do the chore of unfolding the definitions of the relational actions, then massaging the resulting formulae to hopefully bring them into the form one was expecting. We will change that, by using what we call here the conjuring lemma of parametricity. It was previously stated for a functional-logic language to simplify derivation of free theorems in that setting [10] (Theorem 7.8 there), but will be used for (sublanguages of) Haskell here. To justify it, we need a brief excursion (some readers may want to largely skip) into how relational parametricity is usually formulated.

3.1 Usual formulation of relational parametricity

Putting aside notational variations, as well as the fact that the exact form would differ a bit depending on whether one bases one’s formalisation on a denotational or on an operational semantics (typically of a polymorphic lambda-calculus with some extensions, not full Haskell), one essentially always has the following theorem (sometimes called just the fundamental lemma of logical relations). Some explanations, such as what stands for, are given below it.

Theorem 3.1 (Relational Parametricity)
1. If is a closed term (containing no free term variables, but also no free type variables) of a closed type , then .

2. If is a closed term (in the sense of containing no free term variables) of a type polymorphic in one type variable, say containing free type variable , then for every choice of closed types , , and , we have .

3. If is a polymorphic term as above, of type containing free type variable , but now possibly also containing a free term variable of some type possibly containing the free type variable as well, then for every choice of closed types , , and as above, and closed terms and such that , we have .

Now, the promised explanations:

• The notation corresponds to the construction of relations from types (as in Section 2.1), where keeps track of the interpretation of any type variables by chosen relations. For example, would be and would be .

• For any closed type , the relation (in fact, any ) turns out to just be the identity relation at type . As such, in the first item of the theorem may appear to state a triviality. However, if one explicitly handles abstraction and instantiation of type variables (we have not done so for the exposition in Section 2, because we anyway wanted to deal only with types polymorphic over exactly one type variable), then it is less so. One then introduces, alongside etc., a new relational action (for mappings on relations), which is defined in exactly such a way that when moreover setting , the statement reduces exactly to the statement in the second item of the theorem – which then needs not to be explicitly made. The treatment is analogous if one has types polymorphic in more than one type variable, say , which explains how to deal with that case not considered in Section 2.

• The choices of relations are not really completely arbitrary, instead depend on the language setting for which the parametricity theorem is stated and proved. As mentioned earlier, must be strict to take the presence of partial functions into account, and must be strict and total to take the presence of into account, and other restrictions may apply in other settings.

• Even the third item of the theorem as stated above, adding the treatment of free term variables, is not yet the most general form. In general, the parametricity theorem is formulated for an arbitrary number of free type and term variables, in straightforward (but notationally tedious) extension of the formulations above. Just for the sake of exposition here, we have chosen the progression between the three items. Of course, usually not all three (or more/further ones) are shown, only one at the level of generality needed for a specific concern. In a short while, we will see that it can even be useful to consider the case where does involve a type variable, and free term variables of types involving that type variable, but does itself not have a polymorphic type.

Also, let us make explicit how Theorem 3.1 corresponds to the standard derivation approach for free theorems as described in Section 2. Given a function of type scheme polymorphic in , one would use the first or second item of the theorem to conclude , then unfold the definition of , for example if , then continue from there, with all the tedious work this entails.

The trick now is to establish a lemma, actually a corollary, that does not even mention the relation construction , and that directly states an equality between expressions rather than something about relatedness.

3.2 The conjuring lemma of parametricity

Before giving the lemma, let us give a brief example of the sort of term that can appear in it, since without such an example it may be counterintuitive how could “involve ” but nevertheless have a closed overall type. What this means is that can be something like . In a context in which and and are term variables typed and respectively, this has the closed type , despite the fact that in order to write down with explicit type annotations everywhere (i.e., on all subexpressions), one would also need to write down the type variable at some places. Now the lemma, a corollary of the parametricity theorem.

Lemma 1 (Conjuring Lemma)

Let , and be closed types. Let be closed and:

• strict if we want to respect partially defined functions,

• strict and total if we want to respect .

Let be a term possibly involving (but not in its own overall type, which is closed by assumption) and term variables and , but no other free variables. Then:

 e[τ1/α, idτ1/pre, g/post] = e[τ2/α, g/pre, idτ2/post]
Proof

The conditions on (strictness/totality depending on language setting) guarantee that its graph can be used as an admissible . To apply the parametricity theorem (in its general form with arbitrarily many free variables), we need to establish and . Since are closed types, these statements reduce to and , respectively. Both of these hold in all the language settings considered (easy calculations; also note that if total), so the parametricity theorem lets us conclude

 (e[τ1/α, idτ1/pre, g/post], e[τ2/α, g/pre, idτ2/post]) ∈ Δ[α↦R],τ

from which the lemma’s statement follows by (recall: is closed).

Let us reflect on what we have gained. The conjuring lemma does not mention from the previous subsection. It holds in basically any language setting in which the (or better, a) parametricity theorem holds, no matter what the exact definitions of the relational actions (the unfolding steps employed for a concrete ) are. It is enough that a) the statement of the parametricity theorem holds in the language setting under consideration, and that b) and do hold. Both a) and b) are the case in all languages and -definitions we are aware of. This does not just mean partiality and in Haskell, but also for example the setting with imprecise error semantics as studied in [13]. Even in work on parametricity and free theorems for a functional-logic language [10], where the definition of , including the case , turns out somewhat differently (since having to deal with nondeterminism and thus with power domain types), the statement of the parametricity theorem and the definition of are such that the conjuring lemma holds. Of course, whether the in the lemma must be strict, or strict and total, or something else, does depend on the language setting, but this is not harmful, since it does not restrict us in our choice of .

Also, suppose the situation that some new datatype is to be considered. Usually, this requires some new lifting to be defined and used for the relational interpretation of types. Even though there is a standard recipe to follow, at least for run-of-the-mill algebraic datatypes, it is still work, and requires checking and of course building into a free theorems generator, along with appropriate simplification rules. Not so if we use the conjuring lemma, which (while of course requiring an assertion that the parametricity theorem still holds even in the presence of the new datatype – i.e., there must exist an appropriate relational lifting) is not itself sensitive at all to how the new datatype is relationally interpreted. If we can come up with interesting terms , now possibly involving the new datatype, we are in good condition to prove new free theorems.

Before we consider the question whether we actually can, in general, come up with interesting terms , let us do so for some specific examples. We have already remarked, just before the conjuring lemma, that given , the term fits the bill, which means that the conjuring lemma gives us the following statement:

Using the additional knowledge that , this is exactly the standard free theorem for said type of , namely .

Let us try again, for the type . We may “know” that we want , but do not want to prove that statement with a lengthy derivation. Imagining where and should be put in order to make both sides of the desired statement an instance of a common term , we may arrive at , from which the conjuring lemma plus rewriting gives us

Should we also have rewritten to ? No, not in general! In fact, is not valid in the presence of , and luckily there is no way to abuse the conjuring lemma for producing the not generally valid statement . Only after applying the lemma, when we commit to a specific language setting, may we decide that for us indeed holds.

To conclude this example exploration, let us consider the nasty type from Section 2.4. The choice gives us what we reported there as (2). We also remarked there that the approach presented here does not give us the various positive corner cases relevant in the presence of . That is not fully true; actually the conjuring lemma gives us those as well, for example with , which is a valid input to Lemma 1, and gives us . But in what follows, we want to construct exactly one for each type of , and of course we opt for the supposedly most useful one, not for corner cases that “just” happen to also be valid. So for said type of , we want to, and will, construct the which gives

(or with left-hand side in a world in which eta-reduction is valid).

3.3 Constructing e – discovering dinaturality

Given some of polymorphic type, we want to construct an of closed type. That seems easy, we could simply use . But no, of course we want to use in an interesting way. In essence, we want it to “touch” each occurrence of the type variable in the type of . For doing so, can use and . Some reflection shows that we should make a difference between positive and negative occurrences of , in the standard sense of polarity in function types. That is, an occurrence of that is reached by an odd number of left-branchings at function arrows (in the standard right-associative reading of ) is considered a negative occurrence, others are considered positive occurrences. So, for example, in the type , the first is positive, the second one is negative, and the third one is positive. Then, we want to construct such that negative occurrences of are replaced by and positive ones by .
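The polarity rule can be computed by a one-line structural recursion; the following sketch (our own datatype and names, assuming types built from one type variable, a base type, functions, and lists) lists the polarities of all occurrences of the type variable from left to right:

```haskell
-- Type expressions over a single type variable α (an illustrative sketch).
data Ty = TyVar            -- the type variable α
        | TyInt            -- a base type
        | Ty :-> Ty        -- function type
        | TyList Ty        -- list type

infixr 5 :->

-- Polarities of the occurrences of α, left to right (True = positive).
-- Each left branch at a function arrow flips the polarity.
polarities :: Bool -> Ty -> [Bool]
polarities pos TyVar      = [pos]
polarities _   TyInt      = []
polarities pos (TyList t) = polarities pos t
polarities pos (a :-> b)  = polarities (not pos) a ++ polarities pos b
```

For instance, for a type of the shape (α → α) → α one gets `[True, False, True]`: positive, negative, positive, matching the example just discussed.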

This is doable by structural recursion on type expressions. Specifically, the following function builds a term that maps an input of type to an output of a type with the same structure as , but made monomorphic according to the just described rule about negative and positive occurrences of . So, for example, maps an input of type to an output of type . We do not prove the general behaviour, but it should be easy to see that does what we claim. The defining equations we give should also be suggestive of what would have to be done if new datatypes (other than lists and ) are introduced.

Note the switching of and in moving from to . Of course, in that last defining equation, the must be a sufficiently fresh variable (also relative to and ).
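The recursion just described, including the switching in the contravariant position, can be rendered as follows. This is our own illustrative reimplementation producing string terms with a fresh-name counter; the paper's actual generator instead uses HOAS and normalisation by evaluation:

```haskell
-- Type expressions over a single type variable α (an illustrative sketch).
data Ty = TyVar | TyInt | Ty :-> Ty | TyList Ty

infixr 5 :->

-- mono n g h σ: a string-rendered term replacing negative occurrences of α
-- by g and positive ones by h; n is a counter for fresh variable names.
mono :: Int -> String -> String -> Ty -> String
mono _ _ h TyVar      = h
mono _ _ _ TyInt      = "id"
mono n g h (TyList t) = "map (" ++ mono n g h t ++ ")"
mono n g h (a :-> b)  =
  let v = "f" ++ show n
  in "(\\" ++ v ++ " -> " ++ mono (n + 1) g h b
       ++ " . " ++ v ++ " . " ++ mono (n + 1) h g a ++ ")"  -- note the g/h swap
```

For σ = α → α this yields the term `(\f0 -> id . f0 . g)` when instantiated with g for negative and id for positive occurrences, exhibiting the pre-/post-composition pattern of dinaturality.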

Given of polymorphic type , we will be able to use the term in Lemma 1. It is useful to notice then that (omitting explicit type instantiation and substitution):

So our overall procedure now is to generate free theorems as follows:

 mono_{id,g}(σ) f = mono_{g,id}(σ) f

Category theory aficionados will recognise the concept of dinaturality here!

Let us try out the above for the example . We get:

So the free theorem we get from this by instantiation is:

There are ample opportunities for further simplification here, but let us try to be systematic about this.

3.4 Simplifying obtained statements

The terms generated by contain a lot of function compositions, both outside and inside of - and -calls. Moreover, many of the partners in those compositions will be , either up front because of the cases with a base type, or later when or is replaced by (and the other by ). So our primary strategy for simplification is to inline all the compositions, and while doing so eliminate all -calls. Additionally, all lambda-abstractions introduced by the case will be provided with an argument and then beta-reduced. There is no danger of term duplication here since the lambda-bound is used only linearly in the right-hand side.

These considerations lead to the following syntactic simplification rules, to be applied to terms produced as or . As usual, where new lambda-bound variables are introduced, they are assumed to be sufficiently fresh.

The last line is a catch-all case that is only used if none of the others apply. In the case where the simplification function is applied to a term of the form , note that we are indeed entitled to eta-expand the beta-reduced version into (in order to subsequently apply recursively). Said eta-expansion is type correct as well as semantically correct, since by analysing the function, which is the producer of the subexpression , we know that , and hence also , is a term formed by function composition, and since is a valid equivalence even in language settings in which eta-reduction is not valid (and in which thus would not be in general okay). The eta-expansions on the function arguments of - and -calls (again done to enable further simplification on ) are also justified, since and use their function arguments only in specific, known ways: is indeed semantically equivalent to , since does not use . These considerations should convince us that transforms a term into a semantically equivalent one, hence is correct. But is it also exhaustive, or can we accidentally skip transforming (and thus simplifying) some part of the term produced by ? The best argument that we cannot actually comes from the Haskell implementation given in the appendix, and will be discussed in Section 4.
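The primary strategy – inlining compositions and erasing identities – can be sketched over a small term language of our own devising (the actual rules operate on the richer terms produced by the generator):

```haskell
-- A small term language for generated statements (illustrative only).
data Tm = V String   -- variable
        | Tm :$ Tm   -- application
        | Tm :. Tm   -- composition
        | TmId       -- identity function
  deriving (Eq, Show)

infixl 9 :$
infixr 8 :.

-- Inline compositions and erase identities, mirroring the primary
-- simplification strategy described above.
simp :: Tm -> Tm
simp (TmId :$ t)     = simp t
simp ((f :. g) :$ t) = simp (f :$ simp (g :$ t))
simp (f :$ t)        = case simp f of
                         TmId -> simp t
                         f'   -> f' :$ simp t
simp (TmId :. g)     = simp g
simp (f :. TmId)     = simp f
simp (f :. g)        = simp f :. simp g
simp t               = t
```

For example, `(g . id) (id x)` simplifies to the plain application `g x`; beta-reduction of the lambda-abstractions introduced by the function-type case would be handled analogously, with no danger of term duplication since the bound variable is used linearly.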

Let us be a bit more concrete again, and consider an example. In the previous subsection, we generated for . Let us now calculate from this (of course, would be very similar). See Fig. 1. The result is not yet fully satisfactory. For one thing, was “simplified” to . Of course, we would prefer it to be simplified to just . This is easy to achieve by adding simplification rules like . In fact, our implementation does something more general, namely replacing the original first simplification rule by the following one: whenever can be syntactically generated by the grammar