Elaborating Inductive Datatypes and Course-of-Values Pattern Matching to Cedille

In CDLE, a pure Curry-style type theory, it is possible to generically derive induction for λ-encoded data whose types are least fixed points of functors. From this comes fruitful research into the design space for datatype systems: (Firsov et al., 2018b) build upon this derivation to enhance datatypes with course-of-values (CoV) induction, and (Diehl et al., 2018) show how to achieve program, proof, and data reuse generically and at zero run-time cost between non-indexed and indexed variants of datatypes. However, experience shows that programmers and type theorists prefer the convenience of built-in support for declaring datatypes and defining functions over them by case analysis and fixpoint-style recursion. Use of the above generic framework still comes with some difficulties in this regard, in particular requiring manual derivation of case analysis for functors, use of the less-familiar generic fixpoint induction, and working with λ-encodings directly. In this paper, we present a datatype subsystem for Cedille (an implementation of CDLE) that addresses the above concerns while preserving the desirable features derivable for λ-encoded datatypes within CDLE. In particular, we describe a semantic termination checker based on CoV pattern matching and an extension to definitional equality for constructors that supports zero-cost reuse for datatypes, giving detailed examples of both. Additionally, we demonstrate that this does not require extending Cedille's core theory by showing how datatypes and functions over them elaborate to CDLE and proving that this elaboration is sound. Both the datatype subsystem and its elaborator are implemented for Cedille, to be officially released in version 1.1.0.

1. Introduction

Since their debut in the HOPE programming language (Burstall et al., 1980), algebraic datatypes (ADTs) have become a popular feature in modern functional languages such as Haskell, ML, Racket, and Scala (to name only a few!). ADTs combine a concise scheme for introducing a variety of useful sum-of-products recursive types generated by data constructors with an intuitive mechanism for defining functions over them: pattern matching (McBride, 1970), whereby the programmer provides a set of defining equations for each constructor, together with fixpoint-style recursion.

For the same reasons, ADTs have seen great success in type theory as well. Many implementations of type theories (Norell, 2007; INRIA, 2017; Brady, 2013) automatically equip such types with an induction principle invoked via pattern matching and recursion (Coquand, 1992a), rather than by use of the comparatively primitive datatype eliminators, allowing proofs to be written like programs in the above languages. This success is somewhat tempered by concerns of termination checking: definitions using general recursion are idiomatic in functional languages, but implementations of logically sound type theories must usually ensure recursive functions and proofs are well-founded.

As doing this for arbitrary functions using general recursion is undecidable, some conservative approximation for termination must be used. The most common approach is a separate syntactic check (Giménez, 1995) to ensure recursion occurs only on subdata revealed by pattern matching. However, in practice the syntax-based approach can be brittle and unable to conclude that some intuitive definitions are terminating. Alternatively, a semantic approach to termination checking augments the type system itself with some notion of the size of datatypes, increasing the set of definitions that are recognized as terminating. Examples of this approach include (Abel, 2010; Ahn, 2014; Barthe et al., 2004). Common to these and other proposals is an augmentation of the underlying type theory to support more complex induction schemes.

In contrast to the above, in pure typed λ-calculi datatypes are defined by λ-encodings combining case analysis and recursion into a single scheme (similar to datatype eliminators) that ensures termination. Historically, λ-encodings have languished due to several difficulties. First is their computational inefficiency: (Parigot, 1990) showed that the well-known Church encoding of natural numbers has no better than a linear-time predecessor. Second is their seeming lack of expressive power: (Geuvers, 2001) showed it is impossible to derive an induction principle in second-order dependent type theory. Finally, programmers (and type theorists!) find working directly with λ-encodings cumbersome compared to built-in support for datatypes.
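The first difficulty can be made concrete with a minimal sketch in Python (untyped, so it shows only the computational behavior, not the typing). The names `zero`, `suc`, `pred`, `to_int`, and `from_int` are illustrative choices, and the step counter is added only to make the cost visible: the predecessor of a Church numeral is obtained by rebuilding the number from zero, one step per successor.

```python
# Church numerals: the number n is the function applying s to z exactly n times.
zero = lambda s: lambda z: z
suc = lambda n: lambda s: lambda z: s(n(s)(z))


def to_int(n):
    """Read a Church numeral back as a Python int."""
    return n(lambda k: k + 1)(0)


def from_int(k):
    """Build the Church numeral for a non-negative Python int."""
    n = zero
    for _ in range(k):
        n = suc(n)
    return n


def pred(n, counter=None):
    """Predecessor by iterating over pairs (previous, current), starting at (0, 0).

    The iteration runs once per successor in n, so the cost is linear in n.
    `counter` is an optional one-element list used only to count the steps.
    """
    def step(pair):
        if counter is not None:
            counter[0] += 1
        return (pair[1], suc(pair[1]))

    return n(step)((zero, zero))[0]


steps = [0]
four = pred(from_int(5), steps)
print(to_int(four), "computed in", steps[0], "steps")
```

The number of steps equals the value of the argument, which is the linear-time behavior referred to above.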

Cedille is a dependently typed programming language that seeks to address the above difficulties for λ-encodings (Stump, 2017, 2018b). Cedille’s core theory, the Calculus of Dependent Lambda Eliminations (CDLE), is a compact pure Curry-style type theory with no primitive notion of inductive datatypes; instead, as shown by (Firsov and Stump, 2018; Firsov et al., 2018a), it is possible to generically derive the induction principle for λ-encoded data with a Mendler-style encoding that features constant-time predecessors and a linear-space representation. Furthermore, these encoded datatypes have properties desirable for programmers and type theorists alike: (Diehl et al., 2018) showed how to achieve reuse for programs, proofs, and data generically and at zero run-time cost, and (Firsov et al., 2018b) showed how to extend λ-encodings with course-of-values (CoV) induction, which gives access to an inductive hypothesis at arbitrarily deeply nested subdata.
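To give a feel for what a constant-time predecessor with a linear-space representation looks like, here is a small Python sketch. It uses a Scott-style encoding, which is a deliberate simplification and not the Mendler-style encoding derived in the cited work; all names are illustrative. Each numeral is its own case analysis and stores a reference to its predecessor, so taking the predecessor is a single application.

```python
# Scott-style numerals (a simplification; NOT the Mendler-style encoding of the
# cited work): a number is a function that performs case analysis on itself.
zero = lambda on_zero, on_suc: on_zero


def suc(n):
    # The successor keeps a reference to n: one extra cell per successor.
    return lambda on_zero, on_suc: on_suc(n)


def pred(n):
    """Constant-time predecessor: a single case analysis, with pred zero = zero."""
    return n(zero, lambda p: p)


def to_int(n):
    """Read a numeral back as a Python int by repeated case analysis."""
    count = 0
    while True:
        p = n(None, lambda q: q)  # None signals the zero case
        if p is None:
            return count
        count, n = count + 1, p


two = suc(suc(zero))
three = suc(two)
print(to_int(pred(three)), pred(three) is two)
```

The identity check shows that the predecessor is returned as stored rather than rebuilt.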

Contributions

Though this framework for generically deriving induction in Cedille saves programmers from a great deal of tedium when defining datatypes and functions over them, there are still some difficulties in using it. In particular, it still requires of users that they:

• manually prove that a given datatype signature functor is positive;

• manually derive functor induction and combine it with less familiar generic induction; and

• work directly with λ-encoded data, particularly unfortunate when trying to prove properties of it as term reduction in types may reveal the underlying encoding.

In this paper, we address the above usability concerns by presenting a datatype subsystem for Cedille with convenient notation for declaring inductive types and functions defined over them using pattern matching and fixpoint-style induction. In doing so, we:

• design a semantic (type-based) termination checker based on CoV pattern matching, a novel concept allowing Cedille to accept definitions expressed as CoV induction schemes and programmers to reuse code for ordinary and CoV induction (Section 3);

• solve a technical challenge to bridge the gap between the facilities of the generic framework and CoV pattern matching in the surface language (Section 5.2);

• support zero-cost reuse by extending definitional equality for constructors to internalize when their λ-encodings would be equal in CDLE, and give a novel example of simulating a simultaneous datatype and function definition using this extension (Section 4);

• show how datatype declarations and functions over them are elaborated to λ-encodings in Cedille (Sections 6.1, 7); and

• prove the soundness of our datatype elaborator (i.e., that elaboration is type- and value-preserving), demonstrating that the above can be achieved without extension of the core theory of Cedille (Section 7).

The proposed datatype system and its elaborator are implemented for Cedille, to be officially released in version 1.1.0. Our implementation automatically replays the generic framework of (Firsov et al., 2018a, b) for each datatype declaration, tuned to its indices so as to appropriately handle types with large indices (i.e. type-indexed types), which are not expressible within the framework as given, and elaborates this and all definitions to Cedille Core, a minimal specification of CDLE implemented in roughly 1K Haskell LoC. This paper and the accompanying proof appendix treat formally only the elaboration of non-indexed datatypes to Cedille without datatypes (which maps straightforwardly to Cedille Core).

The remainder of this paper is organized as follows: in Section 2 we briefly review CDLE and describe Cedille’s datatype system using standard datatypes and functions and show their elaborations; Section 3 explains CoV pattern matching; Section 4 shows the preservation of zero-cost reuse in our datatype system and a novel example of simulating a simultaneous definition; Section 5 describes the elaborator interface and the solution to the above-mentioned technical challenge; Section 6 gives a formal treatment of the elaboration of datatype declarations; Section 7 explains the elaboration of induction and CoV pattern matching; finally, Section 8 discusses related and future work.

2. Programming with Datatypes in Cedille

2.1. Background: CDLE

We first review CDLE, the type theory of Cedille; a more complete treatment can be found in (Stump, 2018b). CDLE is an extension of the impredicative, Curry-style (i.e. extrinsically typed) Calculus of Constructions (CC) that adds three new typing constructs: equality of untyped terms ({t₁ ≃ t₂}); the dependent intersection type (ι x: T. T’) of (Kopylov, 2003); and the implicit (erased) product type (∀ x: T. T’) of (Miquel, 2001). The pure term language of CDLE is just that of the untyped λ-calculus; to make type checking algorithmic, terms in Cedille are given type annotations, and definitional equality of terms is modulo erasure of these annotations. The kinding, typing, and erasure rules for the fragment of CDLE containing these type constructs are given in Figure 1. We briefly describe these below:

• {t₁ ≃ t₂} is the type of proofs that t₁ and t₂ are equal (modulo erasure). It is introduced with β{t’} (erasing to |t’|), proving {t ≃ t} for any untyped term t. Combined with definitional equality, β{t’} can be used to prove {t₁ ≃ t₂} for any βη-convertible t₁ and t₂ whose free variables are declared in the typing context. Equality types can be eliminated with ρ, φ, and δ.

• ρ t @x.T’ - t’ (erasing to |t’|) rewrites a type by an equality: if t proves that {t₁ ≃ t₂} and t’ has type [t₂/x]T’, then the ρ expression has type [t₁/x]T’, with the guide @x.T’ indicating the occurrences of t₂ rewritten in the type of t’.

• φ t - t’ {t’’} (erasing to |t’’|) casts t’’ to the type of t’ when t proves t’ and t’’ equal.

• δ - t (erasing to λ x. x) has any type T when t proves that Church-encoded true equals false, enabling a form of proof by contradiction. While this is adequate for CDLE, Cedille makes δ more practical by implementing the Böhm-out algorithm (Böhm, 1968) so δ can be used on any proof that {t₁ ≃ t₂} for closed, normalizing, and βη-inconvertible terms t₁ and t₂.

• ι x: T. T’ is the type of terms t which can be assigned both type T and [t/x]T’, and in the annotated language is introduced by [t, t’ @x.T’], where t has type T, t’ has type [t/x]T’, and |t| and |t’| are βη-convertible. Dependent intersections are eliminated with projections t.1 and t.2, selecting resp. the view that term t has type T or [t.1/x]T’.

• ∀ x: T. T’ is the implicit product type, the type of functions with an erased argument x of type T and a result of type T’. Implicit products are introduced with Λ x: T. t, provided x does not occur in |t|, and are eliminated with erased application t -t’. Due to the restriction that the bound variable x cannot occur in the erasure of the body t of Λ x: T. t, erased arguments play no computational role and thus exist solely for the purposes of typing.

Figure 1 omits the typing and erasure rules for the more familiar term and type constructs of CC. When reasoning about definitional equality of term constructs in CC, all types in type annotations, quantifications, and applications are erased. Types are quantified over with ∀ within types and abstracted over with Λ in terms, similar to implicit products; the application of a term t to type T is written t ·T, and similarly application of type T to type T’ is written T ·T’. In term-to-term applications, we omit type arguments when these are inferable from the types of term arguments.
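The role of erasure can be sketched with a small Python model. This is only an illustration under simplifying assumptions: terms are represented as tagged tuples with hypothetical tags (`"var"`, `"lam"`, `"app"`, `"Lam"` for Λ, `"eapp"` for erased application, `"tapp"` for type application), type annotations are left out entirely, and only the constructs discussed above are covered.

```python
def erase(t):
    """Erase a (simplified) annotated term to a pure untyped λ-term."""
    kind = t[0]
    if kind == "var":
        return t
    if kind == "lam":                      # λ x. t   keeps its binder
        return ("lam", t[1], erase(t[2]))
    if kind == "app":                      # t t'     keeps both sides
        return ("app", erase(t[1]), erase(t[2]))
    if kind == "Lam":                      # Λ x. t   erases to |t|
        return erase(t[2])
    if kind in ("eapp", "tapp"):           # t -t' and t ·T erase to |t|
        return erase(t[1])
    raise ValueError(f"unknown term: {t!r}")


def free_vars(t):
    """Free variables of an already erased term."""
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])   # application


def erased_binder_ok(t):
    """Side condition for Λ x. t: x must not occur free in the erasure of t."""
    assert t[0] == "Lam"
    return t[1] not in free_vars(erase(t[2]))


# Polymorphic identity Λ X. λ x. x, applied to a type and then to a term z.
poly_id = ("Lam", "X", ("lam", "x", ("var", "x")))
term = ("app", ("tapp", poly_id, "Nat"), ("var", "z"))
print(erase(term))
```

Under this model the type abstraction and type application vanish, leaving the untyped application of λ x. x to z, while a term such as Λ x. x is rejected by the side condition because its bound variable survives erasure.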

2.2. Standard Examples for Datatypes and their Elaborations

Declarations

Figure 2 shows the definitions in Cedille for some well-known types. Modulo differences in syntax, the general scheme for declaring datatypes in Cedille should be straightforward to anyone familiar with GADTs in Haskell or with dependently typed languages like Agda, Coq, or Idris. Some differences from these languages to note are that:

• In constructor types, recursive occurrences of the inductive datatype (such as Nat in the argument type of suc) must be positive, but need not be strictly positive. (Footnote 2: The ability to do this soundly for “small” datatypes is well-known; see for example (Blanqui, 2005).)

• Occurrences of the inductive type being defined are not written applied to its parameters. E.g., the constructor nil is written with signature List rather than List ·A. Used outside of the datatype declaration, nil has its usual type: ∀ A: ★. List ·A.

• Declarations can only refer to the datatype itself and prior definitions. Inductive-recursive and inductive-inductive definitions are not part of this proposal.

Elaborations

Finally, and most essentially, declarations of datatypes and their constructors in Cedille are elaborated to λ-encodings in CDLE for which induction has been derived. This is described more completely in Section 6.1, but to emphasize this point we provide here a sketch of 1) the elaborations for the declaration of Nat and 2) some properties that hold for them.

In Figure 3 Nat is a variant of the usual impredicative encoding of Nat’s signature functor (with occurring in place of recursive occurrence of Nat) that uses dependent intersection to enable functor induction (similar to (Stump, 2018a)). posNat is the elaborated proof that Nat is positive, with IdMapping a generalization of the notion of a covariant functor (Figure 13(a)). This proof is given to the type-level function Fix (Figure 13(b)) in the definition of the least fixpoint Nat of the functor Nat, and elaborated constructors zero and suc are defined in terms Nat. The two equalities show the elaborated constructors are (after erasure) pure λ-terms. We provide some intuition for zero: is some function by cases over Nat that expects an argument to make recursive calls with () and a Nat argument (here the more usual encoding of 0). Finally, we list the statement of induction for Nat derivable in CDLE. This and the fact Fix is a derived (rather than primitive) fix-point function for functors, means that declarations of inductive datatypes require no extension to Cedille’s core theory.

Throughout this paper, we use the following naming convention for expressions elaborated from datatypes and their constructors: for the usual impredicative encoding of a signature functor; for the intersected version supporting functor induction; and for the least fixpoint of the inductive functor.

Standard Functions

Figure 4 shows a few standard examples of functional and dependently typed programming in Cedille. Function pred introduces operator μ’ for CoV pattern matching. Here it is used for standard pattern matching: μ’ is given scrutinee n of type Nat and a sequence of case branches for each constructor of Nat. Functions add and vappend introduce operator μ for CoV induction by combined pattern matching and recursion; the distinction between pattern matching by μ and μ’ is made clear in Section 3. Here, μ is used for standard structurally recursive definitions, with vappend showing its use on indexed type Vec to define recursive function vappendYs, semantically appending ys to its argument. In the vnil branch, the expected type is, by the usual index refinement of pattern matching on indexed types, a Vec type whose length index is the sum (by add) of zero and the length of ys; thanks to the reduction behavior of add this is convertible with the type of ys. Similarly, in the vcons branch the expected type has as its length index a sum whose first summand is a successor, and this is convertible with the type of the body.

Elaborations

As with datatype declarations, μ and μ’ expressions in Cedille are also elaborated to pure λ-expressions whose types correspond to the type of the original expression. Figure 5 gives a sketch of this for pred and add, showing the elaborated types and (erased) definitions of these functions. A more complete treatment can be found in Section 7, but we provide some intuition for the definition of add. After abstracting over the two arguments, it provides with a function taking addN (for recursive calls) and (the unrolling of ), and give the two case branches to .

Erasure and Reduction of μ and μ’

Part of the difficulty of working directly with λ-encodings in Cedille is their unwieldy size. For example, in the elaboration of add in Figure 5, replacing suc with its definition results in a term for which proving even simple properties can be tedious. To address this, Cedille treats the constructs μ and μ’ as if they were primitive, giving them their own erasure, reduction, and typing rules. The first two of these are given in Figure 6; Figures 9 and 20 in coming sections give resp. simplified and complete type inference rules, and in Section 3 we explain the fully annotated form of case analysis. Reduction for μ’ is the usual branch selection for case analysis; for μ it is a combination of this and fixpoint unrolling. We state the soundness (value-preservation) of these reduction rules in Section 7.

In Figure 6, a metavariable denotes a datatype constructor, a sequence of type and (mixed-erasure) term arguments, a sequence of type and (mixed-erasure) term variables bound by pattern guards, the length of , a collection of branches guarded by patterns with bodies , the erasure of type and erased-term variables in the sequence , and the simultaneous and capture-avoiding substitution of terms and types for variables .
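The two reduction behaviors described above, branch selection for μ’ and branch selection combined with fixpoint unrolling for μ, can be sketched in Python. This is an informal model and not the rules of Figure 6: constructor applications are modelled as tagged tuples, branches as a dictionary from constructor names to functions, and the helper names `mu_prime` and `mu` are hypothetical.

```python
ZERO = ("zero",)


def suc(n):
    return ("suc", n)


def mu_prime(scrutinee, branches):
    """μ’: select the branch for the head constructor and pass it the arguments."""
    tag, *args = scrutinee
    return branches[tag](*args)


def mu(branches_of, scrutinee):
    """μ: unroll the fixpoint once (binding the recursive function), then select."""
    rec = lambda x: mu(branches_of, x)
    return mu_prime(scrutinee, branches_of(rec))


def pred(n):
    # Mirrors a μ’-definition: {| zero ➔ n | suc n’ ➔ n’}
    return mu_prime(n, {"zero": lambda: n, "suc": lambda p: p})


def add(m, n):
    # Mirrors a μ-definition recursing on m, with add_n as the recursive function.
    return mu(
        lambda add_n: {"zero": lambda: n, "suc": lambda p: suc(add_n(p))},
        m,
    )


two = suc(suc(ZERO))
three = suc(two)
print(add(two, three) == suc(suc(three)), pred(three) == two)
```

In this model `mu_prime` never introduces recursion, while `mu` re-enters itself only through the bound recursive function, which is the shape the termination argument of Section 3 relies on.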

3. Course-of-Values Pattern Matching

This section explains course-of-values (CoV) pattern matching, a novel contribution that is the basis for Cedille’s type-based termination checker and that facilitates reuse for functions used in ordinary and CoV induction. Similar code examples appear in (Firsov et al., 2018b) using their generic framework. In the following, these examples illustrate CoV induction in our surface language and motivate a technical challenge solved in implementing CoV pattern matching using the framework (Section 5.2).

The definitions of add and vappend in Figure 4 only require structural recursion, which is what the usual datatype eliminators of type theory are able to express. In general-purpose functional languages, programmers are free to define functions using much more powerful recursion schemes (including general recursion). Users of type theory implementations are usually not afforded such freedom, as most of these must ensure recursive definitions are well-founded or risk logical unsoundness. A common approach is to use a termination checker implementing a syntactic guard enforcing that recursive calls are made on terms revealed by pattern matching on arguments, guaranteeing they are smaller. This extends to allowing recursion using nested case analysis, which is more expressive than the immediate structural recursion of the usual eliminators; to achieve the same effect, eliminators require a workaround like “tupling” of the argument with its predecessor. Below, fib shows recursion with nested pattern matching in Cedille without such a workaround:

fib: Nat ➔ Nat = λ n. μ fib. n {
| zero ➔ suc zero
| suc n’ ➔ μ’ n’ {| zero ➔ suc zero | suc n’’ ➔ add (fib n’) (fib n’’)}
}.
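For readers less familiar with Cedille syntax, the same recursion pattern can be written as a short Python sketch. Python performs no termination checking, so this only mirrors the shape of the definition: two nested case analyses expose the predecessor and the predecessor's predecessor, and both recursive calls are made on those subterms.

```python
def fib(n: int) -> int:
    """Mirror of the nested-matching definition above, with fib 0 = fib 1 = 1."""
    if n == 0:                 # outer match: zero
        return 1
    n1 = n - 1                 # outer match: suc n1
    if n1 == 0:                # inner match on n1: zero
        return 1
    n2 = n1 - 1                # inner match on n1: suc n2
    return fib(n1) + fib(n2)   # both arguments were revealed by matching


print([fib(k) for k in range(7)])
```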
Unfortunately, syntactic termination checkers are usually unable to determine that classes of even more complex recursion schemes are well-founded. Consider an intuitive definition of division on natural numbers by iterated subtraction. In a Haskell-like language, programmers simply write:
0 / d = 0
n / 0 = n
n / d = if (n < d) then 0 else 1 + ((n - d) / d)
This definition is guaranteed to terminate for all inputs, as the first argument to the recursive call, n - d, is smaller than the original argument n (d is guaranteed to be non-zero). As innocuous as this definition may seem to functional programmers, it poses a difficulty for syntactic termination checkers, as n - d is not an expression produced by (nested) case analysis of n within the definition of division but an arbitrary predecessor produced by iterations of case analysis. This class of recursion scheme is called CoV recursion (categorically, a histomorphism); it is guaranteed to be terminating, but this fact is difficult to communicate to syntactic termination checkers!
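The termination argument can be checked directly on a Python transcription of the three equations. The function name `divide` and the runtime assertion are additions for illustration; the assertion states the measure that a termination checker would need to see, namely that the recursive call is made on a strictly smaller first argument.

```python
def divide(n: int, d: int) -> int:
    """Division by iterated subtraction, following the three equations above."""
    if n == 0:
        return 0
    if d == 0:
        return n
    if n < d:
        return 0
    smaller = n - d
    # d >= 1 here, so the first argument strictly decreases on every call.
    assert smaller < n
    return 1 + divide(smaller, d)


print(divide(7, 2), divide(9, 3), divide(5, 0))
```

Note that `smaller` is obtained by subtraction and not by pattern matching on `n`, which is exactly what a syntactic check cannot see.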

3.1. Course-of-values Recursion

In contrast to the above, Cedille implements semantic, or type-based, termination checking that is powerful enough to accept definitions using CoV recursion (and induction). At its heart is a feature we call CoV pattern matching, invoked by μ’. CoV pattern matching subsumes nested case analysis (as in fib) and can be used to define a version of division written close to the intuitive way, requiring a few more typing annotations to guarantee termination. Termination checking works by replacing, in the types of subdata in pattern guards of inductive μ-expressions (not μ’!), the recursive occurrences of a datatype with an “abstract” type. This abstract type (and not the usual datatype) is the type of legal arguments for recursive calls. Crucially, expressions defined using μ’ are able to preserve this abstract type, meaning users can write versions of e.g. predecessor and subtraction which can be used when defining division; furthermore, they may be reused for ordinary numbers. These and other auxiliary definitions are given in Figure 7.

CoV Globals

We first explain the types and definitions of pred’ and minus’. In pred’ we see the first use of predicate Is/Nat. Every datatype declaration in Cedille introduces, in addition to itself and its constructors, three global names derived from the datatype’s name. For Nat, these are:

• Is/Nat: a term of type Is/Nat ·N is a witness that any term of type N may be treated as if it has type Nat for the purposes of case analysis.

• is/Nat, of type Is/Nat ·Nat, is the trivial Is/Nat witness.

• to/Nat is a function that coerces a term of type N to Nat, given a witness that N “is” Nat. In Section 6, we will see that to/Nat and all other such cast functions elaborate to terms definitionally equal (modulo erasure) to λ x. x. Cedille internalizes this fact: the equation {to/Nat ≃ λ x. x} is true definitionally in the surface language. Notice that this is possible in part because there is only one unerased argument to to/Nat. This property is important for CoV induction further on.

In pred’ the witness of type Is/Nat ·N is given explicitly to μ’ with the angle-bracket notation μ’<is> (for a witness named is), allowing argument n (of type N) to be a legal scrutinee for Nat pattern matching. Reasoning by parametricity, the only ways pred’ can produce an output of type N (i.e., preserve the abstract type) are by returning n itself or some subdata produced by CoV pattern matching on it, since the predecessor also has type N. Thus, the type signature of pred’ has the following intuitive reading: it produces a number no larger than its argument, as an expression built with suc has type Nat rather than N and would be type-incorrect to return.

Code Reuse

The reader may wonder at this point what the relation is between pred’ and the earlier pred of Figure 4. The fully annotated μ’-expression of the latter is:

μ’<is/Nat> n @(λ x: Nat. Nat) {| zero ➔ n | suc n’ ➔ n’}
In pred, the global witness is/Nat of type Is/Nat ·Nat need not be passed explicitly, as it is inferable by the type Nat of the scrutinee n. (Footnote 3: The same holds for the inferability of the local witness (discussed below) introduced in the body of fib.) Furthermore, the erasures of pred and pred’ are definitionally equal, a fact provable in Cedille (where _ indicates an anonymous proof):
_ : {pred ≃ pred’} = β.

This leads to a style of programming where, when possible, functions are defined over an abstract type N for which e.g. Is/Nat ·N holds, and the usual versions of the functions reuse these as a special case. Indeed, this is how minus is defined: in terms of the more general minus’ specialized to the trivial witness is/Nat. The type signature of minus’ yields a similar reading that it produces a result no larger than its first argument. In the successor case, pred’ is invoked and given the (erased) witness is. That minus’ preserves the type of its argument after n uses of pred’ is precisely what allows it to appear in expressions given as arguments to recursive functions. Function minus is used to define lt, the Boolean predicate deciding whether its first argument is less than its second; ite is the usual definition of a conditional expression by case analysis on Bool.
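The programming style described above can be imitated in Python, with the caveat that Python does not enforce the parametricity argument: nothing stops a function from inspecting its argument directly. In this sketch a witness is modelled as a "view" function performing one step of case analysis (returning `None` for zero and the predecessor otherwise); the names `pred_gen`, `minus_gen`, `int_view`, and `peano_view` are hypothetical.

```python
from typing import Callable, Optional, TypeVar

N = TypeVar("N")
# Hypothetical stand-in for an Is/Nat witness: one step of case analysis on N.
View = Callable[[N], Optional[N]]


def pred_gen(view: View, n: N) -> N:
    """Generic predecessor: returns n itself or subdata revealed by the view."""
    p = view(n)
    return n if p is None else p


def minus_gen(view: View, m: N, n: int) -> N:
    """Generic subtraction: n uses of pred_gen, staying at the abstract type N."""
    for _ in range(n):
        m = pred_gen(view, m)
    return m


def int_view(k: int) -> Optional[int]:
    """Plays the role of the trivial witness for ordinary numbers."""
    return None if k == 0 else k - 1


def pred(n: int) -> int:
    return pred_gen(int_view, n)          # the usual version, by specialization


def minus(m: int, n: int) -> int:
    return minus_gen(int_view, m, n)


def peano(k: int):
    """A second representation: nested one-element tuples."""
    x = ()
    for _ in range(k):
        x = (x,)
    return x


def peano_view(x):
    return None if x == () else x[0]


print(minus(5, 2), minus_gen(peano_view, peano(3), 1) == peano(2))
```

The generic functions are written once and reused at both representations; the ordinary `pred` and `minus` add nothing beyond supplying the trivial view.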

CoV Locals

The last definition, divide, is as expected except for the successor case. Here, we make let bindings (the syntax for which in Cedille is [x = t] - t’, analogous to let x = t in t’) for the coercion to Nat of the predecessor of the dividend (using the as-yet unexplained Is/Nat witness isType/divD), and for a difference computed using minus’. Note that when the divisor is non-zero, this difference is equal to the difference between the dividend and divisor, and otherwise it is equal to the predecessor of the dividend; in both cases, it is smaller than the original pattern. Finally, we test whether the dividend is less than the divisor: if so, return zero; if not, divide the difference by the divisor and increment. The only parts of divide requiring further explanation, then, are the witness isType/divD and the type of the predecessor bound in the successor pattern, which are the keys to CoV recursion and induction in Cedille.

Within the body of the μ-expression defining recursive function divD over scrutinee of type Nat, the following names are automatically bound:

• Type/divD, the type of recursive occurrences of Nat in the types of variables bound in constructor patterns (such as the predecessor bound in the suc pattern).

• isType/divD, a witness that terms of the recursive-occurrence type may be used for further CoV pattern matching.

• divD, the recursive function being defined, accepting only terms of the recursive occurrence type Type/divD. This restriction guarantees that divD is only called on expressions smaller than the previous argument to recursion.

The reader is now invited to revisit the definitions of Figure 4, keeping in mind that in the μ-expressions of add and vappend the constructor subdata bound in the pattern guards for suc and vcons have abstract types (the subdata of the successor case of the μ’-expression of pred has the usual type Nat), and that recursive definitions addN and vappendYs only accept arguments of such a type. With this understood, so too is the definition of divide: the predecessor has type Type/divD, witness isType/divD has type Is/Nat ·Type/divD, and so the local variable bound to the difference has type Type/divD, exactly as required by divD.
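The discipline just described can be approximated dynamically in Python. This is a sketch under stated assumptions, not the definition of Figure 7: the class `Abstract` stands in for the recursive-occurrence type, `out` for the local witness, `to_nat` for the coercion, and `cov_rec` for a μ-expression; the body of `divide` is one arrangement consistent with the prose description above. The recursive function refuses anything that is not an `Abstract` value, and the only way to obtain such values is by case analysis on smaller ones.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Abstract:
    """Stand-in for the abstract type of recursive occurrences."""
    value: int


def out(x: Abstract) -> Optional[Abstract]:
    """Stand-in for the local witness: one step of case analysis.
    None means zero; otherwise the predecessor, still at the abstract type."""
    return None if x.value == 0 else Abstract(x.value - 1)


def to_nat(x: Abstract) -> int:
    """Stand-in for the coercion to Nat; it only forgets the wrapper."""
    return x.value


def pred_abs(x: Abstract) -> Abstract:
    p = out(x)
    return x if p is None else p


def minus_abs(x: Abstract, n: int) -> Abstract:
    """n uses of pred_abs: the result is never larger than x."""
    for _ in range(n):
        x = pred_abs(x)
    return x


def cov_rec(branches: Callable, n: int) -> int:
    """Runs `branches` with a recursive function that accepts only Abstract values."""
    def rec(x):
        if not isinstance(x, Abstract):
            raise TypeError("recursive calls need a value of the abstract type")
        return branches(rec, out(x))
    return rec(Abstract(n))


def divide(n: int, d: int) -> int:
    def branches(div_d, view):
        if view is None:                           # zero case
            return 0
        pn = view                                  # predecessor, abstract type
        diff = minus_abs(pn, max(d - 1, 0))        # equals n - d when d > 0
        return 0 if to_nat(pn) + 1 < d else 1 + div_d(diff)
    return cov_rec(branches, n)


print(divide(7, 2), divide(9, 3), divide(5, 0))
```

In Cedille the check performed here by `isinstance` at run time is instead carried out statically by the type of the recursive function.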

3.2. Course-of-values Induction

CoV recursion is not enough: in a dependently typed language, one also wishes sometimes to prove properties of recursive definitions. Cedille enables this with CoV induction, which was derived generically for λ-encoded datatypes in CDLE by (Firsov et al., 2018b). Figure 8 shows its use in leDiv to prove that the result of division is no larger than its first argument. (Footnote 4: The use of suc in the result makes it difficult to give divide a type that guarantees this.)

We first encode the relation “less than or equal” as a datatype LE and prove two properties of it (definitions omitted, indicated by <..>): that it is transitive (leTrans) and that minus produces a result less than or equal to its first argument (leMinus). In the proof of leDiv itself, we define a recursive function (also named leDiv) over the dividend. When it is zero, the goal reduces to one provable by constructor leZ. When it is the successor of some number, the divide expression in the type of the goal reduces to a conditional branch on whether the dividend is less than the divisor. We use μ’ to match on the result of this comparison to determine which branch is reached: if it is true, the goal type reduces further to one which is again provable by leZ; otherwise, the goal concerns the successor of a recursive division whose dividend is the difference computed in divide. Here is where CoV induction is used: we invoke the inductive hypothesis on a term that is equal (modulo erasure) to this difference but has the required abstract type Type/leDiv, letting us bound the recursive quotient by the difference. We combine this and a proof obtained from leMinus with the proof that LE is transitive, producing a bound on the recursive quotient by the predecessor of the dividend. The final obligation is proved by constructor leS.

Typing and Elaborating μ and μ’

As with other languages with inductive datatypes, the type inference rules for pattern matching and recursion are complex. We ease the reader into these by giving in Figure 9 simplified type inference rules for μ- and μ’-expressions, specialized to Nat – more general elaboration rules are given in Figure 20.

In the typing rule for μ’, the type of the scrutinee must be one for which Is/Nat holds (as proved by the given witness), and the motive must be a predicate over the concrete type Nat. The type of each case body must be a proof that the predicate holds for the term constructed from the pattern guard, with the expected type for the suc branch stated by coercing the predecessor to type Nat; the type of the whole expression is a proof that the predicate holds of the type-coerced scrutinee (where again these coercions erase to λ x. x). The rule for μ is similar to μ’, with the typing context of each branch extended with the abstract recursive-occurrence type, its Is/Nat witness (with the hyphen indicating it can only occur in erased positions), and the recursive function being defined. Perhaps surprisingly, the elaborations of these expressions are (after erasure) simply for μ’ and for μ, where the priming on term meta-variables indicates (the erasures of) the elaborations of their un-primed counterparts.

4. Zero-Cost Program Reuse for Datatypes

This section demonstrates how another desirable feature derived of λ-encoded data in CDLE – zero-cost reuse – is preserved by our datatype system through extension of definitional equality for constructors, and also how this extension can be used in a novel example simulating a simultaneous definition of a datatype and function. Often when working with dependent types, programmers find they must write several versions of a particular datatype depending on the invariants they need the type system to enforce. A standard example is the non-indexed List type and the length-indexed Vec type – if the programmer has written several functions over the former, then discovers they require the latter, their choices are usually either to re-implement existing List functionality for Vec, or write conversion functions between the two types to reuse the existing functions.

For this second option, such conversion functions simultaneously tear down one structure while rebuilding the other, taking linear time. (Diehl et al., 2018) showed how some conversions can be done in constant time and generically for programs, proofs, and data. Their development leverages the fact that in CDLE λ-encoded constructors such as nil (cons) and vnil (vcons) for List and Vec are definitionally equal. In our datatype system for Cedille, the elaborated λ-encodings of different data constructors may sometimes also be equal. We internalize this fact to Cedille, meaning the declared constructors nil (cons) and vnil (vcons) are themselves definitionally equal. This is shown in the following example with manual zero-cost reuse of map for List in vmap for Vec.

Manual zero-cost reuse of map for vmap

Figure 10 gives the definitions of the linear-time conversion functions v2l and l2v, as well as the types for list operations len and map (List is given in Figure 2, <..> and _ indicate resp. an omitted def. and anonymous proof). First, and as promised, Cedille considers the corresponding constructors of List and Vec definitionally equal:

_ : {nil  ≃ vnil}  = β.
_ : {cons ≃ vcons} = β.
This means that the linear-time functions v2l and l2v merely return a term equal to their argument at a different type. Indeed, this is provable in Cedille by easy inductive proofs v2l-id and l2v-id (Figure 11), rewriting the expected branch type by ρ (Figure 0(b)) in the cons and vcons cases using the inductive hypothesis and making implicit use of constructor equality. Thanks to φ (casting a term to the type of another it is proven equal to, Figure 0(b)), these proofs give rise to coercions v2l! and l2v! between List and Vec that erase to identity functions – meaning there is no performance penalty for using them! With v2l! and l2v! and the two lemmas mapLen and v2l!Len resp. stating that map and v2l! preserve the length of their inputs, we can now define vmap (Figure 12) over Vec by reusing map for List with no run-time cost, demonstrating that Cedille’s datatype system does not prevent use of this desirable property derived in its core theory CDLE.
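
The difference between the linear-time conversion and the zero-cost coercion can be illustrated outside Cedille. The following Python sketch uses untyped closures as stand-ins for erased λ-terms. It assumes a Scott-style encoding for brevity (not the encoding the elaborator actually produces), and the names to_py and v2l_cast are hypothetical helpers introduced only for this illustration.

```python
# Erased (untyped) stand-ins for the constructors. Assumption: a Scott-style
# encoding is used here for brevity; it is not the elaborator's actual output.
def nil():
    return lambda on_nil, on_cons: on_nil


def cons(head, tail):
    return lambda on_nil, on_cons: on_cons(head, tail)


# The length index of vcons is erased, so after erasure the Vec constructors
# are the very same functions as the List constructors.
vnil = nil
vcons = cons


def v2l(vec):
    """Linear-time conversion: tears down the Vec and rebuilds a List."""
    return vec(nil(), lambda head, tail: cons(head, v2l(tail)))


def v2l_cast(vec):
    """Hypothetical stand-in for v2l!: the coercion erases to the identity."""
    return vec


def to_py(xs):
    """Observe an encoded list as a Python list (helper for comparison)."""
    items = []
    while True:
        cell = xs(None, lambda head, tail: (head, tail))
        if cell is None:
            return items
        head, xs = cell
        items.append(head)


vec = vcons(1, vcons(2, vcons(3, vnil())))
rebuilt = v2l(vec)        # walks the whole structure and allocates new cells
coerced = v2l_cast(vec)   # returns its argument unchanged
```

Both results are observationally the same list, but only v2l_cast avoids the traversal. In Cedille the typing of φ is what justifies treating the identity function as a conversion; Python has no counterpart to that proof obligation.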

Definitional Equality of Constructors

Under what conditions should users expect Cedille to equate constructors of different datatypes? Certainly they should not be required to know the details of elaboration to use features like zero-cost reuse that depend on this. Fortunately, there is a simple, high-level explanation for when different constructors are considered equal that makes reference only to the shape of the datatype declaration. We give this here informally, with the formal statement and soundness property in Section 6 (Figure 17). If c1, c2 are resp. constructors of datatypes D1 and D2, then c1 and c2 are equal iff:

• D1 and D2 have the same number of constructors;

• the index of c1 in the list of constructors for D1 is the same as the index of c2 in the list of constructors for D2; and

• c1 and c2 take the same number of unerased arguments.

That these three conditions hold for the corresponding constructors of List and Vec is readily verified: both datatypes have two constructors; nil (cons) and vnil (vcons) are each the first (second) entries in their datatype’s constructor list; and nil and vnil take no arguments while cons and vcons take two unerased arguments (the Nat argument to vcons is erased). It is also clear that these conditions prohibit two different constructors of the same datatype from ever being equated, as their index in the constructor list would necessarily be different.

This scheme for equating data constructors perhaps leads to some counter-intuitive results. First, changing the order of the constructors of List prevents zero-cost reuse between it and Vec. Second, between two datatypes with the same number of constructors, some constructors may be equal and others not. For example, List and Nat have two constructors, and the first of both takes no arguments. Thus, equality between zero and nil holds definitionally, but is not possible for suc and cons. The very same phenomenon occurs for e.g. Church-encoded numbers and lists.
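
The informal criterion depends only on the shape of the declarations, so it can be sketched as a small check. The Python below is an illustration under assumptions, not Cedille's implementation: the Datatype record and the function constructors_equal are hypothetical, and each constructor is summarized by its name and its count of unerased arguments.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Datatype:
    """Hypothetical summary of a declaration: ordered (name, unerased arity)."""
    name: str
    constructors: Tuple[Tuple[str, int], ...]


def constructors_equal(d1, c1, d2, c2):
    """Apply the three informal conditions to constructor c1 of d1 and c2 of d2."""
    names1 = [name for name, _ in d1.constructors]
    names2 = [name for name, _ in d2.constructors]
    i, j = names1.index(c1), names2.index(c2)
    same_count = len(names1) == len(names2)
    same_index = i == j
    same_arity = d1.constructors[i][1] == d2.constructors[j][1]
    return same_count and same_index and same_arity


LIST = Datatype("List", (("nil", 0), ("cons", 2)))
# The Nat index taken by vcons is erased, so it is not counted.
VEC = Datatype("Vec", (("vnil", 0), ("vcons", 2)))
NAT = Datatype("Nat", (("zero", 0), ("suc", 1)))
# Same constructors as List, declared in the opposite order.
LIST_REORDERED = Datatype("List", (("cons", 2), ("nil", 0)))
```

With these declarations the check agrees with the cases discussed above: nil and vnil are equated, as are cons and vcons and also zero and nil, while suc and cons are not, and reordering the constructors of List breaks the correspondence with Vec.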

Simulation of a Simultaneous Definition

So far, this paper has largely considered either standard examples for dependent types or features already derived in CDLE which we show are preserved by our datatype system, with the main exception being CoV pattern matching and the code reuse enabled by it. We therefore conclude this section with a novel example showing that even without built-in support for a feature like inductive-recursive (IR) definitions (Dybjer, 2000), with our extended definitional equality for constructors it is nonetheless possible to simulate a simultaneous definition in Cedille.

Figure 13 shows an example of this simulation for the type of binary numbers with canonical zero elements. Bin’ is a first approximation of this type lacking such canonicity, defined by constructors:

• bZ’, representing zero

• b2’, where represents twice

• b2p’, where represents one plus twice

By the above description, it is clear that both b2’ bZ’ and bZ’ represent zero. To prevent this, one wishes to define simultaneously with Bin’ the Boolean predicate isBZ returning true only when its argument is bZ’, and make b2’ take a proof that isBZ returns ff for its argument. This is simulated in Cedille by declaring a second version of binary numbers Bin that refers to isBZ and admits a zero-cost coercion to Bin’.

Constructor b2 of Bin takes as an argument a term whose type is the intersection of Bin’ and Bin, and an erased proof that is not bZ’. We know from the preceding discussion that bZ is equal to bZ’, and from Figure 0(c) that is equal (modulo erasure) to and . Thus, neqBZ is equivalent to a proof that is not bZ. Furthermore, constructor b2 is convertible with b2’ (in particular taking the same number of unerased arguments) and we may define the zero-cost coercion b2b’! using the same approach as in Figure 11.

The desired canonicity property canonZ states that for any and , if is equal to applications of b2’ to bZ’, must be zero. The assumed equality is inhabited when is zero and is bZ (as again bZ is equal to bZ’); otherwise, it can be used in combination with constructor argument neqBZ of b2 to derive a contradictory equation eliminated by δ (Figure 0(b)).

A careful reader may wonder whether the type Bin we defined is uninhabited by any terms constructed using b2. If so, it would trivially satisfy the desired canonicity property canonZ – and be virtually of no use! Fortunately, this is not the case:

bin1 : Bin = b2p bZ.
bin2 : Bin = b2 [b2b’! bin1 , bin1] -β.
bin4 : Bin = b2 [b2b’! bin2 , bin2] -β.
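
The intent of the canonicity constraint can be approximated in a language without dependent types by moving the check to run time. The Python sketch below is only an analogy: the classes BZ, B2, B2P and the helpers is_bz and to_int are hypothetical, and the ValueError stands in for what Cedille rules out statically through the erased proof argument neqBZ. It does not model the intersection type or the coercion to Bin’.

```python
from dataclasses import dataclass


def is_bz(b):
    """Boolean predicate: true only for the canonical zero."""
    return isinstance(b, BZ)


@dataclass(frozen=True)
class BZ:
    """Zero."""


@dataclass(frozen=True)
class B2:
    """Twice the argument; the argument must not be zero."""
    arg: object

    def __post_init__(self):
        # Run-time stand-in for the erased proof that the argument is not bZ.
        if is_bz(self.arg):
            raise ValueError("b2 applied to bZ would be a second zero")


@dataclass(frozen=True)
class B2P:
    """One plus twice the argument."""
    arg: object


def to_int(b):
    """Interpret a binary number as a Python integer."""
    if isinstance(b, BZ):
        return 0
    if isinstance(b, B2):
        return 2 * to_int(b.arg)
    return 2 * to_int(b.arg) + 1


bin1 = B2P(BZ())
bin2 = B2(bin1)
bin4 = B2(bin2)
```

As in the Cedille definitions above, bin2 and bin4 are constructed with the doubling constructor, so the restricted type is inhabited by more than odd numbers and zero.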

5. Datatype Elaboration Interface

Background: Mendler-style algebras

The generic framework of (Firsov et al., 2018a, b) models inductive datatypes using Mendler-style algebras, so we begin our discussion of the elaborator interface with a brief description of these. It is well understood categorically that an inductive datatype can be represented as the carrier of the initial algebra for (i.e., the least fixed-point of) its signature functor (Malcolm, 1990), with the definition of a conventional -algebra in type theory as the family of functions . The Mendler-style -algebra, which can also be used to define (Uustalu and Vene, 1999), is , where operationally the function is used to make recursive calls on subdata of the quantified type . A Mendler-style CoV algebra is additionally equipped with an abstract destructor (i.e., fixpoint unrolling function) allowing for further case analysis on subdata at the abstract type. For a thorough treatment of the expressive power of Mendler-style algebras see (Ahn and Sheard, 2011).
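
The operational reading of Mendler-style algebras can be sketched in an untyped setting. In the Python below, all names (In, out, mcata, mcov, to_int_alg, half_alg) are hypothetical and chosen for this illustration. Since Python is untyped, nothing enforces that the recursive-call function is applied only to subdata at the abstract type; in the typed setting that restriction is what guarantees termination.

```python
class In:
    """Fixpoint roll: wraps one layer of the signature functor for Nat."""

    def __init__(self, layer):
        self.layer = layer


def out(x):
    """Fixpoint unroll: exposes one layer."""
    return x.layer


ZERO = In(("zero",))


def suc(n):
    return In(("suc", n))


def from_int(k):
    n = ZERO
    for _ in range(k):
        n = suc(n)
    return n


def mcata(alg, x):
    """Mendler-style fold: the algebra receives a recursive-call function."""
    return alg(lambda r: mcata(alg, r), out(x))


def mcov(alg, x):
    """Mendler-style CoV fold: the algebra also receives a destructor."""
    return alg(lambda r: mcov(alg, r), out, out(x))


def to_int_alg(rec, layer):
    return 0 if layer[0] == "zero" else 1 + rec(layer[1])


def half_alg(rec, unroll, layer):
    """Halving recurses on the predecessor of the predecessor."""
    if layer[0] == "zero":
        return 0
    pred = unroll(layer[1])  # further case analysis on subdata
    if pred[0] == "zero":
        return 0
    return 1 + rec(pred[1])
```

Here mcata(to_int_alg, from_int(5)) recovers 5, and mcov(half_alg, from_int(7)) computes 3. The second algebra needs the destructor because its recursive call is made two layers down, which a plain Mendler-style algebra cannot reach.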

5.1. Generic Framework

The elaborator interface is implemented using the definitions in Figures 13(a) and 13(b), which list resp. a set of utilities and the primary results provided by the generic framework. Our implementation regenerates these definitions for each datatype to support large indices (type-indexed types), not handled by the generic framework. All such definitions are definable in Cedille without datatypes and thus map straightforwardly to CDLE. For the utilities, we have:

• Id, intrId, elimId: Id is a generalized type of identity functions in CDLE using the fact that a term may have multiple types. A term of type is a dependent pair (, also definable) whose first projection has type and second is a proof that . Its eliminator elimId is convertible with .

• IdMapping, imap: IdMapping is a generalization of a functor where the mapping need only be defined for identity functions. A term of type can be viewed as a proof that is positive. Our elaborator produces such a proof when checking datatype positivity. imap maps such an identity function over and is convertible with .

For the generic framework, taking module parameters and (with curly braces indicating is an erased parameter):

• Fix, D, in, out: resp. the type-level fixpoint function, the least fixed-point of , and its rolling and unrolling functions.

• PrfAlg: an inductive version of the Mendler-style CoV algebra. Its additional (erased) arguments are , an identity function from to D, and , a proof that the abstract destructor is equal to out. Argument is required even to be able to state the result of PrfAlg: that holds of the in of coerced (using imap) to type .

• induction, inductionComp: the generic induction principle for D and its computation law.

• lambek1 and lambek2, the proofs of Lambek’s lemma that in and out are mutual inverses.

5.2. Elaborator Interface

There is a discrepancy between the facilities of the generic framework and the design of the surface language. In the former, a function like minus’ (Figure 7) must be given , the abstract out of type (where is the type variable bound in the PrfAlg given to induction, and Nat is the inductive signature functor of Nat), directly. Doing the same in the surface language has the undesirable consequence of exposing Nat to the user. Worse still, is not definitionally equal to , and so proofs like leDiv (Figure 8) would require explicit use of , as they do when using the framework directly.

Our solution to this discrepancy is View (Figure 15), a novel type that (similar to Id) takes advantage of our Curry-style theory. is the type of proofs that a specified term (of type ) can be “viewed” as having type (e.g. out at an abstract type). It is introduced (intrView) by providing some of type and a proof . Most significant of View is that its eliminator (elimView) takes the named and an erased View witness and returns the at type , thanks to the typing and erasure of φ (Figure 0(b)). The upshot is that functions like minus’ do not need to use out indirectly by taking as an argument some abstract version of it. Instead, they only require permission to use (in the form of a View witness) out at the abstract type; this is why minus and minus’ (and even the fully separate definitions of pred and pred’) are definitionally equal.

Figure 15 lists the remaining definitions in the interface used by our datatype elaborator:

• IsD, isD, toD: the generic versions of the global definitions of similar name that are defined for every declared datatype. is pair type (): the first component (of type ) is a proof that all terms of type have type D; the second component (of type ) is a witness that out can be used at type . isD is (still) the trivial witness of IsD. toD casts a term of type to D given a proof ; since elimId converts with , so too does toD, justifying the convertibility of functions to/ with in the surface language. toFD is not exported to the surface language (as the datatype’s signature functor is not) and uses imap to cast to .

• ByCases, mu’: type is the generic type of proofs of by case analysis. Thus, type of mu’ says that for any term of type where holds, to show holds of (after casting to D), it suffices to give a proof by case analysis on ; its definition uses out at the abstract type (via elimView) on argument , gives this to case proving (with coercions omitted) , and rewrites this type with ρ by Lambek’s lemma.

• ByInd, mu: type is the type of generic proofs that holds by induction. It is defined in terms of a proof ByCases additionally equipped with the inductive hypothesis and evidence of for the quantified type . Thus, the type of mu says that holds for any given a proof by induction on D; its definition uses induction, repackaging the assumptions available to the PrfAlg argument for use by argument ind.

There is a direct mapping between the definition of ByInd (and thus, in the type of mu) and the local definitions introduced within the body of μ-expressions in the surface language. For a μ-expression recursively defining , the type Type/ corresponds to type variable ; isType/ to argument ; and the name itself corresponds to the inductive hypothesis.

6. Elaboration of Datatype Declarations

Notation

In this section we give a formal description of the elaboration of non-indexed datatypes in Cedille to λ-encodings in CDLE, which is also the scope of the accompanying proof appendix. A datatype declaration of of kind is written , where

• is a fresh type-variable of kind whose scope is the declared constructors of

• is the context of constructors associated with their type signatures such that all recursive occurrences of in the types of arguments have been replaced by

For example, the surface-level declaration of type Nat in Figure 2 translates to

 Ind[Nat, R, zero : Nat, suc : R ➔ Nat]

We write to indicate that, for every ranging from to the number of constructors in (written ), the th constructor in has a type of the form (indicating the mixed-erasure quantification over the dependent telescope of argument terms and types ) that is well-kinded under the context extended by variables and , and furthermore that does not occur in . Notation and indicates resp. a term-level abstraction and application over this telescope that respects the erasures and classifiers over which the variables were quantified. This convention is generalized to the sequence of term and type expressions , as in , when indicated that these are produced from type and kind coercions of . By convention, judgments with the hooked arrow as in are elaboration rules, written when we need only that is well-typed. Judgments without hooked arrows like and indicate typing and kinding in Cedille without datatypes. Some inference rules have premises in the form , accompanied by a premise ; the first indicates a family of derivations of the parenthesized judgment indexed by the th constructor of and its constructor argument telescope , and the second is merely to name these telescopes explicitly and exhaustively. indicates a typing context consisting of the definitions in Figures 14 and 15. Teletype font indicates code literals, indicates meta-variables (except in literals denoting generated names like Is/), and denotes labels for meta-variables.

6.1. Datatype and Constructor Elaboration

Figure 16 shows elaboration of a datatype and its constructors. To improve readability we give a set of judgments each formed from a single rule performing one task: [F] and [cF] elaborate resp. the usual impredicative encoding of a datatype signature functor and its constructors; [FI] and [cFI] the inductive functor and its constructors; [FIX] and [cFIX] the least fixpoint of the inductive functor and its constructors; and [Data] of the form adds the datatype, constructors, globals (), and elaborations () to the context.

In rule [F], the first premise serves to name the family of constructor argument telescopes , and the second premise elaborates the family of types , where is fresh wrt . The body of the elaborated signature functor is a function quantifying over and abstract constructors for (themselves functions quantifying over the appropriate elaborated constructor argument types and returning ), and returns . In rule [cF] we elaborate the th constructor for this signature functor, abstracting over the recursive-occurrence type , the th sequence of arguments , abstract constructors , to produce

Concretely, here are the elaborations for Nat by these two rules:

Nat^F : ★ ➔ ★ = λ R: ★. ∀ X: ★. Π z: X. Π s: R ➔ X. X.
zero^F : ∀ R: ★. Nat^F · R = Λ R. Λ X. λ z. λ s. z.
suc^F : ∀ R: ★. R ➔ Nat^F · R = Λ R. λ n. Λ X. λ z. λ s. s n.

The next two rules, [FI] and [cFI], show elaboration to the inductive signature functor. The type scheme elaborated by [FI] returns from a type argument the dependent intersection of (where is produced by rule [F]) and a proof that, for any property , holds if holds for each of the constructors of applied to their arguments ( in the rule). [cFI] elaborates the th constructor of the inductive functor , whose first component is the th constructor of applied to its arguments and whose second component is a proof (by using the appropriate assumption ) that holds. The two components are indeed convertible (modulo erasure), satisfying the requirements for introducing a dependent intersection.

Concretely, here are the elaborations for Nat by these two rules:

Nat^FI : ★ ➔ ★ = λ R: ★. ι x: Nat^F ·R. ∀ X: Nat^F ·R ➔ ★. Π z: X zero^F. Π s: (Π r: R. X (suc^F r)). X x.
zero^FI : ∀ R: ★. Nat^FI ·R = Λ R. [zero^F ·R, Λ X. λ z. λ s. z].
suc^FI : ∀ R: ★. R ➔ Nat^FI ·R = Λ R. λ n. [suc^F n , Λ X. λ z. λ s. s n].
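
That the two components of each intersection are convertible modulo erasure can be checked mechanically on these small terms. The Python sketch below is an illustration under assumptions: terms are encoded as tagged tuples, the constructor and function names are hypothetical, erasure follows the usual CDLE clauses (erased abstractions and applications are dropped, and an intersection introduction erases to its first component), and substitution is capture-naive, which suffices for the terms shown.

```python
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)      # λ x. body
def elam(x, body): return ("elam", x, body)    # Λ x. body (erased binder)
def app(f, a): return ("app", f, a)            # f a
def eapp(f, a): return ("eapp", f, a)          # f ·T or f -t (erased argument)
def both(t1, t2): return ("both", t1, t2)      # [t1, t2]


def erase(t):
    """Erasure to untyped λ-terms."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", t[1], erase(t[2]))
    if tag == "app":
        return ("app", erase(t[1]), erase(t[2]))
    if tag in ("elam", "eapp", "both"):
        # Drop the erased binder or argument, or keep only the first component.
        return erase(t[2] if tag == "elam" else t[1])
    raise ValueError(tag)


def subst(t, x, v):
    """Capture-naive substitution on erased terms (enough for this example)."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    return ("app", subst(t[1], x, v), subst(t[2], x, v))


def normalize(t):
    """β-normalize an erased term."""
    if t[0] == "var":
        return t
    if t[0] == "lam":
        return ("lam", t[1], normalize(t[2]))
    f, a = normalize(t[1]), normalize(t[2])
    if f[0] == "lam":
        return normalize(subst(f[2], f[1], a))
    return ("app", f, a)


def convertible(t1, t2):
    return normalize(erase(t1)) == normalize(erase(t2))


zero_f = elam("R", elam("X", lam("z", lam("s", var("z")))))
suc_f = elam("R", lam("n", elam("X", lam("z", lam("s", app(var("s"), var("n")))))))

# Components of the two intersections, with n free in the suc case.
zero_fst, zero_snd = eapp(zero_f, var("R")), elam("X", lam("z", lam("s", var("z"))))
suc_fst = app(suc_f, var("n"))
suc_snd = elam("X", lam("z", lam("s", app(var("s"), var("n")))))
```

Calling convertible(zero_fst, zero_snd) and convertible(suc_fst, suc_snd) both succeed, while convertible(zero_f, suc_f) fails, matching the requirement for introducing the dependent intersection in each constructor.
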
Rules [FIX] and [cFIX] tie the recursive knot using the generic interface of Figure 15: datatype elaborates to , where is produced by [FI] and is a term of type IdMapping · (i.e., a proof that is covariant) whose elaboration is described by Figure 18. The constructors of are elaborated as the in of the constructors of , applied to their arguments and instantiated to type . Finally, rule [Data] serves to associate the datatype declaration with its elaboration in the typing context, with binding the globals Is/, is/, and to/ and associating datatype , its constructors, and its globals with their elaborations.

Concretely, here are the elaborations for Nat by these three rules (the elaborated proof of positivity of Nat is omitted for space):

Finally, given these elaboration rules we can formally state in Figure 17 the extension of definitional equality for datatype constructors (described in Section 4) and functions :

Soundness Properties

The elaborations of datatype declarations and the extension of definitional equality for constructors enjoy the following soundness properties:

Assuming

• , ,