Counting Environments and Closures

02/02/2018 ∙ by Maciej Bendkowski, et al. ∙ 0

Environments and closures are two of the main ingredients of evaluation in lambda-calculus. A closure is a pair consisting of a lambda-term and an environment, whereas an environment is a list of lambda-terms assigned to free variables. In this paper we investigate some dynamic aspects of evaluation in lambda-calculus considering the quantitative, combinatorial properties of environments and closures. Focusing on two classes of environments and closures, namely the so-called plain and closed ones, we consider the problem of their asymptotic counting and effective random generation. We provide an asymptotic approximation of the number of both plain environments and closures of size n. Using the associated generating functions, we construct effective samplers for both classes of combinatorial structures. Finally, we discuss the related problem of asymptotic counting and random generation of closed environments and closures.




1. Introduction

Though computational complexity has traditionally been investigated in the context of Turing machines, ever since their initial development, evaluation complexity in various term rewriting systems, such as λ-calculus or combinatory logic, has started to attract increasing attention only quite recently. For instance, let us mention the worst-case analysis of evaluation, based on the invariance of unitary cost models [23, 3, 1] or transformation techniques proving termination of term rewriting systems [2].

Much like in classic computational complexity, the corresponding average-case analysis of evaluation in term rewriting systems follows a different, more combinatorial and quantitative approach, compared to its worst-case variant. In [10, 11] Choppy, Kaplan and Soria propose an average-case complexity analysis of normalisation in a general class of term rewriting systems using generating functions, in particular techniques from analytic combinatorics [19]. Following a somewhat similar path, Bendkowski, Grygiel and Zaionc later investigated the asymptotic properties of normal-order reduction in combinatory logic, in particular the normalisation cost of large random combinators [7, 4]. Alas, normalisation in λ-calculus has not yet been studied in such a combinatorial context. Nonetheless, static, quantitative properties of λ-terms form an active stream of recent research. Let us mention, non-exhaustively, investigations into the asymptotic properties of large random λ-terms [15, 6] or their effective counting and random generation ensuring a uniform distribution among terms of equal size [8, 22, 21, 9].

In the current paper, we take a step towards the average-case analysis of reduction complexity in λ-calculus. Specifically, we offer a quantitative analysis of environments and closures, two types of structures frequently present at the core of abstract machines modelling λ-term evaluation, such as for instance the Krivine or U-machine [13, 25]. In Section 2 we discuss the combinatorial representation of environments and closures, in particular the associated de Bruijn notation. In Section 3 we list the analytic combinatorics tools required for our analysis. Next, in Section 4 and Section 5 we conduct our quantitative investigation into so-called plain and closed environments and closures, respectively, subsequently concluding the paper in Section 6.

2. Environments and closures

In this section we outline the de Bruijn notation and related concepts deriving from λ-calculus variants with explicit substitutions used in the subsequent sections.

2.1. De Bruijn notation

Though the classic variable notation for λ-terms is elegant and concise, it poses considerable implementation issues, especially in the context of substitution resolution and potential name clashes. In order to accommodate these problems, de Bruijn proposed an alternative name-free notation for λ-terms [16]. In this notation, each variable $x$ is replaced by an appropriate non-negative integer $\underline{n}$ (so-called index) intended to encode the distance between $x$ and its binding abstraction. Specifically, if $x$ is bound to the $(n+1)$st abstraction on its unique path to the term root in the associated λ-tree, then $x$ is replaced by the index $\underline{n}$. In this manner, each closed λ-term in the classic variable notation is representable in the de Bruijn notation.

Example 1.

Consider the λ-term Figure 1 depicts three different representations of as tree-like structures. The first one uses explicit variables, the second one uses back pointers to represent the bound variables, whereas the third one uses de Bruijn indices.

Figure 1. Three representations of the λ-term .

In order to represent free occurrences of variables, one uses indices of values exceeding the number of abstractions crossed on respective paths to the term root. For instance, can be represented as since and correspond to two different variable occurrences.

Recall that in the classic variable notation a λ-term is said to be closed if each of its variables is bound. In the de Bruijn notation, it means that for each index occurrence $\underline{n}$ in $T$ one finds at least $n+1$ abstractions on the unique path from $\underline{n}$ to the term root of $T$. If a λ-term is not closed, it is said to be open. If heading $T$ with $m$ abstractions turns it into a closed λ-term, then $T$ is said to be $m$-open. In particular, closed λ-terms are $0$-open.
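The definition of $m$-open terms can be made concrete with a short sketch. The tuple encoding of terms and the helper names `openness` and `is_m_open` below are assumptions made for illustration only; they do not come from the paper.

```python
# Terms are nested tuples (an encoding assumed for this sketch):
#   ("idx", n)      de Bruijn index n >= 0
#   ("lam", body)   abstraction
#   ("app", f, a)   application

def openness(term, depth=0):
    """Return the least m such that `term` is m-open (0 means closed).

    `depth` counts the abstractions crossed on the path from the root.
    """
    tag = term[0]
    if tag == "idx":
        # Index n needs n + 1 enclosing abstractions; `depth` are present.
        return max(0, term[1] + 1 - depth)
    if tag == "lam":
        return openness(term[1], depth + 1)
    if tag == "app":
        return max(openness(term[1], depth), openness(term[2], depth))
    raise ValueError(f"unknown term constructor: {tag!r}")


def is_m_open(term, m):
    """A term is m-open iff heading it with m abstractions closes it."""
    return openness(term) <= m


identity = ("lam", ("idx", 0))   # every index is bound
escaping = ("lam", ("idx", 1))   # the index points past the only binder
print(openness(identity), openness(escaping))
```

Since `is_m_open(t, m)` implies `is_m_open(t, m + 1)`, the sketch also reflects the inclusion of $m$-open terms in $(m+1)$-open terms.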

Example 2.

Note that is closed. The λ-term is -open, however it is not -open. Indeed, is -open instead of being closed. Similarly, is -open, however it is not -open.

Certainly, the set of $m$-open terms is a subset of the set of $(m+1)$-open terms. In other words, if $T$ is $m$-open, it is also $(m+1)$-open. The set of all λ-terms is called the set of plain terms. It is the union of the sets $\mathcal{L}_m$ of $m$-open terms and is denoted as $\mathcal{L}_\infty$. Hence,

$$\mathcal{L}_\infty = \bigcup_{m \geq 0} \mathcal{L}_m.$$

Let us note that de Bruijn’s name-free representation of λ-terms exhibits an important combinatorial benefit. Specifically, each λ-term in the de Bruijn notation represents an entire α-equivalence class of λ-terms in the classical variable notation. Indeed, two variable occurrences bound by the same abstraction are assigned the same de Bruijn index. In consequence, counting λ-terms in the de Bruijn notation we are, in fact, counting entire α-equivalence classes instead of their inhabitants.

2.2. Closures and β-reduction

Recall that the main rewriting rule of λ-calculus is β-reduction, see, e.g. [14]


where the operation , i.e. substitution of λ-terms for de Bruijn indices, is defined inductively as follows:


The operation is defined, again, inductively as


A λ-term in the form of $(\lambda M) N$ is called a β-redex (or simply a redex). Lambda terms not containing β-redexes as subterms are called (β-)normal forms. The computational process of rewriting (reducing) a λ-term to its β-normal form by successive elimination of β-redexes is called normalisation. There exists an abundant literature on normalisation in λ-calculus; let us mention, non-exhaustively, [24, 30, 26, 13, 27].
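To illustrate how substitution and β-reduction operate on de Bruijn indices, here is a minimal sketch of one common textbook formulation. The tuple encoding and the names `lift`, `subst` and `beta` are assumptions of this sketch, and the presentation may differ in details from the definitions used in the paper.

```python
# Assumed encoding: ("idx", n), ("lam", body), ("app", f, a).

def lift(term, n, i=0):
    """Add n to every index that is >= i (i.e. free at binder depth i)."""
    tag = term[0]
    if tag == "idx":
        return ("idx", term[1] + n) if term[1] >= i else term
    if tag == "lam":
        return ("lam", lift(term[1], n, i + 1))
    return ("app", lift(term[1], n, i), lift(term[2], n, i))


def subst(term, n, p):
    """Substitute p for index n in term, closing the gap left by the binder."""
    tag = term[0]
    if tag == "idx":
        m = term[1]
        if m > n:
            return ("idx", m - 1)   # one binder disappeared
        if m == n:
            return lift(p, n)       # p moves below n abstractions
        return term
    if tag == "lam":
        return ("lam", subst(term[1], n + 1, p))
    return ("app", subst(term[1], n, p), subst(term[2], n, p))


def beta(redex):
    """Contract a top-level redex ("app", ("lam", body), arg)."""
    _, (_, body), arg = redex
    return subst(body, 0, arg)


K = ("lam", ("lam", ("idx", 1)))
print(beta(("app", K, ("idx", 0))))
```

In the usage line, the free index 0 is substituted under one remaining abstraction, so it reappears as index 1.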

Environments and closures are central concepts in various formalisms dealing with normalisation in λ-calculus. An environment is a list of values meant to be assigned to indices of an $m$-open λ-term. A closure, on the other hand, is a couple consisting of a λ-term and an environment. Such couples are meant to represent closed, not yet fully evaluated, λ-terms. For instance, the closure consists of the λ-term evaluated in the context of an empty environment, denoted as , and represents simply . The closure represents the λ-term evaluated in the context of an environment . Here, intuitively, the index is receiving the value whereas the index is being assigned . Finally, is applied to . And so, reducing the closure , for instance using a Krivine abstract machine [13], we obtain .

Let us notice that following the outlined description of environments and closures, we can provide a formal combinatorial specification for both using the following mutually recursive definitions:

$$\mathcal{C}los ::= \langle \Lambda, \mathcal{E}nv \rangle \qquad\qquad \mathcal{E}nv ::= \square \mid \mathcal{C}los : \mathcal{E}nv \tag{5}$$

In the above specification, $\Lambda$ denotes the set of all plain λ-terms. Moreover, we introduce two binary operators “$\langle \cdot, \cdot \rangle$”, i.e. the coupling operator, and “$:$”, i.e. the cons operator, heading its left-hand side on the right-hand list. When applied to a λ-term and an environment, the coupling operator constructs a new closure. In other words, a closure is a couple of a λ-term and an environment whereas an environment is a list of closures, representing a list of assignments to free occurrences of de Bruijn indices.

Such a combinatorial specification for closures and environments plays an important rôle as it allows us to investigate, using methods of analytic combinatorics, the quantitative properties of both closures and environments.

3. Analytic tools

In the following section we briefly outline the main techniques and notions from the theory of generating functions and singularity analysis. We refer the curious reader to [19, 32] for a thorough introduction.

Let $(a_n)_n$ be a sequence of non-negative integers. Then, the generating function $A(z)$ associated with $(a_n)_n$ is the formal power series $A(z) = \sum_{n \geq 0} a_n z^n$. Following standard notational conventions, we use $[z^n]A(z)$ to denote the coefficient standing by $z^n$ in the power series expansion of $A(z)$. Given two sequences $(a_n)_n$ and $(b_n)_n$ we write $a_n \sim b_n$ to denote the fact that both sequences admit the same asymptotic growth order, specifically $\lim_{n \to \infty} a_n / b_n = 1$. Finally, we write $\varphi \doteq c$ if we are interested in the numerical approximation $c$ of an expression $\varphi$.

Suppose that $A(z)$, viewed as a function of a single complex variable $z$, is defined in some region $\Omega$ of the complex plane centred at $z_0$. Then, if $A(z)$ admits a convergent power series expansion in form of

$$A(z) = \sum_{n \geq 0} a_n (z - z_0)^n$$

it is said to be analytic at point $z_0$. Moreover, if $A(z)$ is analytic at each point $z \in \Omega$, then $A(z)$ is said to be analytic in the region $\Omega$. Suppose that there exists a function $B(z)$ analytic in a region $\Omega^*$ such that $\Omega \subset \Omega^*$ and both $A(z)$ and $B(z)$ agree on $\Omega$, i.e. $B|_\Omega = A$. Then, $B(z)$ is said to be an analytic continuation of $A(z)$ onto $\Omega^*$. If $A(z)$ defined in some region $\Omega \setminus \{z_0\}$ has no analytic continuation onto $\Omega$, then $z_0$ is said to be a singularity of $A(z)$. When a formal power series $A(z)$ represents an analytic function in some neighbourhood of the complex plane origin, it becomes possible to link the location and type of singularities corresponding to $A(z)$, in particular so-called dominating singularities residing at the respective circle of convergence, with the asymptotic growth rate of its coefficients. This process of singularity analysis developed by Flajolet and Odlyzko [18] provides a general and systematic technique for establishing the quantitative aspects of a broad class of combinatorial structures.

While investigating environments and closures, a particular example of algebraic combinatorial structures, the respective generating functions turn out to be algebraic themselves. The following prominent tools provide the essential foundation underlying the process of algebraic singularity analysis based on Newton-Puiseux expansions, i.e. extensions of power series allowing fractional exponents.

Theorem 1 (Newton, Puiseux [19, Theorem VII.7]).

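Let $F(z)$ be a branch of an algebraic function $P(z, F(z)) = 0$. Then, in a circular neighbourhood of a singularity $\rho$ slit along a ray emanating from $\rho$, $F(z)$ admits a fractional Newton-Puiseux series expansion that is locally convergent and of the form

$$F(z) = \sum_{k \geq k_0} c_k (z - \rho)^{k/\kappa}$$

where $k_0 \in \mathbb{Z}$ and $\kappa \geq 1$.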

Let $F(z)$ be analytic at the origin. Note that $[z^n]F(z) = \rho^{-n} [z^n]F(\rho z)$. In consequence, following a proper rescaling we can focus on the type of singularities of $F(z)$ on the unit circle. The standard function scale provides then the asymptotic expansion of $[z^n]F(z)$.

Theorem 2 (Standard function scale [19, Theorem VI.1]).

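Let $\alpha \in \mathbb{C} \setminus \mathbb{Z}_{\leq 0}$. Then, $f(z) = (1 - z)^{-\alpha}$ admits for large $n$ a complete asymptotic expansion in form of

$$[z^n]f(z) = \frac{n^{\alpha - 1}}{\Gamma(\alpha)} \left( 1 + \frac{\alpha(\alpha - 1)}{2n} + \frac{\alpha(\alpha - 1)(\alpha - 2)(3\alpha - 1)}{24 n^2} + O\!\left(\frac{1}{n^3}\right) \right)$$

where $\Gamma$ is the Euler Gamma function defined as

$$\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\, dx \quad \text{for } \Re(z) > 0 \qquad \text{and} \qquad \Gamma(z) = \frac{\Gamma(z+1)}{z} \quad \text{otherwise.}$$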


Given an analytic generating function $F(z)$ implicitly defined as a branch of an algebraic function $P(z, F(z)) = 0$, our task of establishing the asymptotic expansion of the corresponding sequence $([z^n]F(z))_n$ reduces therefore to locating and studying the (dominating) singularities of $F(z)$. For generating functions analytic at the complex plane origin, this quest simplifies even further due to the following classic result.

Theorem 3 (Pringsheim [19, Theorem IV.6]).

If $A(z)$ is representable at the origin by a series expansion that has non-negative coefficients and radius of convergence $R$, then the point $z = R$ is a singularity of $A(z)$.

We can therefore focus on the real line while searching for respective singularities. Since $\sqrt{z}$ cannot be unambiguously defined as an analytic function at $z = 0$, we primarily focus on roots of radicand expressions in the closed-form formulae of investigated generating functions.
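As a numerical illustration of the standard function scale, the sketch below compares the exact coefficients of $(1 - z)^{1/2}$, i.e. the case $\alpha = -1/2$ that governs square-root singularities, with the leading term $n^{\alpha - 1}/\Gamma(\alpha)$. The function names are hypothetical and chosen for this example only.

```python
import math


def coeff(alpha, n):
    """[z^n] (1 - z)^(-alpha), via c_0 = 1 and c_k = c_{k-1} (k - 1 + alpha) / k."""
    c = 1.0
    for k in range(1, n + 1):
        c *= (k - 1 + alpha) / k
    return c


def leading_term(alpha, n):
    """First term of the standard function scale: n^(alpha - 1) / Gamma(alpha)."""
    return n ** (alpha - 1) / math.gamma(alpha)


alpha = -0.5  # (1 - z)^(1/2), the typical algebraic square-root singularity
for n in (10, 100, 1000):
    exact = coeff(alpha, n)
    approx = leading_term(alpha, n)
    print(n, exact, approx, exact / approx)
```

The printed ratio should drift towards 1 at the rate suggested by the first correction term $\alpha(\alpha - 1)/(2n)$.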

3.1. Counting λ-terms

Let us outline the main quantitative results concerning λ-terms in the de Bruijn notation, see [6, 21]. In this combinatorial model, indices are represented in a unary encoding using the successor operator $S$ and $0$. In the so-called natural size notion [6], assumed throughout the current paper, the size of λ-terms is defined recursively as follows:

$$|0| = 1, \qquad |S\, n| = |n| + 1, \qquad |\lambda M| = |M| + 1, \qquad |M N| = |M| + |N| + 1.$$

And so, for example, . We briefly remark that different size notions in the de Bruijn representation, alternative to the assumed natural one, are considered in the literature. Though all share similar asymptotic properties, we choose to consider the above size notion in order to minimise the technical overhead of the overall presentation. We refer the curious reader to [21, 9] for a detailed analysis of various size notions in the de Bruijn representation.

Let $l_n$ denote the number of plain λ-terms of size $n$. Consider the generating function $L_\infty(z) = \sum_{n \geq 0} l_n z^n$. Using symbolic methods, see [19, Part A. Symbolic Methods], we note that $L_\infty(z)$ satisfies

$$L_\infty(z) = z L_\infty(z) + z L_\infty(z)^2 + \frac{z}{1 - z}. \tag{10}$$

In words, a λ-term is either (a) an abstraction followed by another λ-term, accounting for the first summand, (b) an application of two λ-terms, accounting for the second summand, or finally, (c) a de Bruijn index which is, in turn, a sequence of successors applied to $0$. Solving (10) for $L_\infty(z)$ we find that the generating function $L_\infty(z)$, taking into account that the coefficients $[z^n]L_\infty(z)$ are positive for all $n \geq 1$, admits the following closed-form solution:

$$L_\infty(z) = \frac{1 - z - \sqrt{(1 - z)^2 - \frac{4 z^2}{1 - z}}}{2z}.$$


In such a form, $L_\infty(z)$ is amenable to the standard techniques of singularity analysis. In consequence we have the following general asymptotic approximation of $[z^n]L_\infty(z)$.

Theorem 4 (Bendkowski, Grygiel, Lescanne, Zaionc [6]).

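The sequence $([z^n]L_\infty(z))_n$ corresponding to plain λ-terms of size $n$ admits the following asymptotic approximation:

$$[z^n]L_\infty(z) \sim C_{L_\infty}\, \rho_{L_\infty}^{-n}\, n^{-3/2}$$

where $\rho_{L_\infty}$ is the smallest positive root of $(1 - z)^3 = 4z^2$ and

$$\rho_{L_\infty} \doteq 0.29560, \qquad C_{L_\infty} \doteq 0.60676. \tag{13}$$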




In the context of evaluation, the arguably most interesting subclass of λ-terms are closed or, more generally, $m$-open λ-terms. Recall that an $m$-open λ-term takes one of the following forms. Either it is (a) an abstraction followed by an $(m+1)$-open λ-term, or (b) an application of two $m$-open λ-terms, or finally, (c) one of the indices $\underline{0}, \underline{1}, \ldots, \underline{m-1}$. Such a specification for $m$-open λ-terms yields the following functional equation defining the associated generating function $L_m(z)$:

$$L_m(z) = z L_{m+1}(z) + z L_m(z)^2 + z\, \frac{1 - z^m}{1 - z}. \tag{14}$$

Since $L_m(z)$ depends on $L_{m+1}(z)$, solving (14) for $L_m(z)$ one finds that

$$L_m(z) = \frac{1 - \sqrt{1 - 4z^2 \left( L_{m+1}(z) + \frac{1 - z^m}{1 - z} \right)}}{2z}. \tag{15}$$

Such a presentation of $L_m(z)$ poses considerable difficulties as $L_m(z)$ depends on $L_{m+1}(z)$ depending itself on $L_{m+2}(z)$, etc. If developed, the formula (15) for $L_m(z)$ consists of an infinite number of nested radicals. In consequence, standard analytic combinatorics tools do not provide the asymptotic expansion of $[z^n]L_m(z)$, in particular $[z^n]L_0(z)$ associated with closed λ-terms. In their recent breakthrough paper, Bodini, Gittenberger and Gołȩbiewski [9] propose a clever approximation of the infinite system associated with $L_m(z)$ and give the following asymptotic approximation for the number of $m$-open λ-terms.

Theorem 5 (Bodini, Gittenberger and Gołȩbiewski [9]).

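The sequence $([z^n]L_m(z))_n$ corresponding to $m$-open λ-terms of size $n$ admits the following asymptotic approximation:

$$[z^n]L_m(z) \sim C_m\, \rho_{L_\infty}^{-n}\, n^{-3/2}$$

where $\rho_{L_\infty}$ is the dominant singularity corresponding to plain λ-terms, see (13), and $C_m$ is a constant, depending solely on $m$.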

Let us remark that for closed λ-terms, the constant lies in between and . In what follows, we use the above Theorem 5 in our investigations regarding what we call closed closures.
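The counting sequences of this section can be computed directly from the combinatorial specifications. The sketch below is an illustration under the natural size notion; the function names `plain_counts`, `open_count` and `dominant_singularity` are hypothetical. It counts plain and $m$-open terms by convolution and compares the growth of plain terms with the root of $(1 - z)^3 = 4z^2$, at which the radicand of the closed-form solution vanishes.

```python
from functools import lru_cache


def plain_counts(n_max):
    """l[n] = number of plain terms of natural size n, for 0 <= n <= n_max."""
    l = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        apps = sum(l[i] * l[n - 1 - i] for i in range(n))
        # abstraction + applications + the single index of size n
        l[n] = l[n - 1] + apps + 1
    return l


@lru_cache(maxsize=None)
def open_count(m, n):
    """Number of m-open terms of natural size n."""
    if n <= 0:
        return 0
    lam = open_count(m + 1, n - 1)   # the body may use one more index
    app = sum(open_count(m, i) * open_count(m, n - 1 - i) for i in range(1, n - 1))
    idx = 1 if n <= m else 0         # index n - 1 is allowed iff n - 1 < m
    return lam + app + idx


def dominant_singularity(iterations=80):
    """Bisection for the root of (1 - z)^3 - 4 z^2 in (0, 1/2)."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (1 - mid) ** 3 - 4 * mid * mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


l = plain_counts(300)
rho = dominant_singularity()
print(l[:7], [open_count(0, n) for n in range(7)])
print(l[300] / l[299], 1 / rho)
```

The ratio of consecutive coefficients approaches $1/\rho_{L_\infty}$ slowly, in line with the polynomial factor $n^{-3/2}$.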

4. Counting plain closures and environments

In this section we start with counting plain environments and closures, i.e. members of $\mathcal{E}nv$ and $\mathcal{C}los$, see (5). We consider a simple model in which the size of environments and closures is equal to the total number of abstractions, applications and the sum of all the de Bruijn index sizes. Formally, we set

$$|\langle M, e \rangle| = |M| + |e|, \qquad |\square| = 0, \qquad |c : e| = |c| + |e|.$$

Example 3.

The following two tables list the first few plain environments and closures.

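By analogy with the notation for the set of plain λ-terms, we write $\mathcal{E}_\infty$ and $\mathcal{C}_\infty$ to denote the class of plain environments and closures, respectively. Reformulating (5) we can now give a formal specification for both $\mathcal{E}_\infty$ and $\mathcal{C}_\infty$ as follows:

$$\mathcal{C}_\infty = \mathcal{L}_\infty \times \mathcal{E}_\infty, \qquad \mathcal{E}_\infty = \mathrm{SEQ}(\mathcal{C}_\infty). \tag{17}$$

In such a form, both classes $\mathcal{E}_\infty$ and $\mathcal{C}_\infty$ become amenable to the process of singularity analysis. In consequence, we obtain the following asymptotic approximation for the number of plain environments and closures.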

Theorem 6.

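The numbers $e_n$ and $c_n$ of plain environments and closures of size $n$, respectively, admit the following asymptotic approximations:

$$e_n \sim C_E\, \rho^{-n}\, n^{-3/2} \qquad \text{and} \qquad c_n \sim C_C\, \rho^{-n}\, n^{-3/2} \tag{18}$$

where

$$\rho = \frac{25 - \sqrt{545}}{10} \doteq 0.16548, \qquad C_E \doteq 0.699997, \qquad C_C \doteq 0.174999.$$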






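Consider generating functions $E_\infty(z)$ and $C_\infty(z)$ associated with respective counting sequences, i.e. the sequence $(e_n)_n$ of plain environments of size $n$ and $(c_n)_n$ of plain closures of size $n$. Based on the specification (17) for $\mathcal{E}_\infty$ and $\mathcal{C}_\infty$ and the assumed size notion, we can write down the following system of functional equations satisfied by $E_\infty(z)$ and $C_\infty(z)$:

$$C_\infty(z) = L_\infty(z)\, E_\infty(z), \qquad E_\infty(z) = \frac{1}{1 - C_\infty(z)}. \tag{21}$$

Next, we solve (21) for $E_\infty(z)$ and $C_\infty(z)$. Though (21) has two formal solutions, the following one is the single one yielding analytic generating functions with non-negative coefficients:

$$E_\infty(z) = \frac{1 - \sqrt{1 - 4 L_\infty(z)}}{2 L_\infty(z)}, \qquad C_\infty(z) = \frac{1 - \sqrt{1 - 4 L_\infty(z)}}{2}. \tag{22}$$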


Since for there are two potential sources of singularities in (22). Specifically, the dominating singularity $\rho_{L_\infty}$ of $L_\infty(z)$, see (13), or roots of the radicand expression $1 - 4 L_\infty(z)$. Therefore, we have to determine whether we fall into the so-called sub- or super-critical composition schema, see [19, Chapter VI. 9]. Solving $L_\infty(z) = \frac{1}{4}$ for $z$, we find that it admits a single solution equal to

$$\rho = \frac{25 - \sqrt{545}}{10} \doteq 0.16548.$$


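Since $\rho < \rho_{L_\infty}$, the outer radicand carries the dominant singularity of both $E_\infty(z)$ and $C_\infty(z)$. We fall therefore directly into the super-critical composition schema and in consequence know that near $\rho$ both $E_\infty(z)$ and $C_\infty(z)$ admit Newton-Puiseux expansions in form of

$$E_\infty(z) = a_E - b_E \sqrt{1 - \frac{z}{\rho}} + O\!\left( \left| 1 - \frac{z}{\rho} \right| \right), \qquad C_\infty(z) = a_C - b_C \sqrt{1 - \frac{z}{\rho}} + O\!\left( \left| 1 - \frac{z}{\rho} \right| \right) \tag{24}$$

with $b_E > 0$ and $b_C > 0$. At this point, we can apply the standard function scale, see Theorem 2, to the presentation of $E_\infty(z)$ and $C_\infty(z)$ in (24) and conclude that

$$[z^n]E_\infty(z) \sim C_E\, \rho^{-n}\, n^{-3/2}, \qquad [z^n]C_\infty(z) \sim C_C\, \rho^{-n}\, n^{-3/2}$$

where $C_E = -b_E / \Gamma(-\tfrac{1}{2})$ and $C_C = -b_C / \Gamma(-\tfrac{1}{2})$, respectively, with $\Gamma(-\tfrac{1}{2}) = -2\sqrt{\pi}$. In fact, reformulating (22) so to fit the Newton-Puiseux expansion forms (24) we find that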




Numerical approximations of both constants yield the declared asymptotic behaviour of both counting sequences, see (18). ∎
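The proof can be cross-checked numerically. The sketch below rests on the size model stated above, in which coupling and cons contribute nothing to the size, so that closures satisfy $C = L \cdot E$ and environments satisfy $E = 1 + C \cdot E$ at the level of generating functions. The function names are hypothetical. Setting $L_\infty(z) = 1/4$ in the quadratic equation satisfied by $L_\infty(z)$ gives $5z^2 - 25z + 4 = 0$, whose smaller root is used as `rho`.

```python
import math


def plain_counts(n_max):
    """Number of plain terms of natural size 0..n_max."""
    l = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        l[n] = l[n - 1] + sum(l[i] * l[n - 1 - i] for i in range(n)) + 1
    return l


def env_closure_counts(n_max):
    """Return (e, c): plain environments and plain closures by size."""
    l = plain_counts(n_max)
    e = [0] * (n_max + 1)
    c = [0] * (n_max + 1)
    e[0] = 1                      # the empty environment
    for n in range(1, n_max + 1):
        # closure = term of size k >= 1 paired with an environment
        c[n] = sum(l[k] * e[n - k] for k in range(1, n + 1))
        # non-empty environment = head closure of size k >= 1, then a tail
        e[n] = sum(c[k] * e[n - k] for k in range(1, n + 1))
    return e, c


def scaled(count, n, rho):
    """count * rho^n * n^(3/2), computed through logarithms to avoid overflow."""
    return math.exp(math.log(count) + n * math.log(rho) + 1.5 * math.log(n))


rho = (25 - math.sqrt(545)) / 10   # smaller root of 5 z^2 - 25 z + 4 = 0
n = 200
e, c = env_closure_counts(n)
scaled_e = scaled(e[n], n, rho)
scaled_c = scaled(c[n], n, rho)
print(e[:6], c[:6])
print(e[n] / e[n - 1], 1 / rho, scaled_e, scaled_c)
```

Under these assumptions the two scaled values should settle near positive constants, numerically about 0.70 and 0.175, with a ratio close to 4.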

Let us notice that as the generating functions of plain environments and closures are both algebraic, they are also holonomic (D-finite), i.e. satisfy linear differential equations with polynomial (in terms of $z$) coefficients. Using the powerful gfun library for Maple [31] one can automatically derive appropriate holonomic equations for both of them, subsequently converting them into linear recurrences for the corresponding counting sequences.

Example 4.

We restrict the presentation to the linear recurrence for the number of plain environments, omitting for brevity the respective, likely verbose, recurrence for plain closures. Using gfun we find that the number of plain environments of size $n$ satisfies the recurrence of Figure 2. Despite its appearance, this recurrence is an efficient way of computing it. Indeed, holonomic specifications allow computing the coefficients of both generating functions using a linear number of arithmetic operations, as opposed to the quadratic number of operations required when following their direct combinatorial specification. Let us remark that the computations involved operate on large integers whose representation takes space linear in $n$. For instance, has about digits. In consequence, single arithmetic operations on such numbers cannot be performed in constant time.

Figure 2. Linear recurrence defining the number of plain environments of size $n$, with corresponding initial conditions.

Remark 1.

The analytic approach utilising generating functions exhibits an important benefit in the context of generating random instances of plain environments and closures. With analytic generating functions at hand and effective means of computing the coefficients of both, it is possible to design efficient samplers, constructing uniformly random (conditioned on the outcome size $n$) structures of both combinatorial classes. For instance, using holonomic specifications it becomes possible to construct exact-size samplers following the so-called recursive method of Nijenhuis and Wilf, see [28, 20]. Moreover, since the corresponding generating functions are analytic, it is possible to design effective Boltzmann samplers [17], either in their approximate-size variant constructing structures within a structure size interval $[(1 - \varepsilon)n, (1 + \varepsilon)n]$ in time $O(n)$ where $n$ is the outcome structure size, or their exact-size variants running in time $O(n^2)$. Remarkably, both sampler frameworks admit effective tuning procedures influencing the expected internal shape of constructed objects, e.g. frequencies of desired sub-patterns [5]. With the growing popularity of (semi-)automated software testing techniques, see [12], combinatorial samplers for environments and closures exhibit potential applications in testing normalisation frameworks and abstract machine implementations, such as the Krivine machine. We briefly remark that randomly generated λ-terms already proved useful in finding optimisation bugs in compilers of functional programming languages, see [29]. Our prototype samplers for environments and closures, within the above sampler frameworks, are available on GitHub.
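To give a flavour of Boltzmann sampling in this setting, here is a minimal free-size sketch for plain closures. It is not the authors' prototype: the function names, the prefix-token encoding of terms and the default parameter are assumptions of this example, and it relies on the relations $C = L \cdot E$ and $E = \mathrm{SEQ}(C)$ for plain closures and environments. The control parameter `x` must stay below the dominant singularity of plain closures, roughly 0.1655.

```python
import math
import random


def L_inf(x):
    """Closed-form generating function of plain terms, for small x > 0."""
    disc = (1 - x) ** 2 - 4 * x * x / (1 - x)
    return (1 - x - math.sqrt(disc)) / (2 * x)


def sample_plain_term(x, rng=random):
    """Boltzmann-sample a plain term as a prefix list of 'lam', 'app' or an int index."""
    L = L_inf(x)
    p_lam, p_app = x, x * L       # weights x L / L and x L^2 / L
    tokens, pending = [], 1       # `pending` = subterms still to be generated
    while pending:
        u = rng.random()
        if u < p_lam:
            tokens.append("lam")
        elif u < p_lam + p_app:
            tokens.append("app")
            pending += 1
        else:
            n = 0
            while rng.random() < x:   # index n is drawn with probability (1 - x) x^n
                n += 1
            tokens.append(n)
            pending -= 1
    return tokens


def sample_closure(x=0.15, rng=random):
    """Boltzmann-sample a plain closure (term_tokens, [closures])."""
    C = (1 - math.sqrt(1 - 4 * L_inf(x))) / 2
    def closure():
        term = sample_plain_term(x, rng)
        env = []
        while rng.random() < C:       # environment length is geometric in C(x)
            env.append(closure())
        return (term, env)
    return closure()


def closure_size(closure):
    """Total natural size of all terms stored in the closure."""
    term, env = closure
    own = sum(1 if t in ("lam", "app") else t + 1 for t in term)
    return own + sum(closure_size(c) for c in env)


random.seed(2018)
sample = sample_closure()
print(closure_size(sample), sample)
```

Closures of equal size are drawn with equal probability, while the size itself is random. Rejection on the outcome size would turn this sketch into an approximate-size or exact-size sampler.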

5. Counting closed closures

In this section we address the problem of counting so-called closed closures. (We acknowledge that speaking of closed closures is a bit odd; however, the terms “closure” and “closed” form a consecrated terminology that we merely associate together.) A closure is said to be closed if it consists of an $m$-open term associated with an environment of length $m$ made itself out of closed closures. Note that such closures correspond to not yet fully evaluated -open λ-terms. With such a description, the set of closed closures can be given using the following combinatorial specification:

Example 5.

The following table lists the first few closed closures.

Establishing the asymptotic growth rate of the sequence corresponding to closed closures of size $n$ poses a considerable challenge, much more involved than its plain counterpart. In the following theorem we show that there exist two constants such that and . In other words, the asymptotic growth rate of is bounded by two exponential functions of $n$.

Theorem 7.

There exist satisfying and functions satisfying such that for sufficiently large we have .


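Let us start with the generating function $C(z)$ associated with closed closures . Note that from the specification (28) $C(z)$ is implicitly defined as

$$C(z) = \sum_{m \geq 0} L_m(z)\, C(z)^m.$$

We can therefore identify a closed closure with a tuple $\langle M, c_1, \ldots, c_m \rangle$ where $m \geq 0$, $M$ is an $m$-open λ-term and $c_1, \ldots, c_m$ are closed closures themselves. We proceed with defining two auxiliary lower and upper bound classes $\underline{\mathcal{C}}$ and $\overline{\mathcal{C}}$, with generating functions $\underline{C}(z)$ and $\overline{C}(z)$, such that $[z^n]\underline{C}(z) \leq [z^n]C(z) \leq [z^n]\overline{C}(z)$ for all $n$. Next, we establish their asymptotic behaviour and, in doing so, provide exponential lower and upper bounds on the growth rate of closed closures.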

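We start with the lower bound class $\underline{\mathcal{C}}$ and its generating function $\underline{C}(z)$. Note that $\underline{\mathcal{C}}$ is associated with closures in which each term is closed, independently of the corresponding environment length. Hence, as closed λ-terms are $m$-open for all $m \geq 0$, the coefficients of $\underline{C}(z)$ bound from below the respective numbers of closed closures. Furthermore

$$\underline{C}(z) = \sum_{m \geq 0} L_0(z)\, \underline{C}(z)^m = \frac{L_0(z)}{1 - \underline{C}(z)}.$$

Solving the above equation for $\underline{C}(z)$ we find that $\underline{C}(z) = \frac{1 - \sqrt{1 - 4 L_0(z)}}{2}$. In such a form, it is clear that there are two potential sources of singularities, i.e. the singularity $\rho_{L_\infty}$ of $L_0(z)$, see Theorem 5, or the roots of the radicand $1 - 4 L_0(z)$. Since $L_0(z)$ is increasing and continuous in the interval $(0, \rho_{L_\infty})$ we know that if $L_0(\rho_{L_\infty}) > \frac{1}{4}$ then there exists a $\underline{\rho} \in (0, \rho_{L_\infty})$ such that $L_0(\underline{\rho}) = \frac{1}{4}$. Unfortunately, we cannot simply check that $L_0(\rho_{L_\infty}) > \frac{1}{4}$ as there exists no known method of evaluating $L_0(z)$, defined by means of an infinite system of equations, at a given point. For that reason we propose the following approach.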

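Recall that a λ-term is said to be $h$-shallow if all its de Bruijn index values are (strictly) bounded by $h$, see [21]. Let $L_m^{(h)}(z)$ denote the generating function associated with $m$-open $h$-shallow λ-terms. Note that $L_0^{(h)}(z)$, i.e. the generating function corresponding to closed $h$-shallow λ-terms, has a finite computable representation. Indeed, we have

$$L_m^{(h)}(z) = z L_{m+1}^{(h)}(z) + z L_m^{(h)}(z)^2 + z\, \frac{1 - z^m}{1 - z} \quad (0 \leq m < h), \qquad L_h^{(h)}(z) = z L_h^{(h)}(z) + z L_h^{(h)}(z)^2 + z\, \frac{1 - z^h}{1 - z}.$$

Consider $m < h$. Each $m$-open $h$-shallow λ-term is either (a) in form of $\lambda M$ where $M$ is an $(m+1)$-open $h$-shallow λ-term due to the head abstraction, (b) in form of $M N$ where both $M$ and $N$ are $m$-open $h$-shallow λ-terms, or (c) a de Bruijn index in the set $\{\underline{0}, \underline{1}, \ldots, \underline{m-1}\}$. When $m = h$, we have the same specification with the exception of the first summand where, as we cannot exceed $h$, terms under abstractions are $h$-open, instead of $(h+1)$-open.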

Using such a form it is possible to evaluate at each point where is the dominating singularity of satisfying , see [21]. Certainly, each closed -shallow