 # How to generate random lambda terms?

We survey several methods of generating large random lambda-terms, focusing on their closed and simply-typed variants. We discuss methods of exact- and approximate-size generation, as well as methods of achieving size-uniform and non-uniform outcome distributions.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

Generating examples of various combinatorial structures is an essential part of devising counterexamples to working hypotheses as well as building up confidence in their genuineness. It is a standard mathematical practice used to sieve out and test promising ideas or conjectures before more rigorous and labour-intensive treatment is called for. The more examples pass our tests, the more evidence supports our hypothesis. To illustrate this point, consider the famous Collatz conjecture.

###### Conjecture (Collatz, 1937).

Let be a function defined as

 (1) f(n)={n/2if n≡0(mod2)3n+1if n≡1(mod2)

Then, for all there exists a sufficiently large such that .

The Collatz conjecture remains one of the most prominent open problems in mathematics. It states that given any natural number , the infinite sequence

 (2) f(n), f(f(n)), f(f(f(n))),…

must eventually reach the value or, equivalently, fall into the cycle .

A myriad of computer-generated empirical evidence shows that it holds for all positive integers up to  [lagarias2011]. Although no proof of Collatz’s conjecture is known, the sheer bulk of empirical data provides high confidence in its genuineness which, in turn, inspires new attempts at solving its mystery.

Nevertheless, even the highest levels of confidence cannot replace rigorous mathematical reasoning. After all, we cannot test a hypothesis for all, infinitely many different inputs. By a twist of perspective, however, all it takes to disprove a conjecture is a single counterexample. That, of course, can be looked for! A famous example of a conjecture which was refuted by a single counterexample is Euler’s sum of powers conjecture.

###### Conjecture (Euler, 1769).

Let . If , then .

This long-standing conjecture was finally settled in the negative by Lander and Parkin [lander1966]. Using a systematic, computer-assisted search procedure they found the following neat counterexample to Euler’s claim:

 (3) 275+845+1105+1335=1445.

Without the use of computers, such a counterexample might not have been found.

### 1.1. Why do we want random lambda terms?

The generate-and-check principle of testing working hypotheses is mirrored in software development by software verification. Since proving properties about large, industrial-strength programs is tremendously difficult and time-consuming, in fact almost infeasible for most types of applications, the modern practice is to test them using a large body of carefully crafted tests cases checking the intended behaviour of the program. Absolute, mathematical assurances are often given up in favour of feasible, yet reasonable levels of confidence in the actual program’s properties.

Remarkably, test cases need not to be created manually by programmers but, can (at least to some level) be generated automatically by a computer. The perhaps most notable examples of such a general testing process in software development include model checking [cgjlv2000] and property-based software testing such as QuickCheck [Claessen:2000:QLT]. In testing environments like QuickCheck, the programmer is spared the burden of writing single test cases. Instead, she is given the task of defining properties (invariants) of tested software and providing a recipe for generating random instances of input data. The testing environment generates hundreds of random test cases and attempts to disprove programmer-specified properties. If a counterexample is found, the programmer is given constructive evidence that the indented property does not hold. Otherwise, with hundreds or thousands of successful test cases, she can become quite confident in the software’s correctness.

One prominent application of random -terms comes from testing the correctness of certain program transformations in ghc — Haskell’s most famous optimising compiler [marlow2012the]. Lambda terms are a simple (if not the simplest) example of a non-trivial combinatorial structure with variables and corresponding name scopes. Consequently, -terms provide the scaffold of simple functional programs and can be used to generate random inputs for compilers to consume. If, for some specific input program , the tested compiler generates two observably different executables (depending on the level of enabled optimisations) forms a constructive counterexample to the working assumption that performed optimisations do not change the semantics of the input program. Remarkably, such a strikingly simple testing method lead to the discovery of several important bugs in ghc’s strictness optimiser [Palka:2011:TOC:1982595.1982615] (see also [10.1145/3110259]). In particular, its misuse of the [commandchars=
{}] seq :: a -> b -> b function of the following semantics:

[commandchars=
{},codes=] ‘seq‘ b = a ‘seq‘ b = b

In the current paper we discuss several techniques of generating large, random -terms. We do not touch on the conceptually simpler exhaustive generation problem in which we are interested in the construction of all terms of a fixed size , like in frameworks such as SmallCheck [conf/haskell/RuncimanNL08]. Instead, we discuss the various methods of sampling (uniformly) random -terms of size , including the two important classes of simply-typed and closed terms.

### 1.2. What lambda terms do we want to sample?

Before we begin to discuss how to generate random

-terms, let us pause for a moment and discuss

what terms we want to sample. In order to sample (uniformly) random terms of some size , we have to ensure that there exists only a finite number of terms of that size. It means that we have to decide what to do with -equivalent terms and how to represent -terms. Let us consider these issues in order.

Let and be two distinct -equivalent -terms, differing only in names of their bound variables, for instance and . Although syntactically different, both represent the same, famous combinator. It can be therefore argued that under reasonable111Size models in which the terms size does not depend on the particular names of bound variables. size models, both should be assigned the same size. Since we have an infinite supply of variable names, we are consequently inclined to consider -equivalent terms indistinguishable. In other words, focus on sampling -equivalence classes of -terms, instead of their inhabitants.

Given these concerns, we have to decide on a representation in which we do not differentiate bound variable names. We can therefore pick one canonical representative for each -equivalence class of terms, in effect sampling terms up to -equivalence, see [Wang05generatingrandom, Wang05generatingrandom2, BODINI2013227], or use a representation in which there exists just one such representative, for instance the De Bruijn notation [deBruijn1972], see [10.5555/2438099.2438158, BODINI201845, bendkowski_grygiel_tarau_2018]. Recall that in the De Bruijn notation there are no named variables. Instead, we use dummy indices to denote variable occurrences. The index n represents a variable which is bound by its st proceeding abstraction. Should there be fewer abstractions than , then n represents a free variable occurrence. And so, in the De Bruijn notation both and admit the same representation , see Figure 1.

Neither representation is better than the other one. What makes them quite different, however, are the details of related size models, in particular what weights we assign to abstractions, applications, and variables. For instance, how should measure the size of a variable? Should it be proportional to the distance between the variable occurrence and its binding abstraction, or should it be independent of that distance? What toll should be impose on the binder distance? Should it be a linear function or perhaps a more sophisticated one, say logarithmic? If we use De Bruijn indices, how should we represent them? Should we use the more traditional unary base where n is just an -fold application of the successor function to zero, or should we represent indices in binary format?

Naturally, these representation details prompt different size models and, consequently, alter the landscape of random -terms we are going to sample. What details should we therefore agree on? Depending on the specifics of the representation and size model, large random -terms change their statistical characteristics. Knowing these traits, we can choose a specific size model (and representation) which best suits our needs. For instance, large random -terms in representations in which variables have constant weight tend to be strongly-normalising [lmcs:848]. On the other hand, in De Bruijn models in which variables have size proportional to the distance between them and their binding abstractions, large random terms are almost never strongly-normalising [10.1093/logcom/exx018].

Finding properties of large, random -terms is a fascinating and surprisingly challenging endeavour. It often involves sophisticated counting arguments using generating functions and advanced methods of analytic combinatorics [flajolet09]. For some size models, however, even these powerful techniques are not so easily applicable. In the current survey, we do not dive into theoretical details of these methods. Instead, we focus more on their application to the random generation of -terms. We invite the curious reader to follow the referenced literature for more details.

###### Remark .

We are interested in generating large, uniformly random -terms. With large, unbiased terms it is possible to test not only the correctness of a compiler or abstract machine implementing the target language, but also test its performance and scalability, otherwise impossible to test through small input programs.

Our decision has a few notable consequences. We are not interested in type-oriented generation methods similar to the Ben-Yelles algorithm [benyelles] or ad-hoc techniques which do not give the user direct and rigorous control over the outcome term distribution. While we are primarily interested in uniform distributions, we also address the issue of non-uniform generation in Section 2.6.

## 2. Combinatorial generation methods

Generating random combinatorial structures is a well-established topic in theoretical computer science. Let us examine how standard generation methods can be tailored to construct uniformly random -terms.

### 2.1. Exact-size sampling and the recursive method

Nijenhuis and Wilf’s recursive method [NijenhuisWilf1978], later systematised by Flajolet, Zimmerman and Van Cutsem [FLAJOLET19941], is a simple, general-purpose recursive sampling template for a wide range of combinatorial objects, including context-free languages and various tree-like structures.

Given a formal specification of so-called decomposable structures, the recursive method provides an inductive scheme of constructing recursive samplers meant to generate objects of some fixed size . In what follows we adopt the recursive method and design a simple sampler for closed -terms in the De Bruijn notation:

 (4) N,M:=\text@underline{\sf n} | (NM) | λN.

Following the recursive method, terms will be built in a top-down fashion, according to their inductive specification (4). In fact, we are going to devise a slightly more general recursive sampler building terms of size , with free indices in the set . The desired closed -term sampler will be recovered afterwards.

During its execution, will make a series of random choices. Each of these random decisions will we consulted with an external oracle providing suitable

branching probabilities

ensuring a uniform outcome distribution. We explain later how to construct such an oracle given the formal -term specification (4).

When invoked, has to decide whether to output one of the available indices , an application of two terms , or an abstraction . The sampler queries the oracle for respective branching probabilities, and chooses a constructor according to the obtained distribution. Building an index is trivial. Assuming that we have enough fuel to pay for the index n, we just need to look it up in . Applications and abstractions are a bit more involved.

Note that if is a term with free indices in , then the free indices of its subterm are elements of where is the lifted variant of . Therefore, if decides to construct an abstraction , it can build by invoking , accounting size units for the head abstraction (depending, of course, on the assumed size model).

Let us consider what happens when decides to generate an application . Denote the size contribution of a single application as . Now, in order to build we first query the oracle for a random . Next, we construct two terms of sizes and , respectively. Note that variable contexts of both and do not change. We can therefore readily invoke and , and apply to .

Oracle construction. In order to make its random decisions repeatedly queries an external oracle . Suppose that we invoke . How should compute the necessary branching probabilities? The idea is quite simple.

Let denote the set of -terms of size with free indices in the set . In order to assign each term in a uniform probability

 (5) p=1|LΓ(n)|

we first determine . Then, the branching probabilities and corresponding to choosing and index n, an abstraction , and an application , respectively, are given by

 (6) p1=p⋅|{% \text@underline{\sf n}:\text@underline{\sf n}∈LΓ(n)}|p2=p⋅|{λN:λN∈LΓ(n)}|p3=p⋅|{(NM):(NM)∈LΓ(n)}|.

Likewise, the probability of choosing an application in which is of size and is of size is equal to the proportion of such applications among all applications of size , all in the common context . It means that in order to compute appropriate branching probabilities our oracle needs to calculate cardinalities of involved sets.

Computing the cardinality of , as well as specific subsets can be achieved using dynamic programming, much like itself. The oracle can be precomputed once, memorised, and reused in between subsequent sampler invocations.

Complexity and limitations. The recursive method, albeit simple, admits considerable practical limitations. Both the oracle construction and the following sampling procedure require arithmetic operations on integers exponential in the target size , turning the effective bit complexity of the recursive method to . It is therefore possible to sample -terms of moderate sizes in the thousands.

Let us remark that Denise and Zimmerman combined the recursive method with certified floating-point arithmetic, reducing the expected time and space complexity of sampling down to

and preprocessing time [DenZimm99]. Notably, for context-free languages the preprocessing time can be further reduced to .

###### Remark .

Grygiel and Lescanne [grygiel_lescanne_2013] proposed a somewhat similar sampler for closed -terms based on the following ranking-unranking principle.

Given a set of elements we wish to sample from, we construct a pair of mutually inverse bijections and . The former function defines the rank of an element , whereas the latter unranking function maps a given rank to the unique element attaining its value. Both and its inverse are constructed in a recursive manner. With these two functions, sampling elements of boils down to sampling a uniformly random number . Once we obtain a random rank , we can use the unranking function to construct the corresponding element .

Based on the ranking-unranking sampler for closed -terms (in the size model where variables do not contribute to the term size) Grygiel and Lescanne derived a rejection sampler for simply-typed -terms. Such a sampler generates closed (untyped) terms until a typeable one is sampled. Intermediate terms are simply discarded.

The efficiency of such a rejection scheme depends on two factors — the efficiency of the underlying sampler for untyped terms and, even more importantly, the proportion of typeable terms among closed (untyped) terms. Unfortunately, already for the ratio between typeable terms and untyped ones is less than , see [grygiel_lescanne_2013, Section 9.3]. It means that the average number of trials required to find a typeable term of size is already of order . Such a sampler is not likely to output terms larger than .

Let us remark that Lescanne devised a more direct sampler for linear and affine -terms based on the same ranking-unranking principle [Lescanne:2018:QAL:3185755.3173547]. Tracking the evolution of respective (simpler) contexts, Lescanne proposed samplers for a few existing size models.

###### Remark .

Wang proposed a recursive sampler for closed -terms in a size model where all constructors (i.e. variables, applications, and abstractions) contribute one to the overall term size [Wang05generatingrandom]. She then adopted her samplers to simply-typed -normal forms [Wang05generatingrandom2] using a finite truncation of Takahashi, Akama, and Hirokawa’s context-free grammars generating -normal forms of type in the context  [TAKAHASHI1996144].

Since her sampler is based on a dedicated grammar for -normal forms, it cleverly avoids the slowdown imposed by rejection sampling from the larger set of untyped terms. Unfortunately, such a method comes at a significant price. Grammar-based samplers do not scale onto larger types. Involved grammars quickly become intractable and too expensive to be effectively constructed.

Let us remark that using the idea of truncated type grammars Asada, Kobayashi, Sin’ya and Tsukada showed recently that, asymptotically almost all typeable -terms of order have a -fold exponential reduction sequence [lmcs:5203].

### 2.2. Generating functions and the analytic toolbox

Counting and generating random objects are intimately connected activities. In order to construct random -terms we usually need to compute a series of branching probabilities. These, in turn, depend on the specific cardinalities of many different term families for various size values.

In order to conveniently operate on (infinite) counting sequences enumerating respective cardinalities, we resort to generating functions, i.e. formal power series whose coefficients encode the counting sequence we are interested in. To illustrate the convenience of generating functions, consider the famous example of Catalan numbers enumerating, inter alia, the number of expressions containing pairs of matched parentheses, or plane binary trees with internal nodes. These numbers satisfy the following recursive relation:

 (7) cn+1=n∑i=0ci⋅cn−iwherec0=1.

With the help of generating functions, its possible to represent the entire infinite counting sequence of Catalan numbers as a finite expression. Lifting (7) onto the level of generating functions involving we find that

 (8) C(z)=1+zC(z)2=1−√1−4z2z.

In this form, we can recover the th Catalan number by finding the th coefficient of using the Taylor series expansion of around .

Typically, when solving recursive equations like (7) we do not concern ourself with the convergence of associated generating functions. If, however, the generating functions happen to be convergent, they correspond to complex analytic functions. We can then draw from the rich fountain of analytic combinatorics [flajolet09] — a theory devoted to giving precise quantitative predictions of large combinatorial structures, exploiting for that purpose the analytic properties of related generating functions.

Holonomic functions and P-recursive sequences. We say that a formal power series is holonomic (D-finite) if it satisfies a linear differential equation

 (9) Pr(z)drdzrf(z)+Pr−1(z)dr−1dzr−1f(z)+⋯+P1(z)ddzf(z)+P0(z)f(z)=0

for some polynomials , cf. [flajolet09, Appendix B.4]. By extension, a function analytic at is said to be holonomic if its power series representation is holonomic. Likewise, the coefficient sequence of a holonomic power series is called holonomic (P-recursive).

Holonomic functions are a rich subclass of analytic functions emerging naturally in the context of various enumeration problems. Notably, D-finite functions enjoy a series of pleasing properties. For instance, holonomic generating functions subsume rational and algebraic functions — two classes of generating functions linked to counting problems involving rational and context-free languages. Given an algebraic function represented as a branch of a polynomial equation it is possible to compute its corresponding D-finite representation [comtet]. In fact, modern computer algebra systems provide dedicated packages specifically for that purpose, cf. [SalvyZimmermann1994].

D-finite functions have found numerous applications in combinatorial enumeration and symbolic computations [aequalsb]. Remarkably, using respective linear differential equations (9) it is possible to find linear recurrences defining their coefficient sequences and, consequently, compute the th coefficient using just arithmetic operations.

Consider our running example of Catalan numbers. The related holonomic equation for takes the form

 (10) 1+(2z−1)C(z)+(4z2−z)ddzC(z)=0.

In the special case of generating functions, the differential operator admits a natural combinatorial interpretation. Note that the defining relation

 (11) zddzC(z)=zddz∑n≥0cnzn=∑n≥1n⋅cnzn

implies that the coefficient is equal to . We can therefore interpret the generating function as encoding the counting sequence of plane binary trees with pointed internal nodes. Each tree of size has internal nodes and therefore gives rise to different variants of itself, each with a different node being pointed.

Using (11) we can translate the holonomic equation defining into a linear recurrence involving its coefficients:

 (12) (n+2)cn+1=2(2n+1)cnwithc0=1.

It is clear that with the help of (12) we can compute the th Catalan number using only a linear number of arithmetic operations. On the other hand, using the naïve method derived from the defining identity , we have to compute a sum of products of all in order to compute the next Catalan number . Such a process requires arithmetic operations.

Following similar steps, each holonomic (in particular algebraic) function admits a linear recurrence governing its coefficients. Note that by exploiting this fact we can, for instance, speed up the oracle computations involved in the recursive method.

### 2.3. Combinatorial bijections and Rémy’s algorithm

The simple recurrence (12) defining provides the basis for an elegant sampling algorithm for plane binary trees due to Rémy [Remy85]. The main idea is to interpret it combinatorially, using it as an indication on how to map the set of binary trees with internal nodes to the larger set of trees with internal nodes.

We start with noticing that each binary tree with internal nodes has leaves and so the total of nodes. It means that we can interpret (12) as a bijection between two sets of trees — on the left-hand side, the set of binary trees with internal nodes, each with a single pointed leave; and on the right-hand side, the set of binary trees with internal nodes, each with a single pointed node coloured with one of two distinct colours, say red or blue . Let be a binary tree with internal nodes. Consider the following grafting operation:

• Select a random node (be it internal or external) in and call it .

• Flip a coin. Depending on the outcome, colour either red  or blue .

• If was coloured red replace it with in .

• Symmetrically, if was coloured blue replace it with in .

• Point to the newly created leave (sibling of ).

Suppose that we started the grafting operation with a (uniformly) random tree with internal nodes. Note that once we select a random node and assign it a colour, we end up with a random tree enumerated by the right-hand side of (12). At this point we readily notice that after the above grafting operation we obtain a random tree enumerated by the left-hand side of (12). We can therefore forget the pointer created in the final grafting step and obtain a random binary tree with internal nodes.

Clearly, this process can be iterated as many times as required. If we want to sample a random binary tree of size , we can start with a single node and repeat the grafting operation times. In his celebrated Art of Computer Programming [Knuth:2006:ACP:1121689] Knuth provides an elegant implementation of Rémy’s algorithm which he calls algorithm R. With its help, we obtain an efficient, linear time, exact-size sampling algorithm for plane binary trees.

###### Remark .

Finding constructive interpretations for combinatorial identities is, in general, a non-trivial and creative task. Rémy’s algorithm is somewhat ad-hoc and therefore cannot be easily generalised onto other tree structures. Interestingly enough, let us remark that Bacher, Bodini, and Jacquot were able to use ideas of Boltzmann sampling and exploit the holonomic specification for so-called Motzkin trees (i.e. unary-binary trees), developing a linear time sampler [BBJ].

###### Remark .

Quite often appropriate bijections are derived only after combinatorial identities involving respective generating function are established. For instance, let us mention the non-trivial bijections between combinatorial maps and certain classes of enriched trees leading to efficient samplers for linear and affine -terms in the canonical representation of -terms constructed up to -equivalence [BODINI2013227], or the bijection between neutral -terms and Motzkin trees in the so-called natural size model under the De Bruijn notation [10.1093/logcom/exx018].

Random combinators. Rémy’s algorithm is an important sampling algorithm allowing us to generate combinators of any finite bases of primitive combinators. To illustrate this point, consider the example of -combinators specified as

 (13) C:=S | K | (CC).

Assume that the size of a combinator is equal to the number of its internal applications222If and contribute to the size, the corresponding counting sequence becomes periodic and introduces some technical (though manageable) difficulties, similarly to the case of leaves in plane binary trees.. Suppose that we want to sample a combinator of size . Note that we can decompose each combinator of size into a plane binary scaffold with internal nodes, and a sequence of primitive combinators and . The scaffold determines the application structure of the combinator whereas the sequence governs the placement of and s in the term. Moreover, such a decomposition is clearly invertible — simply traverse the scaffold in-order and assign elements of the sequence to successive leaves.

The unambiguous decomposition of combinators allows us to use Rémy’s algorithm to sample a random scaffold with internal nodes and, independently, generate a sequence of random primitive combinators. Afterwards, we compose the two into a uniformly random combinator.

###### Remark .

The above sampling scheme is quite efficient. It allows us to easily sample random combinators even of sizes in the hundreds of millions. Unfortunately, even with the Motzkin tree sampler of Bacher, Bodini, and Jacquot [BBJ] it is not so easy to provide an efficient, analogous sampler for closed -terms.

Unlike binary trees, Motzkin trees of some fixed size might have a varying number of leaves. Moreover, in order to interpret a leave as a variable , it is important to know the number of its potential binders, i.e. unary nodes between and the term root. We refer the curious reader interested in empirical evaluation of such scaffold decompositions for -terms to [10.1007/978-3-319-73305-0_8, 10.1007/978-3-319-94460-9_15].

### 2.4. Boltzmann models

For many years, the exact-size sampling paradigm was the de facto standard in combinatorial generation. Its long-lasting dominance was brought to an end with the seminal paper of Duchon, Flajolet, Louchard, and Schaeffer [Duchon:2004:BSR:1024662.1024669] who introduced the framework of Boltzmann models. Instead of generating random structures of fixed size , Boltzmann models provide a more relaxed, approximate-size sampling environment which uses the analytic properties of associated generating functions.

Fix a class of combinatorial structures we want to sample. Under the Boltzmann model we assign to each object a probability measure

 (14) Px(α)=x|α|A(x)

where is some aptly chosen, positive real-valued control parameter, and denotes the size of object . If we sample in accordance with the above Boltzmann distribution, outcome structures of equal size invariably occur with the same probability. Boltzmann models retain therefore the usual requirement of uniformity however now, the size of the outcome object is no longer fixed, but instead varies. Indeed, let

be a random variable denoting the outcome size of a

Boltzmann sampler generating objects according to (14). Summing over all of size we find that

 (15) Px(N=n)=anxnA(x).

We can control the expected size or generated objects by choosing suitable values of the control parameter

. The expected average and standard deviation of

satisfy, respectively,

 (16) Ex(N)=xA′(x)A(x)andσx(N)=√xE′(N).

Suppose, for instance, that we want to design a Boltzmann model for plane binary trees. Recall that the associated generating function satisfies the relation . If we want to centre the mean output tree size around some fixed value , we need to find a suitable control parameter such that . A direct computations reveals that we should use

 (17) x=n(n+1)(2n+1)2.

So, if we want to obtain a random tree in some admissible size window with tolerance , we calibrate according to (17) and reject trees falling outside of the prescribed size window.

Boltzmann samplers. For many interesting instances of decomposable combinatorial classes, such as rational languages or algebraic data types, it is possible to automatically design appropriate Boltzmann models and corresponding samplers generating random objects in accordance with the underlying Boltzmann distribution, see [Duchon:2004:BSR:1024662.1024669, Canou:2009:FSR:1596627.1596637, doi:10.1137/1.9781611975062.9, PIVOTEAU20121711].

Much like exact-size recursive samplers, the Boltzmann sampler layout follows the inductive structure of the target specification. Suppose that we want to sample an object from a (disjoint) union class . According to the Boltzmann distribution (14) the probability of sampling an object from is equal to . Likewise, the probability of sampling an object from is equal to . The corresponding Boltzmann sampler for performs therefore a single Bernoulli trial with parameter . If successful, the sampler invokes . Otherwise, it calls the remaining sampler . Clearly, such a sampler conforms with the Boltzmann distribution.

Now, suppose that we want to sample an object from a product class . According to the Boltzmann distribution, the probability measure of a product object satisfies

 (18) Px(α)=x|α|A(x)=x|β|+|γ|A(x)=x|β|x|γ|A(x)=Px(β)⋅Px(γ).

Therefore, in order to sample an object from we can independently invoke samplers and , collect their outcomes, and construct a pair out of them. The result is a random object from .

###### Remark .

The disjoint union and Cartesian product suffice to design Boltzmann samplers for algebraic specifications, in particular unambiguous context-free grammars. Let us remark, however, that Boltzmann sampler are not just limited to these two operations. Over the years, efficient Boltzmann samplers for both labelled and unlabelled structures were developed. Let us just mention Boltzmann samplers for various Pólya structures [Flajolet:2007:BSU:2791135.2791140], first-order differential specifications [BODINI20122563], labelled planar graphs [fusy2005quadratic], or plane partitions [bodini2010random].

In most cases, especially when optimised using anticipated rejection [BodGenRo2015], Boltzmann samplers are remarkably efficient, frequently generating random structures of sizes in the millions. For typical algebraic specifications, Boltzmann samplers calibrated to the target size window admit, on average, a linear time complexity under the real arithmetic computation model [Duchon:2004:BSR:1024662.1024669, BodGenRo2015]. For exact-size sampling where the tolerance , the complexity becomes .

###### Remark .

Boltzmann samplers for -terms were first developed by Lescanne [DBLP:journals/corr/Lescanne14, grygiel_lescanne_2015] for Tromp’s binary -calculus [tromp:DSP:2006:628]. Tromp’s special encoding of (plain) lambda terms as binary strings, provided an unambiguous context-free specification fitting the framework of Boltzmann samplers. With their help, it became possible to generate -terms a couple of orders of magnitude larger than using recursive method.

Boltzmann models, especially if used with rejection methods, provide a rich source of random -terms. Although quite general, Boltzmann samplers cannot be used for all term representations and size notions. Recall that in order to derive a corresponding Boltzmann model, we have to, inter alia, evaluate the associated generating functions at the calibration parameter . For some representations, such as for instance Wang’s closed -term model [Wang05generatingrandom], the associated generating function is not analytic around the complex plane origin333In such a case the corresponding counting sequence grows super-exponentially fast, i.e. asymptotically faster than any exponential function .. Moreover, without a finite specification we cannot derive a finite set of Boltzmann samplers. In such cases we have to resort to alternative sampling methods including, e.g. rejection-based samplers.

### 2.5. Rejection samplers

Most interesting subclasses of -terms, such as closed or (simply) typeable terms, have no known finite, admissible444Purely in terms of basic constructions like the disjoint sum or Cartesian product, cf. [flajolet09, Part A. Symbolic Methods]. specification which would allow us to derive effective Boltzmann samplers. In order to sample from such classes, we approximate them using finite, admissible specifications, using rejection methods if needed.

Closed -terms. Most -term representations, such as [Bodini2018, lmcs:848, 10.1093/logcom/exx018] admit a common, general symbolic specification template, differing only in weights assigned to specific constructors (i.e. variables, applications, and abstractions). Let , , , , and denote the class of (open or closed) -terms, free and bound variables, a single application node, and a single abstraction node, respectively. Then, admits the following specification:

 (19) L=F+(A×L2)+U×subs(F↦F+D,L).

In words, a -term is either a free variable in , an application of two terms joined by an application node, or an abstraction followed by a lambda term in which some free variables become bound by the head abstraction (i.e. are substituted by some bound variables in ). Since not all free variables have to be bound by the topmost abstraction, free variables can also be substituted for variables in .

Let denote the set of free variables in . Consider the following bivariate generating function defined as

 (20) L(z,f)=∑T∈Lz|T|f|FV(T)|.

sums over all terms using variable to account for s size, and variable to account for the number of its free variables. For instance, consider the combinator . If we assume that (one size unit for each abstraction, applications, and variable), then its respective monomial in becomes .

If we group matching monomials we note the coefficient stands for the number of terms of size and exactly free variables. In particular, the coefficient corresponds to the number of closed -terms of size . We can also look at whole series associated with specific powers. For instance, corresponds to the univariate generating function enumerating closed -terms. In general, is the generating function corresponding to -terms with exactly free variables.

Unfortunately, by (19) the series depends on which, in turn, depends on , etc. Note that each new abstraction can bind occurrences of a variable. Consequently, the abstraction body can have one more free variable than the whole term. Due to this apparent infinite inductive nesting, finding the generating function for closed terms, as well as generating respective terms using Boltzmann models, is as difficult as the general case .

###### Remark .

Despite the inadmissible specification (19) it is still possible to use recursive samplers for closed -terms. The number of free variables is bounded by the overall term size and, in particular, related to the number of term applications. Hence, for target term size we essentially need to memorise all of the values for .

Still, such a method allows us to sample terms of relatively small sizes, especially if the involved numbers grow too fast, cf. [Bodini2018, Wang05generatingrandom]. Consequently, the standard practice is to restrict the class of generated terms, using practically justifiable criteria, such as limiting their unary height, number of abstractions, or the maximal allowed De Bruijn index, see e.g. [BODINI201845, Bodini2018, BendkowskiThesis]. In all these cases, the infinite inductive nesting of parameters becomes finite, and so it is possible to use more efficient sampling techniques, such as Boltzmann sampling.

Under certain term representations, it is possible to sample large, unrestricted closed -terms, despite the infinite nesting problem. Let us consider the class of -terms in the De Bruijn notation. For convenience, assume the natural size notion in which each constructor (including the successor and zero) weights one [10.1093/logcom/exx018]. Let denote the corresponding generating function for plain (open or closed) terms and denote the generating function corresponding to so-called -open -terms, i.e. terms which when prepended with head abstractions become closed. Note that if a term is -open, then it is also -open.

With the help of we can rewrite (19) into a more verbose, infinite specification exhibiting the precise relations among various openness levels:

 (21) L0(z)=zL0(z)2+zL1(z)L1(z)=zL1(z)2+zL2(z)+zL2(z)=zL2(z)2+zL3(z)+z+z2…Lm(z)=zLm(z)2+zLm+1(z)+z1−zm1−z…

In words, if a term is -open, then it is either an application of two -open terms, accounted for in the expression , an abstraction followed by an -term, accounted for in the expression or, finally, one of available indices . Note that and so . Hence the final expression .

Although infinite, such a specification can be effectively analysed using recently developed analytic techniques [BODINI201845, DBLP:journals/corr/abs-1805-09419]. As a by-product of this analysis, it is possible to devise a simple and highly effective rejection-based sampling scheme for closed -terms. Consider the following truncated variant of (21):

 (22) L0,N(z)=zL0,N(z)2+zL1,N(z)L1,N(z)=zL1,N(z)2+zL2,N(z)+zL2,N(z)=zL2,N(z)2+zL3,N(z)+z+z2…LN,N(z)=zLN,N(z)2+zL(z)+z1−zN1−zL(z)=zL(z)2+zL(z)+z1−z

Note that instead of unfolding the definition of indefinitely, we stop at level and use the generating function corresponding to all -terms at the final level . Such a finite specification defines a subclass of plain -terms excluding open terms without sufficiently large index values. In particular, this class of terms approximates quite well the set of closed terms.

Following [BODINI201845] we use (22) to sample plain -terms and reject those which are not closed. The asymptotic number of such terms approaches the asymptotic number of closed terms exponentially fast as tends to infinity. In consequence, the imposed rejection does not influence the linear time complexity of the sampler drawing objects from some interval . In fact, with the probability of rejection is already of order , cf. [BODINI201845, Section 5.4]. Consequently, such a sampler rarely needs to dismiss generated terms and is able to generate large random closed -terms of sizes in the millions.

###### Remark .

The discussed sampling method works for a broad class of size notions within the De Bruijn term representation [GittenbergerGolebiewskiG16]. It is, however, specific to the unary encoding of De Bruijn indices.

The intuitive reason behind the success of the above rejection sampler lies is the fact that random plain terms strongly resemble closed ones. Typically, large random terms are almost closed. For instance, on average, a random term contains a constant number of free variables, and needs just a few additional head abstractions to become closed [DBLP:journals/corr/abs-1805-09419].

Typeable -terms. Much like closed -terms, simply-typed terms do not admit a finite, admissible specification from which we could derive efficient samplers. Simple recursive samplers work, however cease to be effective already for small term sizes, cf. Section 2.1. Sampling simply-typed terms seems notoriously more challenging than sampling closed ones. Even rejection sampling, whenever applicable, admits serious limitations due to the imminent asymptotic sparsity problem — asymptotically almost no term, be it either plain or closed, is at the same time (simply) typeable. This problem is not only intrinsic to De Bruijn models, see [10.1007/978-3-662-49192-8_15], but also seems to hold in models in which variables contribute constant weight to the term size, see e.g. [lmcs:848, Bodini2018].

Asymptotic sparsity of simply-typed

-terms is an impenetrable barrier to rejection sampling techniques. As the term size tends to infinity, so does the induced rejection overhead. In order to postpone this inevitable obstacle, it is possible to use dedicated mechanisms interrupting the sampler as soon as it is clear that the partially generated term cannot be extended to a typeable one. The current state-of-the-art samplers take this approach, combining Boltzmann models with modern logic programming execution engines backed by highly-optimised unification algorithms

[bendkowski_grygiel_tarau_2018]. Nonetheless, even with these sophisticated optimisations, such samplers are not likely to generate terms of sizes larger than one hundred.

###### Remark .

For some restricted classes of typeable terms there exist effective sampling methods, such as the already mentioned samplers for linear and affine terms [BODINI2013227]. Since Boltzmann models support all kinds of unambiguous context-free languages, it is also possible to sample other fragments of minimal intuitionistic logic, for instance, using finite truncations à la Takahashi et al. [TAKAHASHI1996144].

### 2.6. Multiparametric samplers

For some practical applications, especially in industrial property-based software testing 

, the uniform distribution of outcome structures might not be the most useful choice. In fact, it can be argued that most software bugs are miniscule, neglected corner cases, which will not be caught using large, typical instances of random data, see

[Palka:2011:TOC:1982595.1982615, Runciman:2008:SLS:1411286.1411292].

Remarkably, combinatorial samplers based on either Boltzmann models or the recursive method, can be effectively tuned so to distort the intrinsic outcome distribution [doi:10.1137/1.9781611975062.9]. For instance, consider again the class of -terms in the De Bruijn notation with natural size. The corresponding class of De Bruijn indices satisfies

 (23) D=Z×\sc Seq(Z).

In words, a De Bruijn index is a (possibly empty) sequence of successors applied to zero. Both each successors and zero contribute weight one to the overall term size.

Such a specification inevitably dictates a geometric distribution of index values and, consequently, an average constant distance between encoded variables and their binders

[DBLP:journals/corr/abs-1805-09419]. If such a distribution is undesired, we can, for instance, force the sampler to output terms according to a slightly more uniform distribution. Suppose that we change the De Bruijn index specification to the more verbose

 (24) D=U0Z+U1Z2+⋯+UkZk+1+Zk+2\sc Seq% (Z)

assigning marking variables to the initial distinct indices , leaving the remaining indices unmarked. In doing so, we end up with a multiparametric specification for -terms where , as usual, marks the term size, and mark some selected indices.

This new multivariate specification can be effectively tuned in such a way, that branching probabilities governing the sampler’s decisions impose a different, non-geometric distribution of index values. For that purpose the problem is effectively expressed as a convex optimisation problem. With the seminal advancements in interior-point methods for convex programming [nesterov1994interior] the time required to compute the branching probabilities is proportional to a polynomial involving the optimisation problem size and the number of variables.

To illustrate the distortion effect, suppose that we mark the first nine indices and impose a uniform distribution of of the total term size among them. Remaining indices, are left unaltered. Table 1, taken from [DBLP:journals/corr/abs-1805-09419], provides some empirical frequencies obtained using a tuned sampler and its regular, undistorted counterpart. Specific values were obtained for terms of size around .

###### Remark .

Multiparametric tuning requires a few mild technical assumptions, such as the feasibility of requested tuning (e.g. we cannot ask for the impossible). It can also be used for exact-size sampling or even more involved admissible constructions, including Pólya structures, labelled sets, etc

. The tuning procedure can be carried out automatically and therefore used to rigorously control the sample distribution without resorting to heuristics, or in-depth expertise of the user.

###### Remark .

Boltzmann samplers provide an incredibly versatile and efficient framework for combinatorial generation. Despite the underlying use of heavy theories such as analytic combinatorics and complex analysis, their design and usage follow simple, recursive rules resembling a small, dedicated domain-specific language. With effective tools of their design and tuning, Boltzmann samplers can be automatically compiled without additional expertise in analytic combinatorics or complex analysis [Duchon:2004:BSR:1024662.1024669, Canou:2009:FSR:1596627.1596637, doi:10.1137/1.9781611975062.9, PIVOTEAU20121711]. We refer the more code-minded reader to concrete implementations.

Let us note that analytic methods of (multivariate) Boltzmann sampling are not the only rigorous methods of generating random algebraic data types or -terms in particular. While other methods exists, for instance branching processes of Mista et al. [mista], analytic methods seem to support the generation of far larger structures.

## 3. Conclusion

Random generation of -terms is a young and active research topic on the border of software testing, theoretical computer science and combinatorics. It combines utmost practical experimentation and theoretical advances in analytic combinatorics. Still, many important problems remain open. For instance, how to deal with representations in which corresponding generating functions cease to be analytic, or how to generate large random simply-typed terms?

### Acknowledgements

We would like to thank Pierre Lescanne and Sergey Dovgal for encouraging us to write this survey and their valuable remarks during its formation.