1 The quest for linear-time Boltzmann sampling
We continue our study of the so-called “Boltzmann” Exact Sampling Algorithm , a wide class of algorithms which allow one to sample random combinatorial structures of size  from a given measure, in an average time that scales algebraically with . This algorithm has received much praise since its appearance in 2004, and it is fair to state that it nowadays constitutes a small branch of its own within the Theory of Algorithms. It exists in various incarnations, such as labelled  and unlabelled  combinatorial structures, structures defined through differential specifications  and multi-parametric extensions . In this paper we will mostly concentrate on the ‘original’ case, discussed in , of labelled structures.
The structures to which this family of algorithms mainly applies are described in detail in the Flajolet and Sedgewick monograph on Analytic Combinatorics , where strong emphasis is placed on generating-function techniques and saddle-point methods for the asymptotic enumeration of the configurations. In short, the Boltzmann Algorithm explores the possibility of translating the information implicit in this analysis into an efficient exact-sampling algorithm. Say that we have combinatorial objects , and objects of size  have measure , with support . In most cases, a one-parameter family of measures exists, , which takes the form , and is ‘natural’ in the sense that it is the one implicit in the construction of the generating function for the objects (where  and  are easily related). In other words, the measures  and  are the canonical and grand-canonical Boltzmann–Gibbs measures, at energy  and inverse temperature , for the statistical ensemble consisting of the combinatorial objects, and the name ‘Boltzmann Method’ is a tribute to the role of Ludwig Boltzmann in the foundations of Statistical Mechanics, whose ideas inspired the method. The recursive description of  implicit in the combinatorial specification translates into a linear-time algorithm for sampling from , which thus induces an algorithm for sampling from , with complexity .
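Since the symbols above have been kept generic, the mechanism can be illustrated on a minimal, hypothetical instance: a free Boltzmann sampler for binary trees counted by their leaves (the class satisfying T = Z + T×T), with rejection until the target size is reached. This is only a sketch under our own assumptions; the parameter p_leaf plays the role of the Boltzmann parameter, and all names are ours, not taken from the references.

```python
import random

def boltzmann_binary_tree(p_leaf, max_size, rng=random):
    """Free Boltzmann sampler for binary trees counted by leaves
    (T = Z + T*T): each node is a leaf with probability p_leaf,
    otherwise it has two recursive subtrees.  Returns the number
    of leaves, or None if it exceeds max_size (early abort)."""
    leaves, pending = 0, 1          # pending = nodes still to expand
    while pending:
        pending -= 1
        if rng.random() < p_leaf:
            leaves += 1
            if leaves > max_size:
                return None         # abort oversized trees early
        else:
            pending += 2
    return leaves

def exact_size_sampler(n, p_leaf=0.5):
    """Reject until the sampled tree has exactly n leaves.
    p_leaf = 1/2 is the critical choice for this class."""
    while True:
        s = boltzmann_binary_tree(p_leaf, n)
        if s == n:
            return s
```

At the critical parameter, the probability of hitting any given size n decays polynomially, which is the source of the polynomial rejection overhead discussed below.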
Essentially in all the cases of interest to us, the Shannon entropy of the measure , defined as , scales linearly with , and, as is well known, this provides a lower bound on the complexity of exact sampling, because any algorithm needs to sample on average at least  random bits, and the cost of a random bit is of order 1. (Note that, if the measure is the uniform measure, as in most of the concrete applications, then , and if , then .) We say that an algorithm is optimal up to a multiplicative constant if it has average complexity bounded from above by , for some constant . (Footnote: This is a different, weaker notion with respect to optimality tout court, which, in the framework of exact-sampling algorithms, corresponds to optimality in the average number of random bits, up to subleading corrections, and a time complexity for other operations which is not larger than the complexity of sampling the required random bits. This version of optimality is the ‘ultimate goal’ of exact sampling, but it is rather exceptional, and, in our opinion, presently beyond reach at the level of generality of the present paper. Among the few successful examples obtained so far we mention [6, 7, 8, 9] for various families of trees and walks, [10, 11, 12] for random permutations, and  for “linear extensions in series-parallel posets”, among others.) Unfortunately, the Boltzmann algorithm as it stands is not optimal, as  normally scales as , or, at best, as . Thus, the average complexity scales as  for some . For this reason we have started to explore under which circumstances we can improve on the ideas of Boltzmann sampling, so as (possibly) to reach optimality.
In a first paper , we devised a general method that applies to all the cases in which an object of size  can be decomposed canonically as the union of two parts, , both of size . When this is the case, an improvement of the naïve Boltzmann method for sampling from the Hadamard product of two distributions allows one to decrease the complexity from  to . In our opinion, this is already an interesting result, as it is rather general; however, it does not cover all the applications of Boltzmann sampling (as the aforementioned canonical decomposition does not always exist), and, most importantly, it still does not reach optimality.
In a second set of works [15, 16], we describe how to exactly sample, in linear time, combinatorial structures which (up to bijections) are described by random bridges (i.e., random lattice walks from to , with steps ), even when the distributions of the steps are not all equal, in particular (as described in ) provided that they satisfy a property that we call positive decomposition. We show that this technical constraint is not very restrictive, as the realm of applications includes, among other things, important combinatorial classes such as random set-partitions, that is partitions of a set of elements into non-empty subsets, counted by Stirling numbers of the second kind and related to the exact sampling of minimal automata , and its ‘dual’ problem, of random permutations of size with cycles, counted by Stirling numbers of the first kind, for which, to our knowledge, no linear-time sampling algorithms were previously available. However, admittedly, the problems solved by this algorithm are just a relatively small subclass of all those that are addressed by the Boltzmann method, so a quest for a new idea that could tackle very large classes of combinatorial objects still stands.
Let us put the Boltzmann Method and its variants more into context. Of course, this method is not the only available exact-sampling algorithm for combinatorial structures. A relatively old algorithm is due to Nijenhuis and Wilf  (with later modifications by Flajolet, Zimmermann and Van Cutsem , see also ). If the exact enumeration of certain combinatorial objects is described by a system of equations, of the form
where the coefficients are positive-valued functions (over their range of arguments), which can be evaluated as floats with  digits in time , then, for  the sets of indices appearing as left factors of the monomials above, the quantities  can be evaluated recursively up to the size of interest, once and for all in a ‘preprocessing phase’, in a time of order , and the final data structure occupies a space of order . At this point, a simple divide-and-conquer algorithm (with complexity improved by the so-called “boustrophedon method”) allows one to perform exact sampling in average time of order . So, if one is interested in exact sampling with the aim of performing a statistical average, when the number of samples is much larger than the size of the objects (as is often the case), and one allows for extra logarithmic factors, even this “brute-force counting” strategy, once improved by the expedients outlined above, may be quasi-optimal.
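For a concrete flavour of this counting-based strategy, here is a hedged sketch on the simplest non-linear example, binary trees with n leaves; the recursion and class are illustrative choices of ours, not those of the cited references. The counts are tabulated once in a preprocessing phase, and each sample is then drawn by recursively choosing the root split with the exact probabilities.

```python
import random

def preprocess_counts(n_max):
    """Nijenhuis-Wilf-style preprocessing: tabulate the number C[n]
    of binary trees with n leaves (shifted Catalan numbers), from
    the recursion C[n] = sum_{k=1}^{n-1} C[k] * C[n-k]."""
    C = [0] * (n_max + 1)
    C[1] = 1
    for n in range(2, n_max + 1):
        C[n] = sum(C[k] * C[n - k] for k in range(1, n))
    return C

def sample_tree(n, C, rng=random):
    """Recursive-method sampler: draw the size k of the left subtree
    with probability C[k]*C[n-k]/C[n], then recurse; the result is
    uniform over the C[n] trees with n leaves."""
    if n == 1:
        return 'leaf'
    r = rng.randrange(C[n])
    acc = 0
    for k in range(1, n):
        acc += C[k] * C[n - k]
        if r < acc:
            return (sample_tree(k, C, rng), sample_tree(n - k, C, rng))
```

Here the preprocessing uses exact integers; the float-precision issues discussed in the text arise when the counts are too large to store exactly.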
In fact, when the system of equations is linear (as in the broad family of regular languages), the setting of this algorithm simplifies drastically, and a careful implementation of these ideas allows one to achieve linear complexity even for single-run sampling, that is, with an algorithm in which the preprocessing is also quasi-linear [21, 22].
Note that in this case optimality is reached by carefully dealing with floating-point arithmetic, something which is doable and nowadays well understood, yet painful as concerns algorithmic validation of the results, and, for “purists” of Complexity Theory, less elegant than a purely combinatorial algorithm. More precisely, we say that an algorithm ‘uses floats’ if it requires the high-precision evaluation of quantities that also depend on the size, like, for example, for a given random number , the largest integer  such that , while we consider ‘legitimate’ the evaluation, once and for all in the preprocessing phase, of a finite number of parameters, depending only on the combinatorial structure at hand, and not on the goal size, or also the use of potentially complicated quantities, such as , as acceptance rates in an algorithm, as in this case on average we need to certify only  binary digits of the quantity. So, we should focus on combinatorial structures whose generating functions are determined by non-linear systems, which, indeed, are in general more complicated classes (just as, in Algebra, non-linear systems are more complicated than linear systems). Within these classes, there is an important family, in which the coefficients above are in fact constants, depending only on , and only finitely many are non-zero. In this case one says that the corresponding structures are context-free. As explained in [5, Sec. VII.1–6], under suitable hypotheses, the crucial Drmota–Lalley–Woods Theorem (DLW) [23, 24, 25] applies, and these structures fall under the so-called smooth inverse-function schema (SIFS), that is, the generating function has a peculiar square-root singularity at the critical point, and the enumeration has asymptotics of the form , where the exponent  is ‘universal’, that is, it is shared by all classes within the SIFS.
This family contains examples ranging from simple cases, such as binary and unary-binary rooted planar trees, to rather complicated ones, such as two-terminal planar graphs with no  minor (a class of graphs included in the set of planar graphs, and including series-parallel graphs).
Indeed, quoting [5, pg. 443]:
there is a progression in the complexity of the schemas leading to square-root singularity. From the analytic standpoint, this can be roughly rendered by a chain [ inverse functions implicit functions systems ]. It is, however, often meaningful to treat each combinatorial problem at its minimal level of generality
This hierarchy of difficulty seems to stand also at the level of the Boltzmann Method. In fact, the simplest case of the SIFS corresponds to simple varieties of trees and inverse functions in , and is analysed in Section VII.3 there. In this situation, there exists an algorithm, due to Devroye , that achieves (quasi-)optimality (i.e., possibly with extra logarithmic factors, when bit complexity is taken into account), by considering a classical bijection with Łukasiewicz paths, using the cyclic lemma, and exploiting the exchangeability of the steps for the corresponding bridges. This algorithm has the small flaw of involving floating-point arithmetic, but it has the important merit of showing, for the first time, that linear-time average complexity can be achieved for ‘hard’ (i.e., non-linear) combinatorial structures. Also, when the step weights allow for a positive decomposition, our paper  provides a different algorithm, which avoids floating-point arithmetic (and the complications of dealing with it rigorously at the level of programming).
However, there are two further steps in the complexity scale of the SIFS, namely what are called tree-like structures and implicit functions in , treated in Section VII.4 there, and what are called irreducible context-free structures in , treated in Section VII.6 there, which is mostly devoted to a discussion of the aforementioned Drmota–Lalley–Woods Theorem. In this last case, planar trees are replaced by a coloured variant (with as many colours as equations in the system, plus one colour for the leaves), and the natural bijection with lattice walks (i.e., the one induced by the depth-first-search contour of the tree) now gives walks with coloured steps, and complicated non-local correlations between the various steps. As a result, exchangeability is broken at the level of the single steps, and the whole Devroye strategy cannot be applied without being complemented by some new idea. This fact is also evidenced in the original Devroye paper , which explains clearly to which situations his algorithm applies (and, implicitly, to which situations it does not).
The goal of this paper is to provide an average linear-time algorithm, a variant of the Boltzmann sampling method, that works in the broader setting of irreducible context-free structures, thus performing “two leaps forward in one stroke” on the complexity scale of the SIFS, in the program of making the Boltzmann Method linear. Note however that more complicated, non–context-free non-linear classes, such as what are called ordinary differential equations and systems in , treated in Section VII.9 there, are still not covered by the treatment of this paper, despite the fact that, as mentioned above, some linear non–context-free problems are solved by our  (we hope to address this level of generality in future work).
Our main idea is to decompose the combinatorial object , so as to extract one family of degrees of freedom which are especially simple, and to postpone the sampling of these degrees of freedom until after the evaluation of the acceptance rate. By some magic that we try to elucidate in Section 2 on a simple example (and discuss at full length in Section 8), this allows us to gain the factor , and to reach optimality. Ideas of this sort are not completely new (for example, a version of this appears in our [15, 16], and another version appears in ), but they are used here with a different twist, which requires a number of subtle tweaks that we try to introduce here in a pedagogical way.
The paper is organised as follows: besides the simple motivational example of Section 2, in Section 3 we discuss some (more or less) well-known facts in Analytic Combinatorics, concerning irreducible context-free structures. This section is complemented by Section 4, which is mainly devoted to examples. Sections 5 and 6 discuss some preliminary aspects of our algorithm (of a more combinatorial flavour in the first section, and of an analytical flavour in the second one). Finally, Section 7 describes the structure of the algorithm, and the functional form of the quantities involved, while Section 8 describes how to optimise the parameters, in such a way as to reach a certification of optimal complexity.
Section 6 is complemented by an appendix that discusses some facts in Perron–Frobenius Theory (which, as is well known, plays a crucial role in the DLW Theorem). A second appendix provides a reminder of the BalancedShuffle algorithm of Bacher, Bodini, Hollender and Lumbroso , which is used as a black box within our algorithm. A third appendix collects some technicalities required for the certification of optimality discussed in Section 8.
2 A simple example
As a warm-up before introducing our full-fledged algorithm, let us consider the exact sampling of lattice walks from to , with steps , with . Steps with come with a factor 2, that is, calling a walk, and , and the number of steps in , respectively, we have
The weight has been chosen in order to have a trivial normalisation: walks of this sort are just ‘walks of length in disguise’: if is a walk from to with steps , we obtain a map from walks to by just taking steps in pairs, and the factor is nothing but the number of preimages under this map. As a result, we have the more explicit
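The pairing map can be made concrete in a short sketch, under our reading of the (partially elided) setup: walks of length n from 0 to 0 with steps in {-1, 0, +1}, each 0-step weighted 2, identified with ±1-bridges of length 2n. These conventions are our assumption, inferred from the surrounding text.

```python
import random

def sample_weighted_walk(n, rng=random):
    """Sample a walk of length n, steps in {-1, 0, +1}, from 0 to 0,
    with each 0-step carrying weight 2, via the pairing bijection:
    draw a uniform +-1 bridge of length 2n (n pluses and n minuses,
    shuffled), then merge consecutive steps in pairs:
    (+,+) -> +1, (-,-) -> -1, (+,-) or (-,+) -> 0
    (two preimages, hence the factor 2 per 0-step)."""
    bits = [+1] * n + [-1] * n
    rng.shuffle(bits)
    return [(bits[2 * i] + bits[2 * i + 1]) // 2 for i in range(n)]
```

Since the ±1 bridge is uniform, each weighted walk is produced with probability proportional to its weight, and the sampler runs in linear time.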
and more generally
As a result, a simple divide-and-conquer algorithm allows one to sample these walks in linear time. We grow the walk step by step. Say that the concatenation of the first  steps has reached the position . Then we must continue with a step  or
with probabilities, that is, given by the triple of rational functions
Keeping probabilities as a first approximation, and refining the evaluation only if the sampling procedure requires it, allows one to avoid spurious logarithmic factors in bit complexity.
So, in this case we have no need to invent a new algorithm. However, it is instructive to see other algorithms at work here, where all the calculations can be performed explicitly and a number of subtleties are not required, before trying to generalise their ideas to more complicated situations.
Before doing this, we shall recall that, for every  of order 1, and every  with , it is possible to sample uniformly random shuffles of the string
(that is, permutations in ) in a time linear in . If we allow for an extra factor in the complexity, this can be done just by sampling a random permutation, e.g. with the classical Fisher–Yates algorithm [28, 29]. Random-bit optimality (that is, using random bits) is reached by the BalancedShuffle algorithm of Bacher, Bodini, Hollender and Lumbroso . We will need uniform random shuffles repeatedly in the following, where it will be understood that ‘BBHL shuffling’ refers to this algorithm, and is the corresponding sampler.
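For reference, the classical Fisher–Yates shuffle mentioned above can be sketched in a few lines; this is the textbook algorithm, not the BBHL variant, and it carries the extra logarithmic factor in random bits noted in the text.

```python
import random

def fisher_yates(seq, rng=random):
    """Classical Fisher-Yates shuffle: returns a uniformly random
    permutation of seq.  Each of the n-1 draws uses about log(i)
    random bits, whence the extra logarithmic factor over the
    entropy of the shuffle."""
    a = list(seq)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randrange(i + 1)     # uniform in {0, ..., i}
        a[i], a[j] = a[j], a[i]
    return a
```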
Boltzmann Method. Let us see how the ordinary Boltzmann sampling would work in this case. Call the measure , . We just have:
Each run costs (and exactly random bits on average), and the probability that a run is accepted is . So the overall complexity is of order , as anticipated.
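A sketch of this rejection scheme, under our assumed step distribution (P(+1) = P(-1) = 1/4 and P(0) = 1/2, matching the weight-2 zero step at the trivial normalisation):

```python
import random

def boltzmann_walk(n, rng=random):
    """Naive Boltzmann sampling of the weighted bridge: draw n
    i.i.d. steps with P(+1) = P(-1) = 1/4, P(0) = 1/2, and reject
    unless the walk ends at 0.  The acceptance probability is of
    order 1/sqrt(n), whence the n^(3/2) overall complexity."""
    while True:
        walk = [rng.choice((-1, 0, 0, 1)) for _ in range(n)]
        if sum(walk) == 0:
            return walk
```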
Devroye Method. The method described in , specialised to this case, works as follows. First, we sample admissible triples
, with the appropriate probability distribution. As we must have, admissible triples have the form for , and the probability distribution is
Then, we perform a random shuffle of the string consisting of symbols , followed by zeroes, and by ’s:
Sampling from  is complicated but feasible in sublinear time (it is easily done in average time , by sampling  uniformly in , one digit at a time, as long as they are needed, calculating  once and for all as a high-precision float, and summing up the ’s, in order of distance from , until reaching the threshold , which is done quickly as the ratio  is a simple rational function and w.h.p. we need  summands). The rest of the algorithm requires no other subtlety, and takes linear time. So this algorithm is optimal.
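The two phases (sample the step multiplicities, then shuffle) can be sketched as follows, under the same assumed step set as before; for clarity we sample the multiplicity distribution by exhaustive enumeration with exact integer weights, rather than by the sublinear scheme described above.

```python
import random
from math import comb, factorial

def sample_step_counts(n, rng=random):
    """Sample the triple (n_plus, n_zero, n_minus) = (k, n-2k, k) of
    step multiplicities for the weighted bridge: the weight of a
    given k is multinomial(n; k, k, n-2k) * 2^(n-2k), and the total
    weight is binom(2n, n) (all +-1 bridges of length 2n)."""
    total = comb(2 * n, n)
    r = rng.randrange(total)
    acc = 0
    for k in range(n // 2 + 1):
        w = factorial(n) // (factorial(k) ** 2 * factorial(n - 2 * k))
        acc += w * 2 ** (n - 2 * k)
        if r < acc:
            return (k, n - 2 * k, k)

def devroye_walk(n, rng=random):
    """Devroye-style sampler: draw the multiplicities, then a
    uniform shuffle of the corresponding multiset of steps."""
    k, z, _ = sample_step_counts(n, rng)
    steps = [+1] * k + [0] * z + [-1] * k
    rng.shuffle(steps)
    return steps
```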
Accelerated Boltzmann Method. Let us now see how our acceleration method improves the complexity of Boltzmann sampling. Call the measure , . The structure of our algorithm is as follows:
In words, we sample a random walk, with steps  and , and probabilities  and  (thus with average slope ), until reaching or jumping over the line passing through  with slope . The probability of jumping over is roughly  (with exponentially small corrections; indeed, we jump over the -th diagonal if we are in  and we take a  step, so the probability of not occupying a diagonal satisfies the steady-state equation , which gives ). The most important fact is that, assuming that we reached the line at position , we keep what we have constructed so far with a suitable acceptance rate , and restart otherwise. Once this acceptance step has been passed, we just sample a random shuffle, shuffling the set of steps  and  altogether with the set of steps , and we are done.
So, our algorithm is optimal up to a multiplicative constant if we can determine a function such that:
the algorithm samples according to the desired measure;
for all and ;
the function can be calculated efficiently, i.e., in time at most ;
Let us first address the most obvious constraint: that we are sampling the desired measure. We do this by analysing the probability of getting any given output string , with . We have a factor for sampling the walk up to the line, then a factor for sampling the unique shuffle that produces the string under investigation, and finally we have the acceptance rate . The resulting product must be proportional to . This gives the equation
for some . That is,
Then, we have to choose as large as possible (in order to have good hopes on the fourth condition), while satisfying the second and third condition, that is, we have to choose as large as possible, while keeping it easy to evaluate, and certified to be smaller than . For all , the sequence is log-concave, so it has a unique maximum, at the value where and . As we have , this gives . So we can choose
Typical values of are of order , thus calculating the quantity above, as a -digit float, by calculating the corresponding Pochhammer functions, takes on average . Of course, we can do much better. Recalling the Robbins bound on factorials 
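The two-sided Robbins refinement of Stirling's formula, which underlies this speed-up, can be stated and checked directly (the formula is standard; the function name is ours):

```python
from math import factorial, pi, sqrt, exp

def robbins_bounds(n):
    """Robbins' refinement of Stirling's formula:
      sqrt(2 pi n) (n/e)^n e^(1/(12n+1)) < n! < sqrt(2 pi n) (n/e)^n e^(1/(12n)).
    Such two-sided bounds let one compare a uniform random number
    with an acceptance rate using O(1) arithmetic most of the time,
    falling back on the exact product only with small probability."""
    base = sqrt(2 * pi * n) * (n / exp(1)) ** n
    return base * exp(1 / (12 * n + 1)), base * exp(1 / (12 * n))
```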
we can determine whether a random uniform number in  is larger or smaller than the acceptance rate above, in time , with probability , and then, for the remaining probability , we can perform the product in the way outlined above, giving an overall complexity of .
So, we are ready to address the one and only subtle point, which illustrates the reason for the acceleration. That is, we shall understand why , while for ordinary Boltzmann , just as a result of the fact that we have sampled a 0/+1 random walk up to the line with slope  passing through , instead of sampling a −1/0/+1 random walk up to the vertical line passing through .
In fact, we would have had the very same complexity as ordinary Boltzmann if we had made the ‘wrong’ naïve choice  (which, by the way, is manifestly bounded by 1, because ). However, we can push up the acceptance rate by a factor which is the inverse of the maximum over  of the naïve function, and this quantity is indeed of order .
For what concerns the evaluation of , we could just use the CLT for showing that
is asymptotically normally distributed, and then check that  is also asymptotically Gaussian, scaled not so as to be normalised, but rather so as to have maximum value 1. That is, roughly,
where , the first Gaussian is the approximation of the probability of reaching , and the second Gaussian is the acceptance rate. This CLT principle applies in general. Furthermore, in our simple example we can perform the calculations explicitly, as we have
The denominator is nothing but , which, incidentally, is also the probability of reaching the -th diagonal. The numerator is the slightly more complicated expression . Note that there are three factorials in the numerator, and three in the denominator, so, as , there are no  factors coming from the Stirling approximation, and indeed, for large  the numerator converges to , with corrections of order . That is, combining numerator and denominator, , which in turn, multiplying by the probability  of reaching the line instead of jumping over (or, equivalently, omitting the denominator), gives on average  tries for accepting a run of the algorithm. This completes our proof, and (given the optimality of BBHL shuffling, and of sampling i.i.d. random values from ) tells us that our algorithm has average bit complexity .
3 Irreducible context-free structures and coloured trees
As explained in detail in the Analytic Combinatorics book , several interesting combinatorial structures admit a recursive context-free definition, that is, for every structure , weighted with the measure of choice , one can choose canonically two integers and , such that is decomposed into “atoms” and sub-structures , …, , with and . In the symbolic framework described in  (generalised in the natural way for dealing with weighted objects instead of just counting, see e.g. ), this reads
for real non-negative, and, at the level of generating functions,
The ’s must satisfy certain technical conditions, mostly of summability (no fat tails), and non-triviality. Calling , the equation above reads
This is the situation called tree-like structures and implicit functions in . Many concrete examples are of the form , that is, every decomposition involves a single extra elementary node. This is the case for simply-generated (rooted planar) trees, when each node counts as a unit, and for Łukasiewicz excursions [5, p. 74], that is, lattice paths in the upper half-plane with steps of the form  for  (these two families are in simple bijection, where the path describes the depth-first-search contour of the tree). This is the special case called simple varieties of trees and inverse functions in , and, for what concerns exact sampling, if  is a polynomial, it is covered by Devroye’s algorithm, while more general paths or trees (for example, paths in the upper half-plane with steps in  and ) are in the more general framework (in the example, with ).
An even more general framework is one in which we have more than one (but finitely many, say ) types of combinatorial structures , with . Again, for every structure of type , weighted with the measure of choice, one can choose canonically integers, and , , …, , such that is decomposed into “atoms” and structures , …, , for all , with and . In the symbolic framework described in , this reads
and, at the level of generating functions,
Calling and , and introducing the functions , the equation above is just the natural multi-component version of (13), that is
The combinatorial class in which we are interested is the one represented by the first component of our vector.
Just as equations of the form  describe trees counted by the number of nodes, and more generally equations of the form  can be related to trees where internal nodes and leaves are distinguished, counted by the number of leaves (in order to have finitely many configurations for each given size, we require that no unary node has an internal child, i.e., that  has no monomial ), configurations associated to a context-free structure can be put in bijection with certain ‘weighted coloured trees’, in which the size is the number of leaves, the internal nodes are ‘coloured’ with the indices from  to , and an internal node of colour , with  children of colour  and  children leaves, comes with a factor  in the weight (now, in order to have finitely many configurations for each given size, we require that the linear part of  is a nilpotent matrix).
The specification given by the system can be interpreted as a stochastic rewriting rule, of which every trajectory can be translated into a coloured tree (see Figure 1). The process is parametrised by a solution of the system . It starts with a node of label 0 at the root, which is the only node in a stack of ‘boundary nodes’. Then, for a node in the stack with label , we choose the composition of its offspring according to the probability distribution
(Note that, indeed, these probabilities are normalised.) Then, the node leaves the stack, and all of its non-leaf descendants are put in the stack. We continue the process, picking up nodes from the stack, e.g. in random order or in a breadth-first search. The process stops when the stack is empty (or does not stop at all). The order of the operations does not affect the probability distribution of the outcome, as long as it is guaranteed that, for each finite height , almost surely every node in the stack at height  in the tree is processed at some point, and in particular this is the case when the process stops almost surely. The resulting process, aside from the minor complications coming from the colouring, is essentially a Galton–Watson (GW) process. In particular, it is well known that we have a critical Galton–Watson process whenever these parameters correspond to the solution of the characteristic system [5, pg. 483], i.e., the solution to the system of equations
in with smallest value of , while we have a subcritical GW process whenever the spectrum of the matrix
is strictly contained in the disk of radius 1 (or, equivalently, the Frobenius eigenvalue of the matrix is strictly smaller than ). Indeed, the Frobenius eigenvalue of  corresponds to the average number of children, in the GW process in which the nodes of the stack are taken randomly. Subcritical GW processes lead to a probability distribution on the extensive parameters of the tree (such as the number of nodes of a given type) which has exponential tails, while, if the combinatorial system is also “irreducible”, critical GW processes lead to a probability distribution with tail , with  determined by the DLW Theorem (the precise notion of irreducibility is presented in the context of this theorem; we refer to  for the details). Supercritical GW processes, associated to the case in which the Frobenius eigenvalue is larger than 1, lead to trees which have a finite probability of being of infinite size. In this case, the measure described by the parameters , conditioned to produce finite trees, is well-defined, and interesting in several respects; however, we do not need this notion in this paper, so we do not discuss the supercritical case further.
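The stack-based growth process described above can be sketched in the simplified single-colour case (a plain GW process, which drops the colouring; the offspring distribution and the size cap are our illustrative choices):

```python
import random

def galton_watson_size(offspring, max_nodes, rng=random):
    """Run a single-colour Galton-Watson process with the given
    offspring distribution (a list of (num_children, probability)
    pairs), using the stack-based scheme: repeatedly pop a boundary
    node and draw its offspring.  Returns the total number of
    nodes, or None if the tree exceeds max_nodes (a stand-in for
    'the process does not stop')."""
    nodes, stack = 0, 1
    while stack:
        stack -= 1
        nodes += 1
        if nodes > max_nodes:
            return None
        u = rng.random()
        for k, p in offspring:      # inverse-CDF draw of the offspring
            u -= p
            if u < 0:
                stack += k
                break
    return nodes

# Critical binary case: 0 or 2 children with probability 1/2 each,
# so the mean offspring number (the Frobenius eigenvalue in the
# one-colour case) equals 1.
```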
4 Examples of irreducible context-free structures
A nice example of a context-free structure is the class of (two-terminal) series-parallel graphs, described for example in [5, p. 72, ex. I.46], and in more detail in . Another, slightly more complex example (but also a more “typical” one, as, contrary to series-parallel graphs, it does not have colourings alternating along the layers of the tree, and is irreducible and aperiodic), is the class of (two-terminal) graphs with no  minor ( is the wheel graph with five vertices). To our knowledge, this class has been discussed so far only in a seminar of ours, on the very same topic as this paper (see https://library.cirm-math.fr/Record.htm?idlist=2&record=19286312124910045949, recorded during the meeting AofA: Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, on June 24th, 2019, at the Centre International de Rencontres Mathématiques (CIRM), Marseille, France; slides are available at https://www.cirm-math.fr/RepOrga/1940/Slides/Sportiello.pdf, and the system (19) is on page 65; note that in this document there is a typo ( in place of ) in the excluded-minor description of the class of graphs). This class is described by the system of equations
where letters , , and have been chosen to denote “all”, “series”, “parallel” and “Wheatstone bridge” subclasses of these graphs. See Figure 2 for a description of the specification as a rewriting system (left), a typical example of configuration (bottom), and its representation as a coloured tree (right).
Strictly speaking, these classes are not context-free structures, because the functions  are not polynomial. Nonetheless, they can be treated on a similar footing, as in fact the DLW Theorem has several extensions (see the discussion in [5, pg. 493], in particular Remark VII.29).
Another example is constituted by recursive topological subdivisions of a rectangle by straight lines. This is a new model, somewhat pictorially similar to ‘quadtrees’ (cf. [5, Ex. VII.23, pg. 523]) and, if we trade rectangles for triangles, to ‘stack-triangulations’ . In this problem, every rectangle can be subdivided either horizontally, or vertically, or in both directions. However, we impose a further condition, which in a sense avoids counting the same configuration multiple times: if we have divided a rectangle horizontally, we cannot divide horizontally either of the two resulting rectangles, and similarly for vertical subdivisions. See a typical example in Figure 4. The size of a configuration is the number of rectangles in the subdivision. If we call  the generating function for these configurations, and  (resp. ) the ones for rectangles that come from a horizontal (resp. vertical) subdivision, these functions satisfy the system of equations
which is illustrated by Figure 3. If we use the symmetry induced by (say) 90-degree rotations, we can identify the generating functions and , and reduce the system to two equations
We can further eliminate , to get the algebraic relation
that is, the inverse of the generating function, , is given by
The critical values are thus given as the roots of integer-valued polynomials (e.g., is determined by the equation ), identified by the constraint of being real-positive, and being the smallest real-positive root:
5 From trees to walks, and the cyclic lemma
From this point onward, we assume that our context-free structure is irreducible and aperiodic in the sense of the DLW Theorem. We also use as a synonym of , as this first component, besides being associated with the combinatorial class that we want to sample, plays a distinct role in the whole construction of the algorithm.
As we described above, the combinatorial structures we shall sample are in bijection with rooted planar trees , whose root index is , and which have leaves. The number of internal nodes is not fixed; it is, however, bounded by for some constant (as we have required that the linear part of is a nilpotent matrix).
Several algorithms for the exact sampling of trees, including the Devroye Algorithm among various others, make use of a bijection with suitable lattice paths, and the cyclic lemma. However, as we said above, this strategy does not apply immediately to coloured trees.
A better idea is to decompose the trees into subtrees, breaking them at all the internal nodes with label . Each tree is thus described by a list of subtrees, , where a subtree has root index , leaves of two types ( and ), and all other internal nodes with labels in . We call the class of such subtrees having leaves with label , and leaves with label , and the class of all subtrees altogether. Let us adopt the shortcuts and , and . A set of necessary and sufficient conditions for a list to determine a tree is the following:
for all .
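The conditions above can be checked mechanically. In the following sketch we assume (this encoding is our illustrative choice, not the paper's notation) that each subtree in the list contributes a step `(dx, dy)`, with `dx` its horizontal span and `dy` its height increment, and that an excursion is a walk whose proper prefix heights stay nonnegative while the full walk ends at height -1, as in the classical Łukasiewicz setting:

```python
def is_generalized_lukasiewicz_excursion(steps):
    """Check whether a nonempty sequence of steps (dx, dy) forms a
    generalised Lukasiewicz excursion: every proper prefix keeps the
    height nonnegative, and the final step brings the walk to height -1.
    (A sketch; the (dx, dy) encoding of subtrees is an assumption.)"""
    h = 0
    for _, dy in steps[:-1]:
        h += dy
        if h < 0:          # walk left the upper half-plane too early
            return False
    h += steps[-1][1]
    return h == -1         # excursion must terminate at height -1
```
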
In other words, the concatenation of the vectors must constitute a generalisation of the notion of Łukasiewicz path, in which the steps can go forward by an amount different from . As no name for these objects seems to be established in the literature, we will call them generalised Łukasiewicz excursions.
This construction has traded the complication of coloured steps, and non-local correlations, for the (comparatively minor) complication of steps with variable horizontal length. In particular, if we remove the constraint of remaining in the upper-half plane, the corresponding generalised Łukasiewicz walks have exchangeable steps.
If we call the partition function at size , i.e. , we have
and, rephrased in terms of lists of subtrees
At this point, we recall the Cyclic Lemma
Lemma 1 (Cyclic Lemma)
Let be a set of steps in , strictly contained in a half-plane . Let be a walk from the origin to the point . For , call the cyclic shifts of . If and are coprime, there exists exactly one index such that the walk is contained in the half-plane .
We shall call the operator which, applied to a walk , gives the only cyclic shift of with the property above. An immediate and crucial consequence of this lemma (also based on the fact that our generalised Łukasiewicz excursions reach the point , and all integers are coprime with ), is that we can rewrite equation (28), getting rid of the complicated condition (3):
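The operator of the lemma can be sketched by brute force, directly mirroring the statement: among the cyclic shifts of the walk, keep the unique one that is an excursion. In this sketch we assume (as in the classical Łukasiewicz setting) that the vertical increments are encoded as plain integers summing to -1, so that the coprimality hypothesis of the lemma is automatically satisfied:

```python
def cycle_lemma_rotation(steps):
    """Return the unique cyclic shift of `steps` (a list of integer
    vertical increments summing to -1) that is an excursion, i.e. whose
    proper prefix sums are all nonnegative.  Brute-force over the n
    shifts, asserting the uniqueness guaranteed by the Cyclic Lemma.
    (A sketch; an O(n) version would rotate at the first prefix minimum.)"""
    n = len(steps)
    good = []
    for r in range(n):
        rot = steps[r:] + steps[:r]
        h, ok = 0, True
        for dy in rot[:-1]:
            h += dy
            if h < 0:
                ok = False
                break
        if ok:
            good.append(r)
    assert len(good) == 1, "Cyclic Lemma guarantees exactly one good shift"
    r = good[0]
    return steps[r:] + steps[:r]
```
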
We will call generalised Łukasiewicz bridges the corresponding family of directed lattice walks, that is, the same walks as the excursions, without the constraint of remaining in the upper half-plane. As a consequence, if we are able to exactly sample lists with the measure (here denotes the Kronecker delta , and not the Dirac delta; this choice is based on the fact that here is replaced by large expressions, which would be poorly rendered as indices, and there is no risk of confusion, as there are no Dirac deltas in this paper)
with complexity , we are able to exactly sample our combinatorial structures with complexity .
In the remainder of this paper we will concentrate on the problem of exactly sampling lists from the measure above. At this point, the pertinence of our toy-model problem of Section 2, dealing with the simplest possible family of lattice bridges which is not Dyck bridges (for which the problem reduces to random shuffles), should be apparent.
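For the Dyck case just mentioned, the reduction to random shuffles is immediate: a uniform Dyck bridge is nothing but a uniformly random arrangement of equally many up- and down-steps. A minimal sketch:

```python
import random

def sample_dyck_bridge(n, rng=random):
    """Uniform Dyck bridge of length 2n: a uniformly random
    arrangement of n up-steps (+1) and n down-steps (-1).
    A sketch of the 'random shuffle' reduction for the Dyck case."""
    steps = [1] * n + [-1] * n
    rng.shuffle(steps)
    return steps
```
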
6 Further analytic preliminaries
Having renamed our generating function , our system (16) reads
Similarly the generating function for the subtrees
is given by
where the ’s satisfy the reduced system
As a check, let us verify that the measure in (30) is indeed normalised. To this end we should have, for all ,
where the integral is on a contour encircling the origin, and no other pole of the integrand. Or, in other words, taking the generating function for an arbitrary value of in the interval ,
However, the logarithm has a cut discontinuity along the positive real axis, on an interval starting from the origin and ending at the point such that . If we deform the contour of integration so as to encircle this cut, the integral is exactly (minus) the length of the interval (factors simplify exactly as in the classical proof of the Cauchy Theorem), so our check is equivalent to the relation for all , whenever . Indeed, on the manifold where the latter holds, any solution of the bivariate reduced system is also a solution of the original univariate system, i.e. for all , and in particular for , so that our claim is verified.
There is a two-parameter family of natural measures on the set of possible subtrees , where the two parameters are the Lagrange multipliers for the number of - and -leaves: