# Polynomial tuning of multiparametric combinatorial samplers

Boltzmann samplers and the recursive method are prominent algorithmic frameworks for the approximate-size and exact-size random generation of large combinatorial structures, such as maps, tilings, RNA sequences or various tree-like structures. In their multiparametric variants, these samplers allow to control the profile of expected values corresponding to multiple combinatorial parameters. One can control, for instance, the number of leaves, profile of node degrees in trees or the number of certain subpatterns in strings. However, such a flexible control requires an additional non-trivial tuning procedure. In this paper, we propose an efficient polynomial-time, with respect to the number of tuned parameters, tuning algorithm based on convex optimisation techniques. Finally, we illustrate the efficiency of our approach using several applications of rational, algebraic and Pólya structures including polyomino tilings with prescribed tile frequencies, planar trees with a given specific node degree distribution, and weighted partitions.

## Authors

• 7 publications
• 10 publications
• 6 publications
• ### Tuning as convex optimisation: a polynomial tuner for multi-parametric combinatorial samplers

Combinatorial samplers are algorithmic schemes devised for the approxima...
02/26/2020 ∙ by Maciej Bendkowski, et al. ∙ 0

• ### Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results

A preference profile is single-peaked on a tree if the candidate set can...
07/13/2020 ∙ by Dominik Peters, et al. ∙ 0

• ### Construction of Simplicial Complexes with Prescribed Degree-Size Sequences

We study the realizability of simplicial complexes with a given pair of ...
06/01/2021 ∙ by Tzu-Chi Yen, et al. ∙ 0

• ### Combinatorial and computational investigations of Neighbor-Joining bias

The Neighbor-Joining algorithm is a popular distance-based phylogenetic ...
07/18/2020 ∙ by Ruth Davidson, et al. ∙ 0

• ### Statistical properties of lambda terms

We present a quantitative, statistical analysis of random lambda terms i...
05/23/2018 ∙ by Maciej Bendkowski, et al. ∙ 0

• ### The degree profile and Gini index of random caterpillar trees

In this paper, we investigate the degree profile and Gini index of rando...
05/15/2018 ∙ by Panpan Zhang, et al. ∙ 0

• ### Sensor Selection by Linear Programming

We learn sensor trees from training data to minimize sensor acquisition ...
09/09/2015 ∙ by Joseph Wang, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

Uniform random generation of combinatorial structures forms a prominent research area of computer science with multiple important applications ranging from automated software testing techniques, see [Claessen-2000], to complex simulations of large physical statistical models, see [BhaCouFahRan2017]

. Given a formal specification defining a set of combinatorial structures (for instance graphs, proteins or tree-like data structures) we are interested in their efficient random sampling ensuring the uniform distribution among all structures sharing the same size.

One of the earliest examples of a generic sampling template is Nijenhuis and Wilf’s recursive method [NijenhuisWilf1978] later systematised by Flajolet, Zimmermann and Van Cutsem [FLAJOLET19941]

. In this approach, the generation scheme is split into two stages – an initial preprocessing phase where recursive branching probabilities dictating subsequent sampler decisions are computed, and the proper sampling phase itself. Alas, in both phases the algorithm manipulates integers of size exponential in the target size

, turning its effective bit complexity to , compared to arithmetic operations required. Denise and Zimmermann reduced later the average-case bit complexity of the recursive method to in time and

in space using a certified floating-point arithmetic optimisation

[DenZimm99]. Regardless, worst-case space bit complexity remained as well as bit complexity for non-algebraic languages. Remarkably, for rational languages Bernardi and Giménez [Bernardi2012] recently linked the floating-point optimisation of Denise and Zimmermann with a specialised divide-and-conquer scheme reducing further the worst-case space bit complexity and the average-case time bit complexity to .

A somewhat relaxed, approximate-size setting of the initial generation problem was investigated by Duchon, Flajolet, Louchard and Schaeffer who proposed a universal sampler construction framework of so-called Boltzmann samplers [DuFlLoSc]. The key idea in their approach is to embed the generation scheme into the symbolic method of analytic combinatorics [flajolet09] and, in consequence, obtain an effective recursive sampling template for a wide range of existing combinatorial classes. In recent years, a series of important improvements was proposed for both unlabelled and Pólya structures. Let us mention for instance linear approximate-size (and quadratic exact-size) Boltzmann samplers for planar graphs [fusy2005quadratic], general-purpose samplers for unlabelled structures [flajolet2007boltzmann], efficient samplers for plane partitions [bodini2010random] or the cycle pointing operator for Pólya structures [bodirsky2011boltzmann]. Moreover, the framework was generalised onto differential specifications [bodini2012boltzmann, bodini2016increasing]; linear exact-size samplers for Catalan and Motzkin trees were obtained, exploiting the shape of their holonomic specifications [bacher2013exact].

What was left open since the initial work of Duchon et al., was the development of (i) efficient Boltzmann oracles providing effective means of evaluating combinatorial systems within their disks of convergence and (ii) an automated tuning procedure controlling the expected sizes of parameter values of generated structures. The former problem was finally addressed by Pivoteau, Salvy and Soria [PiSaSo12] who defined a rapidly converging combinatorial variant of the Newton oracle by lifting the combinatorial version of Newton’s iteration of Bergeron, Labelle and Leroux [species] to a new numerical level. In principle, using their Newton iteration and an appropriate use of binary search, it became possible to approximate the singularity of a given algebraic combinatorial system with arbitrarily high precision. However, even if the singularity

is estimated with precision

its approximation quality does not correspond to an equally accurate approximation of the generating function values at , often not better than . Precise evaluation at close to requires an extremely accurate precision of . Fortunately, it is possible to trade-off the evaluation precision for an additional rejection phase using the idea of analytic samplers [BodLumRolin] retaining the uniformity even with rough evaluation estimates.

Nonetheless, frequently in practical applications including for instance semi-automated software testing techniques, additional control over the internal structure of generated objects is required, see [palka2012]. In [BodPonty]

Bodini and Ponty proposed a multidimensional Boltzmann sampler model, developing a tuning algorithm meant for the random generation of words from context-free languages with a given target letter frequency vector. However, their algorithm converges only in an

a priori unknown vicinity of the target tuning variable vector. In practice, it is therefore possible to control no more than a few tuning parameters at the same time.

In the present paper we propose a novel polynomial-time tuning algorithm based on convex optimisation techniques, overcoming the previous convergence difficulties. We demonstrate the effectiveness of our approach with several examples of rational, algebraic and Pólya structures. Remarkably, with our new method, we are easily able to handle large combinatorial systems with thousands of combinatorial classes and tuning parameters.

In order to illustrate the effectiveness of our approach, we have implemented a prototype sampler generator Boltzmann Brain (bb in short). The source code is available at Github. Supplementary scripts used to generate and visualise the presented applications of this paper are available as a separate repository.

In § 2 we briefly recall the principles of Boltzmann sampling. Next, in § 3 we describe the tuning procedure. In § 4 we propose four exemplary applications and explain the interface of bb. Finally, in the appendix we give the proofs of the theorems, discuss implementation details and describe a novel exact-size sampling algorithm for strongly connected rational grammars.

## 2. Sampling from Boltzmann principles

### 2.1. Specifiable k-parametric combinatorial classes.

Let us consider the neutral class and its atomic counterpart , both equipped with a finite set of admissible operators (i.e. disjoint union , Cartesian product , sequence , multiset and cycle ), see [flajolet09, 24–30]. Combinatorial specifications are finite systems of equations (possibly recursive) built from elementary classes , and the admissible operators.

###### Example .

Consider the following joint specification for and . In the combinatorial class

of trees, nodes of even level (the root starts at level one) have either no or two children and each node at odd level has an arbitrary number of non-planarily ordered children:

 (1) {T=Z\sc MSet(Q) ,Q=Z+ZT2.

In order to distinguish (in other words mark) some additional combinatorial parameters we consider the following natural multivariate extension of specifiable classes.

###### Definition .

(Specifiable -parametric combinatorial classes) A specifiable -parametric combinatorial class is a combinatorial specification built, in a possibly recursive manner, from distinct atomic classes (), the neutral class and admissible operators and . In particular, a vector forms a specifiable -parametric combinatorial class if its specification can be written down as

 (2) ⎧⎪⎨⎪⎩C1=Φ1(C,Z1,…,Zk),⋮Cm=Φm(C,Z1,…,Zk)

where the right-hand side expressions are composed from , admissible operators and the neutral class . Moreover, we assume that specifiable -parametric combinatorial specifications form well-founded aperiodic systems, see [species, PiSaSo12, drmota1997systems].

###### Example .

Let us continue our running example, see (1). Note that we can introduce two additional marking classes and into the system, of weight zero each, turning it in effect to a -specifiable combinatorial class as follows:

 (3) {T=UZ\sc MSet(Q),Q=VZ+ZT2.

In this example, is meant to mark the occurrences of nodes at odd levels, whereas is meant to mark leaves at even levels. In effect, we decorate the univariate specification with explicit information regarding the internal structural patterns of our interest.

Much like in their univariate variants, -parametric combinatorial specifications are naturally linked to ordinary multivariate generating functions, see e.g [flajolet09].

###### Definition .

(Multivariate generating functions) The multivariate ordinary generating function in variables associated to a specifiable -parametric combinatorial class is defined as

 (4) C(z1,…,zk)=∑p1≥0,…,pk≥0cpzp

where denotes the number of structures with atoms of type and denotes the product . In the sequel, we call the (composition) size of the structure.

In this setting, we can easily lift the usual univariate generating function building rules to the realm of multivariate generating functions associated to specifiable -parametric combinatorial classes. table 1 summarises these rules.

### 2.2. Multiparametric Boltzmann samplers.

Consider a typical multiparametric Boltzmann sampler workflow [BodPonty] on our running example, see (3). We start with choosing target expectation quantities of nodes from atomic classes . Next, using a dedicated tuning procedure we obtain a vector of three real positive numbers depending on . Then, we construct a set of recursive Boltzmann samplers , etc. according to the building rules in table 1. Finally, we use the so constructed samplers to generate structures with tuned parameters.

In order to sample from either or atomic classes, we simply construct the neutral element or an appropriate atomic structure , respectively. For union classes we make a Bernoulli choice depending on the quotients of respective generating functions values and continue with sampling from the resulting class. In the case of product classes, we spawn two independent samplers, one for each class, and return a pair of built structures. Finally, for

we draw a random value from a geometric distribution with parameter

and spawn that many samplers corresponding to the class . In other words, . In the end, we collect the sampler outcomes and return their list. The more involved and constructions are detailed in Appendix C.

The probability space associated to so constructed Boltzmann samplers takes then the following form. Let be a vector inside the ball of convergence of and be a structure of composition size in a -parametric class . Then, the probability that becomes the output of a multiparametric Boltzmann sampler is given as

 (5) Pz(ω)=zpC(z).
###### Proposition .

Let be the random vector where equals the number of atoms of type in a random combinatorial structure returned by the -parametric Boltzmann sampler . Then, the expectation vector and the covariance matrix are given by

 Ez(Ni)=∂∂ξilogC(eξ)∣∣∣ξ=logz% andCovz(N)=[∂2∂ξi∂ξjlogC(eξ)]ki,j=1∣∣ ∣∣ξ=logz.

Hereafter, we use to denote coordinatewise exponentiation.

###### Corollary .

The function is convex because its matrix of second derivatives, as a covariance matrix, is positive semi-definite inside the set of convergence. This crucial assertion will later prove central to the design of our tuning algorithm.

###### Remark .

Uniparametric recursive samplers of Nijenhuis and Wilf take, as well as Boltzmann samplers, a system of generating functions as their input. This system can be modified by putting fixed values of tuning variables, in effect altering the corresponding branching probabilities. The resulting distribution of the random variable corresponding to a weighted recursive sampler coincides with the distribution of the Boltzmann-generated variable conditioned on the structure size. As a corollary, the tuning procedure that we discuss in the following section is also valid for the exact-size approximate-frequency recursive sampling. In

Appendix B we describe an algorithm for rational specifications which samples objects of size . As a by-product, we show how to convert approximate-size samplers corresponding to rational systems into exact-size samplers.

## 3. Tuning as a convex optimisation problem

We start with a general result about converting the problem of tuning arbitrary specifiable -parametric combinatorial specifications into a convex optimisation problem, provided that one has access to an oracle yielding values and derivatives of corresponding generating functions. We note that this general technique can be applied to differential specifications as well. We write , to denote the minimisation (maximisation, respectively) problem of the target function with respect to the vector variable . All proofs are postponed until Appendix A. Throughout this section, we assume that given tuning expectations are admissible in the sense that there always exists a target vector corresponding to (2.2). Furthermore, we assume that the combinatorial system is well-founded and strongly connected. Some non-strongly connected cases fall into the scope of our framework as well, but for the core proof ideas we concentrate only on strongly connected systems.

###### Theorem .

Consider a multiparametric combinatorial class . Fix the expectations , see Proposition 2.2. Let be the generating function corresponding to . Then, the tuning vector , see (2.2), is equal to where comes from the following minimisation problem:

 (6) logC(eξ)−ν⊤ξ→minξ.

Let us turn to the specific classes of algebraic and rational specification. In those cases, no differential-equation type systems are allowed; however, it is possible to reformulate the problem so that no extra oracles are required.

###### Theorem .

Let be a multiparametric algebraic system with . Fix the expectations of the parameters of objects sampled from to . Then, the tuning vector is equal to where comes from the convex problem:

 (7) {c1−ν⊤ξ→minξ,c,logΦ(ec,eξ)−c≤0.

Hereafter, “” and denote a set of inequalities and the coordinatewise logarithm, respectively.

Let us note that the above theorem naturally extends to the case of labelled structures with and operators. For unlabelled Pólya operators like or , we have to truncate the specification to bound the number of substitutions. In consequence, it becomes possible to sample corresponding unlabelled structures, including partitions, functional graphs, series-parallel circuits, etc.

Singular Boltzmann samplers (also defined in [DuFlLoSc]) are the limiting variant of ordinary Boltzmann samplers with an infinite expected size of generated structures. In their multivariate version, samplers are considered singular if their corresponding variable vectors belong to the boundary of the respective convergence sets.

###### Theorem .

Let be a multiparametric algebraic system with , the atomic class marking the corresponding structure size and being a vector (possibly empty) of distinguished atoms. Assume that the target expected frequencies of the atoms are given by the vector . Then, the variables that deliver the tuning of the corresponding singular Boltzmann sampler are the result of the following convex optimisation problem, where , :

 (8) {ξ+α⊤η→maxξ,η,c,logΦ(ec,eξ,eη)−c≤0.

Finally, let us note that all of the above outlined convex programs can be effectively optimised using the polynomial-time interior-point method optimisation procedure of Nesterov and Nemirovskii [nesterov1994interior]. The required precision is typically , see Appendix A.

###### Theorem .

For multiparametric combinatorial systems with description length , the tuning problem can be solved with precision in time .

Let us complete this section by constructing an optimisation system for (2.1). Let be the target expectation quantities of . By the rules in table 1, the system of functional equations and its log-exp transformed optimisation counterpart take the form

 (9) ⎧⎪ ⎪⎨⎪ ⎪⎩T(z,u,v)=uzexp(∞∑i=1Q(zi,ui,vi)i),Q(z,u,v)=vz+zT(z,u,v)2.

Setting , , , , , we obtain

 (10) ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩τ1−nζ−kη−mϕ→min,τj≥ηj+ζj+∞∑i=1eκiji,j∈{1,2,…}κj≥log(eϕj+ζj+eζj+2τj),j∈{1,2,…}.

For practical purposes, the sum can be truncated with little effect on distribution.

## 4. Applications

In this section we present several examples illustrating the wide range of applications of our tuning techniques. Afterwards, we briefly discuss our prototype sampler generator and its implementation details.

### 4.1. Polyomino tilings.

We start with a benchmark example of a rational specification defining rectangular tilings using up to different tile variants (a toy example of so-called transfer matrix models, cf. [flajolet09, Chapter V.6, Transfer matrix models]).

We begin the construction with defining the set of admissible tiles. Each tile consists of two horizontal layers. The base layer is a single connected block of width . The second layer, placed on top of the base one, is a subset (possibly empty) of blocks, see figure 1. For presentation purposes each tile is given a unique, distinguishable colour.

Next, we construct the asserted rational specification following the general construction method of defining a deterministic automaton with one state per each possible partial tiling configuration using the set of available tiles. Tracking the evolution of attainable configurations while new tiles arrive, we connect relevant configurations by suitable transition rules in the automaton. Finally, we (partially) minimise the constructed automaton removing states unreachable from the initial empty configuration. Once the automaton is created, we tune the tiling sampler such that the target colour frequencies are uniform, i.e. each colour occupies, on average, approximately of the outcome tiling area. figure 2 depicts an exemplary tiling generated by our sampler.

The automaton corresponding to our tiling sampler consists of more than states and transitions. We remark that this example is a notable improvement over the work of Bodini and Ponty [BodPonty] who were able to sample tilings using different tiles (we handle ) with a corresponding automaton consisting of roughly states and transitions.

### 4.2. Simply-generated trees with node degree constraints.

Next, we give an example of simple varieties of plane trees with fixed sets of admissible node degrees, satisfying the general equation

 y(z)=zϕ(y(z))for some polynomialϕ:C→C.

Let us consider the case of plane trees where nodes have degrees in the set , i.e. . Here, the numbers are nonnegative real coefficients. We tune the corresponding algebraic specification so to achieve a target frequency of for all nodes of degrees . Frequencies of nodes with degrees are left undistorted. For presentation purposes all nodes with equal degree are given the same unique, distinguishable colour. figure 3 depicts two exemplary trees generated in this manner.

Empirical frequencies for the right tree of figure 3 and a simply-generated tree of size in between and with default node degree frequencies are included in table 2.

We briefly remark that for this particular problem, Bodini, David and Marchal proposed a different, bit-optimal sampling procedure for random trees with given partition of node degrees [DBLP:conf/caldam/BodiniJM16].

### 4.3. Variable distribution in plain λ-terms.

To exhibit the benefits of distorting the intrinsic distribution of various structural patterns in algebraic data types, we present an example specification defining so-called plain -terms with explicit control over the distribution of de Bruijn indices.

In their nameless representation due to de Bruijn [deBruijn1972] -terms are defined by the formal grammar where is an infinite denumerable set of so-called indices (cf. [BendkowskiGLZ16, GittenbergerGolebiewskiG16]). Assuming that we encode de Bruijn indices as a sequence of successors of zero (i.e. use a unary base representation), the class of plain -terms can be specified as where . In order to control the distribution of de Bruijn indices we need a more explicit specification for de Bruijn indices. For instance:

 D=U0Z+U1Z2+⋯+UkZk+1+Zk+2\sc Seq% (Z).

Here, we roll out the initial indices and assign distinct marking variables to each one of them, leaving the remainder sequence intact. In doing so, we are in a position to construct a sampler tuned to enforce a uniform distribution of among all marked indices, i.e. indices , distorting in effect their intrinsic geometric distribution.

figure 4 illustrates two random -terms with such a new distribution of indices. For presentation purposes, each index in the left picture is given a distinct colour.

Empirical frequencies for the right term of figure 4 and a plain -term of size in between and with default de Bruijn index frequencies are included in table 3.

Let us note that algebraic data types, an essential conceptual ingredient of various functional programming languages such as Haskell or OCaml, and the random generation of their inhabitants satisfying additional structural or semantic properties is one of the central problems present in the field of property-based software testing (see, e.g. [Claessen-2000, palka2012]). In such an approach to software quality assurance, programmer-declared function invariants (so-called properties) are checked using random inputs, generated accordingly to some predetermined, though usually not rigorously controlled, distribution. In this context, our techniques provide a novel and effective approach to generating random algebraic data types with fixed average frequencies of type constructors. In particular, using our methods it is possible to boost the intrinsic frequencies of certain desired subpatterns or diminish those which are unwanted.

### 4.4. Weighted partitions.

Integer partitions are one of the most intensively studied objects in number theory, algebraic combinatorics and statistical physics. Hardy and Ramanujan obtained the famous asymptotics which has later been refined by Rademacher [flajolet09, Chapter VIII]. In his article [Vershik1996], Vershik considers several combinatorial examples related to statistical mechanics and obtains the limit shape for a random integer partition of size with parts and summands bounded by . Let us remark that Bernstein, Fahrbach, and Randall [bernstein2017analyzing] have recently analysed the complexity of exact-size Boltzmann sampler for weighted partitions. In the model of ideal gas, there are several particles (bosons) which form a so-called assembly of particles. The overall energy of the system is the sum of the energies where denotes the energy of -th particle. We assume that energies are positive integers. Depending on the energy level there are possible available states for each particle; the function depends on the physical model. Since all the particles are indistinguishable, the generating function for the number of assemblies with energy takes the form

 (11) P(z)=∞∑Λ=0p(Λ)zΛ=∏λ>01(1−zλ)j(λ).

In the model of -dimensional harmonic trap (also known as the Bose-Einstein condensation) according to [chase1999canonical, haugset1997bose, lucietti2008asymptotic] the number of states for a particle with energy is so that each state can be represented as a multiset with elements having different colours. Accordingly, an assembly is a multiset of particles (since they are bosons and hence indistinguishable) therefore the generating function for the number of assemblies takes the form

 (12) P(z)=\sc MSet(\sc MSet≥1(Z1+⋯+Zd)).

It is possible to control the expected frequencies of colours using our tuning procedure and sample resulting assemblies as Young tableaux. Each row corresponds to a particle whereas the colouring of the row displays the multiset of included colours, see figure 5. We also generated weighted partitions of expected size (which are too large to display) with tuned frequencies of colours, see table 4.

Let us briefly explain our generation procedure. Boltzmann sampling for the outer operator is described in 2Appendix C. The sampling of inner is more delicate. The generating function for this multiset can be written as

 (13) \sc MSet≥1(z1+⋯+zd)=d∏i=111−zi−1.

In order to correctly calculate the branching probabilities, we introduce slack variables satisfying . Boltzmann samplers for the newly determined combinatorial classes are essentially Boltzmann samplers for . Let us note that after expanding brackets the expression becomes

 \sc MSet≥1(z1+⋯+zd)=(s1+⋯+sd)+(s1s2+⋯+sd−1sd)+⋯+s1s2…sd.

The total number of summands is where each summand corresponds to choosing some subset of colours. Finally, let us explain how to precompute all the symmetric polynomials and efficiently handle the branching process in quadratic time using a dynamic programming approach. We can recursively define two arrays of real numbers and satisfying

 (14) ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩p1,j=sj,j∈{1,…,d};qk,d=pk,d,k∈{1,…,d};qk,j=pk,j+qk,j+1,j∈{k,…,d−1},k∈{1,…,d};pk,j=sj−k+1⋅qk−1,j,j∈{k,…,d−1},k∈{2,…d};

Arrays contain the branching probabilities determining the next colour inside the -th symmetric polynomial. Arrays contain partial sums for the -th symmetric polynomial and are required in intermediate steps. Numbers are equal to the total values of symmetric polynomials and they define initial branching probabilities to choose the number of colours.

### 4.5. Prototype sampler generator.

Consider the following example of an input file for Boltzmann Brain:

-- Motzkin trees
Motzkin = Leaf (3)
| Unary Motzkin
| Binary Motzkin Motzkin (2) [0.3].

Here, a Motzkin algebraic data type is defined. It consists of three constructors: a constant Leaf of weight three, a Unary constructor of weight one (default value if not explicitly annotated) and a constructor Binary of weight two together with an explicit tuning frequency of . Such a definition corresponds to the combinatorial specification where the objective is to obtain the mean proportion of equal of the total structure size. All the terms Leaf, Unary, Motzkin, Binary are user-defined keywords. Given such a specification on input, bb builds a corresponding singular Boltzmann sampler implemented in form of a self-contained Haskell module.

## Appendix A Convex optimisation: proofs and algorithms

Until now, we have left several important questions unanswered. Firstly, what is the required precision for multiparametric tuning? Secondly, what is its precise computational complexity? In order to determine the time and space complexity of our tuning procedure we need to explain some technical decisions regarding the choice of particular optimisation methods. In this section we prove that the optimisation procedures described in § 3 give the correct solution to the tuning problem.

### a.1. Proofs of the theorems.

Proof of Theorem 3. Let the following nabla-notation denote the vector of derivatives (so-called gradient vector) with respect to the variable vector :

 (15) ∇zf(z)=(∂∂z1f(z),…,∂∂zkf(z))⊤.

We start with noticing that tuning the expected number of atom occurrences is equivalent to solving the equation , see Proposition 2.2. Here, the right-hand side is equal to so tuning is further equivalent to . The function under the gradient is convex as it is a sum of a convex and linear function. In consequence, the problem of minimising the function is equivalent to finding the root of the derivative

 (16) logC(eξ)−ν⊤ξ→minξ.
###### Definition .

(Feasible points) In the optimisation problem

 (17) {f(z)→min,z∈Ω

a point is called feasible if it belongs to the set .

Proof of Theorem 3. Let be the vector of atom occurrences of each type. Consider the vector such that Let denote the logarithms of the values of generating functions at point . Clearly, in such a case all inequalities in (7) become equalities and the point is feasible.

Let us show that if the point is optimal, then all the inequalities in (7) become equalities. Firstly, suppose that the inequality

 (18) c1≥logΦ1(ec,eξ)

does not turn to an equality. Certainly, there is a gap and the value can be decreased. In doing so, the target function value is decreased as well. Hence, the point cannot be optimal.

Now, suppose that the initial inequality does turn to equality, however for some . Since the system is strongly connected, there exists a path (indices are chosen without loss of generality) in the corresponding dependency graph. Note that for pairs of consecutive variables in , the function is strictly monotonic in (as its monotonic and references ). In such a case we can decrease so to assure that while the point remains feasible. Decreasing in order, we finally arrive at a feasible point with a decreased target function value. In consequence, could not have been optimal to begin with.

So, eventually, the optimisation problem reduces to minimising the expression subject to the system of equations or, equivalently, and can be therefore further reduced to Theorem 3.

Proof of Theorem 3. By similar reasoning as in the previous proof, we can show that the maximum is attained when all the inequalities turn to equalities. Indeed, suppose that at least one inequality is strict, say . The value can be slightly decreased by by choosing a sufficiently small distortion to turn all the equalities containing in the right-hand side to strict inequalities, because the right-hand sides of each of the inequalities are monotonic functions with respect to . This procedure can be repeated until all the equalities turn into inequalities. Finally, we slightly decrease the value to increase the target function while still staying inside the feasible set, because of the monotonicity of the right-hand side with respect to .

Let us fix . For rational and algebraic grammars, within the Drmota–Lalley–Woods framework, see for instance [drmota1997systems], the corresponding generating function singular approximation takes the form

 (19) C(z,u)∼a0(u)−b0(u)(1−zρ(u))t.

If , then the asymptotically dominant term becomes . In this case, tuning the target expected frequencies corresponds to solving the following equation as :

 (20) diag(u)[zn]∇uC(z,u)[zn]C(z,u)=nα.

Let us substitute the asymptotic expansion (19) into (20) to track how depends on :

 (21) diag(u)[zn]tb0(u)(1−zρ(u))t−1z∇uρ(u)ρ2(u)[zn]b0(u)(1−zρ(u))t=−nα.

Only dominant terms are accounted for. Then, by the binomial theorem

 (22) diag(u)b0(u)tn(t−1n)z∇uρ(u)ρ2(u)b0(u)−1(tn)−1=−α,

With , as , we obtain after cancellations

 (23) diag(u)∇uρ(u)ρ(u)=−α

which can be rewritten as

 (24) ∇ηlogρ(eη)=−α.

Passing to exponential variables (24) becomes

 (25) ∇η(ξ(η)+α⊤η)=0.

As we already discovered, the dependence is given by the system of equations because the maximum is achieved only when all inequalities turn to equations. That is, tuning the singular sampler is equivalent to maximising over the set of feasible points.

###### Remark .

For ordinary and singular samplers, the corresponding feasible set remains the same; what differs is the optimised target function. Singular samplers correspond to imposing an infinite target size. In practice, however, the required singularity is almost never known exactly but rather calculated up to some feasible finite precision. The tuned structure size is therefore enormously large, but still, nevertheless, finite. In this context, singular samplers provide a natural limiting understanding of the tuning phenomenon and as such, there are several possible ways of proving Theorem 3.

figure 6 illustrates the feasible set for the class of binary trees and its transition after applying the log-exp transform, turning the set into a convex collection of feasible points. In both figures, the singular point is the rightmost point on the plot. Ordinary sampler tuning corresponds to finding the tangent line which touches the set, given the angle between the line and the abscissa axis.

### a.2. Disciplined convex programming and optimisation algorithms.

In the subsequent proofs, we present the framework of Disciplined Convex Programming (DCP in short) and show how to incorporate elementary combinatorial constructions into this framework. Nesterov and Nemirovskii [nesterov1994interior] developed a seminal polynomial-time optimisation algorithm for convex programming which involves the construction of certain self-concordant barriers related to the feasible set of points. The arithmetic complexity of their method is

 (26) O(log1ε√ϑN)

where is the arithmetic complexity of a single Newton iteration step, is the so-called constant of self-concordeness of the barriers and is the target precision. Before we go into each of the terms, we mention that for sparse matrix representations, it is possible to accelerate the speed of the Newton iteration, i.e. the step of solving the system of linear equations

 (27) Ax=bwhereA∈Rm×m and x,b∈Rm

from to .

Unfortunately, for general convex programming problems there is no constructive general-purpose barrier construction method, merely existence proofs. Fortunately, Grant, Boyd, and Ye [grant2006disciplined] developed the DCP framework which automatically constructs suitable barriers for the user. Moreover, DCP also automatically provides the starting feasible point which is itself a nontrivial problem in general. As its price, the user is obliged to provide a certificate that the constructed problem is convex, i.e. express all convex functions in terms of a predefined set of elementary convex functions.

In our implementation, we rely on two particular solvers, a second-order (i.e. using second-order derivatives) Embedded Conic Solver (ECOS[domahidi2013ecos] and recently developed first-order (i.e. using only first-order derivatives) Splitting Conic Solver (SCS) algorithm [o2016conic]. The conversion of the DCP problem into its standard form is done using cvxpy, a Python-embedded modelling language for disciplined convex programming [cvxpy].

Proof of Theorem 3. We start with showing that the tuning procedure can be effectively represented in the framework of DCP.

In our case, every inequality takes the form

 (28) ci≥log(m∑i=1eℓi(c,z))

where are some linear functions. Appreciably, the log-sum-exp function belongs to the set of admissible constructions of the DCP framework.

Converting the tuning problem into DCP involves creating some slack variables. For each product of two terms we create slack variables for and which are represented by the variables and in the log-exp realm as

 (29) eξ=Xandeη=Y.

Next, we replace by as composition of addition and exponentiation is a valid DCP program. Since every expression in systems corresponding to considered combinatorial classes is a sum of products, the corresponding restriction (28) is converted to a valid DCP constraint using the elementary log-sum-exp function.

The sequence operator which converts a generating function into is unfolded by adding an extra equation into the system in form of

Two additional constructions, and are treated in a similar way. Infinite sums are replaced by finite ones because the difference in the distribution of truncated variables is a negative exponent in the truncation length, and hence negligible.

Using the DCP method, the constant of self-concordness of the barriers is equal to , where is the length of the problem description. This includes the number of combinatorial classes, number of atoms for which we control the frequency and the sum of lengths of descriptions of each specification, i.e. their overall length. In total, the complexity of optimisation can be therefore crudely estimated as

 (31) O(L3.5log1ε).

Certainly, the complexity of tuning is polynomial, as stated. We emphasise that in practice, using sparse matrices this can be further reduced to .

###### Remark .

Weighted partitions, one of our previous applications, involves a multiset operator which generalises to and does not immediately fall into the category of admissible operators as it involves subtraction. This is a general weak point of Boltzmann sampling involving usually a huge amount of rejections, in consequence substantially slowing down the generation process. Moreover, it also disables our convex optimisation tuning procedure because the constructions involving the minus sign cease to be convex and therefore do not fit the DCP framework.

We present the following change of variables for this operator, involving a quadratic number of slack variables. The operator yielding the generating function is replaced by where satisfies

 (32) Si=Ci+SiCi.

Next, we expand all of the brackets in the product . Consequently, we define the following arrays and :

 (33) ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩P1,j=Cj,j∈{1,…,d}Qk,d=Pk,d,k∈{1,…,d}Qk,j=Pk,j+Qk,j+1,j∈{k,…,d−1}, k∈{1,…,d}Pk,j=Cj−k+1⋅Qk−1,j,j∈{k,…,d−1}, k∈{2,…,d}.

Semantically, as in § 4.4, and denote the summands inside symmetric polynomials

 (34)

and the auxiliary partial sums used to recompute the consequent expressions, respectively. So for instance when we obtain

 (35) P=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝S1S2S3S4S50S1(S2+…+S5)…S3(S4+S5)S4S500S1(…)S2(S3(S4+S5)+S4S5)S3S4S5000S1(…)S2…S50000S1…S5⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠.

The union of classes in each row gives corresponding symmetric polynomial , and the partial sum of elements in the row gives the elements of .

Finally, the expression is replaced by the sum of elementary symmetric polynomials where we have (combinatorially)

 (36) Qj,j=∑1≤i1<…

We emphasise that the last sum is not meant to be implemented in practice in a naïve way as it would take an exponential amount of time to be computed.

### a.3. Tuning precision.

In this section, we only work with algebraic systems that meet the certain regularity conditions from Drmota–Lalley–Woods Theorem [drmota1997systems].

###### Proposition .

Consider a multiparametric combinatorial specification

 (37) Y=Φ(Y,Z,U)

whose corresponding system of equations is either rational or algebraic. Suppose that we sample objects from the class with target expected sizes , where are constants, . Let be the multivariate generating function corresponding to the class , and let be the target tuning vector. Then, there exists such that the points from the -ball centered at intersected with the set of feasible points

 {(z,u)∈R1+d∣Y(z,u)≥Φ(Y(z,u),z,u), ∥(z∗−z,u∗−u)∥≤ε}

yield expectations within of target expectations:

 zF′z(z,u)/F(z,u) = n+O(1), uiF′ui(z,u)/F(z,u) = νin+O(1), i∈{1,…,d}.
###### Proof.

Let us show that , as a function of , satisfies

 (38) {z∗(n)∼ρ(1−α/n),F is rational;z∗(n)∼ρ(1−C/n2),F is algebraic.

Here, is a positive integer depending on the rational system, is a generic constant. We also note that the same asymptotics is valid for each coordinate of the vector , up to multiplicative constants depending on and the values of and .

For rational systems, there exist analytic functions , and a positive integer such that

 (39) F(z,u)∼β(z,u)(1−z/ρ(u))−α,z→ρ(u).

After substituting the asymptotic expansion (39) into (2.2), we obtain the first part of (38).

For algebraic systems, according to Drmota–Lalley–Woods Theorem [drmota1997systems], there exist analytic functions such that as ,

 (40) F(z,u)∼α(z,u)−β(z,u)(1−z/ρ(u))1/2.

Again, substituting this asymptotic expansion into (2.2), we obtain

 (41) z∗(n)β2ρα(1−z∗(n)ρ)−1/2∼n.

Taking into account that , this implies the second part of (38). Similarly, this can be applied to each coordinate of , not only to .

Let us handle the tuning precision. We use the mean value theorem to bound . Let . Then,

 ε2≥∥(z∗(n)−z∗(m),u∗(n)−u∗(m))∥2=(z∗(n)−z∗(m)2+d∑i=1(u∗i(n)−u∗i(m))2.

By the mean value theorem, there exist numbers from the interval such that

 z∗(n)−z∗(m) = (n−m)dz∗dn(n′0) , u∗i(n)−u∗i(m) = (n−m)du∗idn(n′i), i∈{1,…,d} .

Thus, as , we obtain

 ε2≥O(1)⎡⎣(dz∗d