Directed Acyclic Graphs (DAGs) appear naturally in the study of compacted trees, automata recognizing finite languages, and partial orders. Until now, the asymptotic number of DAGs has been known only in the dense case. In this paper, we give a solution to the sparse case, which curiously involves a phase transition in the region $m = n(1+\mu n^{-1/3})$, where $n$ and $m$ denote the numbers of vertices and edges; this region corresponds to the phase transition of directed graphs discovered in [luczak2009critical].
Exact and asymptotic enumeration.
In 1973, Robinson [robinson1973counting] obtained his beautiful formula for the number of labeled DAGs with $n$ vertices and $m$ edges, and developed a framework for the enumeration of digraphs whose strong components belong to a given family of allowed strongly connected digraphs. This made it possible to derive the asymptotics of dense DAGs in [bender1986asymptotic]. The structure of random DAGs has been studied in [liskovets1976number, mckay1989shape, gessel1996counting].
We say that a digraph is elementary if all its strong components are either isolated vertices or cycles. In [luczak1990phase] and [luczak2009critical] it was shown that if the ratio between the numbers of edges and vertices is less than one, then a digraph is elementary asymptotically almost surely. More precisely, this happens when a digraph has $n$ vertices and $m = n(1+\mu n^{-1/3})$ edges with $\mu \to -\infty$. Other interesting results on the structure of random $(n,m)$-digraphs around the point of phase transition are available in [pittel2017birth, goldschmidt2019scaling]. In particular, the authors of [goldschmidt2019scaling] show that the strong components are asymptotically almost surely cubic, i.e. with high probability the sum of the degrees of each of their vertices is at most three. This means that these cores play a role analogous to that of the classical cores in random graphs, see [janson1993birth].
An independent forthcoming approach to the asymptotics of DAGs [sparserandomacyclicdigraphs] (manuscript to appear) is similar in spirit to the tools used in [flajolet2004airy] and relies on a bivariate singularity analysis of the generating function of DAGs. Their technique promises to unveil the asymptotics of sparse DAGs, also covering the case where the ratio of the numbers of edges and vertices is bounded but greater than $1$ (the supercritical case).
Typically, the analysis of graphs is technically easier when loops and multiple edges are allowed [janson1993birth]. Adapting the symbolic techniques to simple graphs is essentially a technical rather than a conceptual difficulty. A systematic way to account for the special cases arising for simple graphs is given in [panafieu2016analytic] and [collet2018threshold], see the concept of patchworks. The same principle applies to directed graphs. Nevertheless, in the current paper we consider simple digraphs, where loops and multiple edges are forbidden. In our model, however, cycles of length 2 are allowed, because it is natural to suppose that for any two vertices $u$ and $v$, both edges $u \to v$ and $v \to u$ may be present. The analysis of simple digraphs is technically heavier than that of multidigraphs, but we prefer to demonstrate explicitly that such an application is indeed possible.
First, we transform the generating function of DAGs so that it can be decomposed into an infinite sum. Each of its summands is analysed using a new bivariate semi-large powers lemma, which is a generalisation of [banderier2001random]. Writing $m = n(1+\mu n^{-1/3})$, we find that the first term of this infinite expansion dominates in the subcritical case, i.e. when $\mu \to -\infty$, while in the critical case, when $\mu$ is bounded, all the terms give contributions of the same order. Next, using the symbolic tools for directed graphs from [de2019symbolic], we express the generating function of elementary digraphs and apply similar tools to obtain explicitly the phase transition curve of digraphs, that is, the probability that a digraph is elementary, as a function of $\mu$.
Analytic techniques, largely covered in [flajolet2009analytic], are efficient for asymptotic analysis, because the coefficient extraction operation is naturally expressed through Cauchy's integral formula. A recent study [greenwood2018asymptotics] deals with bivariate algebraic functions. In their case, a combination of two Hankel contours, necessary for a careful analysis, can have a complicated mutual configuration in two-dimensional complex space, so many details need to be accounted for. Our approach is close to theirs, but we avoid this difficulty. The principal idea behind our bivariate semi-large powers lemma is to split a double complex integral into a product of two univariate ones.
Structure of the paper.
2 Exact expressions using generating functions
Consider the following model of graphs and directed graphs. A graph is characterized by its set of labeled vertices and its set of unoriented unlabeled edges. Loops and multiple edges are forbidden. The numbers of its vertices and edges are denoted by $n$ and $m$. An $(n,m)$-graph is a graph with $n$ vertices and $m$ edges.
We consider digraphs without loops, in which there is at most one directed edge from any vertex to any other vertex. Therefore, two edges can link the same pair of vertices only if their orientations are different.
2.1 Exponential and graphic generating functions
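Two helpful tools in the study of graphs and directed graphs are the exponential and graphic generating functions. The exponential generating function (EGF) and the graphic generating function (GGF) associated to a graph or digraph family $\mathcal F$ are defined as
$$F(z,w) = \sum_{H \in \mathcal F} w^{m(H)} \frac{z^{n(H)}}{n(H)!}, \qquad \widehat{F}(z,w) = \sum_{H \in \mathcal F} \frac{w^{m(H)}}{(1+w)^{\binom{n(H)}{2}}} \frac{z^{n(H)}}{n(H)!}.$$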
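The total numbers of $(n,m)$-graphs and $(n,m)$-digraphs are $\binom{\binom n2}{m}$ and $\binom{n(n-1)}{m}$. The classical counting expression for directed acyclic graphs is attributed to Robinson [robinson1973counting]. The EGF of all graphs and the GGF of directed acyclic graphs are given by
$$G(z,w) = \sum_{n \ge 0} (1+w)^{\binom n2} \frac{z^n}{n!}, \qquad \widehat{\mathrm{DAG}}(z,w) = \left( \sum_{n \ge 0} (-1)^n (1+w)^{-\binom n2} \frac{z^n}{n!} \right)^{-1}. \qquad (1)$$
We can reuse the EGF of graphs (1) to obtain an alternative expression for the number $\mathrm{DAG}_{n,m}$ of $(n,m)$-DAGs:
$$\mathrm{DAG}_{n,m} = n!\,[w^m]\,(1+w)^{\binom n2}\,[z^n]\,\frac{1}{G\left(-z, -\frac{w}{1+w}\right)}.$$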
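Extracting the coefficient of $z^n$ from the reciprocal in (1) gives Robinson's classical recurrence $\mathrm{DAG}_n(w) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}(1+w)^{k(n-k)}\,\mathrm{DAG}_{n-k}(w)$, where $\mathrm{DAG}_n(w) = \sum_m \mathrm{DAG}_{n,m} w^m$. The Python sketch below is our own illustration, not code from the paper, and the helper names are hypothetical; it implements this recurrence with exact integer polynomials in $w$.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials in w given as coefficient lists (index = power of w)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def binom_poly(e):
    """Coefficient list of (1 + w)**e."""
    return [comb(e, j) for j in range(e + 1)]


def dag_polynomials(n_max):
    """Entry n is the list [DAG_{n,0}, DAG_{n,1}, ...] of labeled DAG counts by edges."""
    a = [[1]]  # n = 0: only the empty DAG
    for n in range(1, n_max + 1):
        total = [0] * (n * (n - 1) // 2 + 1)  # a DAG has at most C(n, 2) arcs
        for k in range(1, n + 1):
            # inclusion-exclusion over a nonempty set of k vertices with no incoming
            # arcs: any arcs from them to the other n - k vertices, a DAG on the rest
            term = poly_mul(binom_poly(k * (n - k)), a[n - k])
            sign = 1 if k % 2 == 1 else -1
            for j, c in enumerate(term):
                total[j] += sign * comb(n, k) * c
        a.append(total)
    return a
```

For example, `dag_polynomials(3)[3]` is `[1, 6, 12, 6]`: the empty digraph, the 6 single arcs, the 12 pairs of arcs that do not form a 2-cycle, and the 6 transitive tournaments.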
Before considering various digraph families, we need to recall the classical generating functions of simple graph families, namely the rooted and unrooted labeled trees and unicycles. A unicycle is a connected graph that has the same numbers of vertices and edges. Hence, it contains exactly one cycle.
Proposition 1 ([janson1993birth]).
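The EGFs $T(z)$ of rooted trees, $U(z)$ of trees and $V(z)$ of unicycles are characterized by the relations
$$T(z) = z e^{T(z)}, \qquad U(z) = T(z) - \frac{T(z)^2}{2}, \qquad V(z) = \frac12 \log\frac{1}{1-T(z)} - \frac{T(z)}{2} - \frac{T(z)^2}{4}.$$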
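The relations of Proposition 1 can be checked against Cayley's formulas ($n^{n-1}$ rooted and $n^{n-2}$ unrooted labeled trees) and against small unicycle counts. The following Python sketch is illustrative only: the truncation order `N` and the helper names are our own choices, and all series are exact truncated power series over `Fraction`.

```python
from fractions import Fraction
from math import factorial

N = 8  # series are truncated after the z**N term


def mul(a, b):
    """Product of two truncated power series (lists of Fractions)."""
    c = [Fraction(0)] * (N + 1)
    for i in range(N + 1):
        for j in range(N + 1 - i):
            c[i + j] += a[i] * b[j]
    return c


def exp_series(a):
    """exp(a) for a series with a[0] == 0, using k*f_k = sum_j j*a_j*f_{k-j}."""
    f = [Fraction(1)] + [Fraction(0)] * N
    for k in range(1, N + 1):
        f[k] = sum((j * a[j] * f[k - j] for j in range(1, k + 1)), Fraction(0)) / k
    return f


def log_inv_one_minus(a):
    """log(1/(1 - a)) = sum_{k>=1} a^k / k for a series with a[0] == 0."""
    out = [Fraction(0)] * (N + 1)
    power = [Fraction(1)] + [Fraction(0)] * N
    for k in range(1, N + 1):
        power = mul(power, a)
        out = [o + p / k for o, p in zip(out, power)]
    return out


def counts(series):
    """Turn EGF coefficients into integer counts n! [z^n]."""
    return [int(c * factorial(n)) for n, c in enumerate(series)]


# Rooted trees from Cayley's formula: T(z) = sum_{n>=1} n^(n-1) z^n / n!.
T = [Fraction(0)] + [Fraction(n ** (n - 1), factorial(n)) for n in range(1, N + 1)]
# Proposition 1: T = z exp(T), i.e. exp(T) shifted by one power of z gives back T.
assert [Fraction(0)] + exp_series(T)[:N] == T
T2 = mul(T, T)
U = [t - s / 2 for t, s in zip(T, T2)]  # U = T - T^2/2
LOG = log_inv_one_minus(T)
V = [lg / 2 - t / 2 - s / 4 for lg, t, s in zip(LOG, T, T2)]  # V = log(1/(1-T))/2 - T/2 - T^2/4
```

With these definitions, `counts(V)` starts with `0, 0, 0, 1, 15, 222`: the triangle is the only unicycle on 3 vertices.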
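The excess of a graph (not necessarily connected) is defined as the difference between its numbers of edges and vertices. For example, trees have excess $-1$, while unicycles have excess $0$. The bivariate EGFs of graphs of excess $k$ can be obtained from their univariate EGFs by substituting $z \mapsto zw$ and multiplying by $w^k$, since a graph with $n$ vertices and excess $k$ has $n+k$ edges and $w^{n+k} z^n = w^k (zw)^n$. In particular, $T(z,w) = T(zw)/w$, $U(z,w) = U(zw)/w$, $V(z,w) = V(zw)$.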
We say that a graph is complex if all its connected components have a positive excess. The EGF of complex graphs of excess is
It is known (see [janson1993birth]) that a complex graph of excess $r$ is reducible to a kernel (a multigraph of minimal degree at least $3$) of the same excess, by recursively removing vertices of degree at most $1$ and merging the two edges incident to each vertex of degree $2$. The total weight of cubic kernels (all degrees equal to $3$) of excess $r$ is given by (3). They are central in the study of large critical graphs, because non-cubic kernels do not typically occur.
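For reference, the classical closed form from [janson1993birth] for the total (compensated) weight of cubic kernels of excess $r$ is $e_r = \frac{(6r)!}{2^{5r}3^{2r}(3r)!\,(2r)!}$; we assume this is the quantity denoted by (3). It can be read as the number $(6r)!/((3r)!\,2^{3r})$ of perfect matchings of the $6r$ half-edges of $2r$ cubic vertices, divided by $3!^{2r}$ (orderings of the half-edges at each vertex) and by $(2r)!$ (exponential normalization). The Python snippet, with a hypothetical helper name, simply evaluates it exactly.

```python
from fractions import Fraction
from math import factorial


def cubic_kernel_weight(r):
    """e_r = (6r)! / (2^(5r) 3^(2r) (3r)! (2r)!) for excess r >= 1.

    (6r)! / ((3r)! 2^(3r)) perfect matchings of the 6r half-edges of 2r cubic
    vertices, divided by 3!^(2r) (half-edge orders at each vertex) and by (2r)!
    (exponential normalization of the labeled vertices).
    """
    matchings = Fraction(factorial(6 * r), factorial(3 * r) * 2 ** (3 * r))
    return matchings / (6 ** (2 * r) * factorial(2 * r))
```

The first two values are $e_1 = 5/24$ and $e_2 = 385/1152$.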
Proposition 2 ([janson1993birth, Section 6]).
For each there exists a polynomial such that
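Clearly, any graph can be represented as a set of unrooted trees and unicycles together with a (possibly empty) complex graph of some excess $r \ge 0$. Therefore, denoting by $C_r(z)$ the EGF of complex graphs of excess $r$ (with $C_0(z) = 1$), the EGF of graphs is equal to
$$G(z,w) = e^{U(z,w) + V(z,w)} \sum_{r \ge 0} w^r\, C_r(zw). \qquad (4)$$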
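The factor $e^{U(z,w)+V(z,w)}$ alone counts graphs without complex components, i.e. graphs whose components are all trees or unicycles. The Python sketch below makes our own choices: instead of manipulating series, it uses Cayley's formula $n^{n-2}$ for trees, the classical closed formula $\frac12\sum_{k=3}^{n}\frac{n!}{(n-k)!}\,n^{n-k-1}$ for unicycles, and the standard exponential-formula recurrence obtained by isolating the component containing a fixed vertex. Trees on $k$ vertices get weight $w^{k-1}$ and unicycles $w^{k}$, which is the substitution $z \mapsto zw$ followed by multiplication by $w^{\text{excess}}$. Helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def trees(n):
    """Cayley's formula: number of labeled trees on n vertices."""
    return 1 if n == 1 else n ** (n - 2)


def unicycles(n):
    """Connected labeled graphs on n vertices with exactly one cycle (0 if n < 3).

    Classical formula: (1/2) * sum_{k=3}^{n} n!/(n-k)! * n^(n-k-1).
    """
    total = sum((Fraction(factorial(n), factorial(n - k)) * Fraction(n) ** (n - k - 1)
                 for k in range(3, n + 1)), Fraction(0))
    return int(total / 2)


def component_poly(k):
    """Trees (k-1 edges, excess -1) and unicycles (k edges, excess 0) on k vertices."""
    poly = [0] * (k + 1)
    poly[k - 1] += trees(k)
    poly[k] += unicycles(k)
    return poly


def no_complex(n_max):
    """g[n][m] = number of (n, m)-graphs whose components are all trees or unicycles."""
    g = [[1]]
    for n in range(1, n_max + 1):
        poly = [0] * (n + 1)
        for k in range(1, n + 1):
            # the component containing vertex 1 has k vertices (choose the other k-1),
            # and the remaining n-k vertices carry any graph of the same kind
            weight = comb(n - 1, k - 1)
            for i, x in enumerate(component_poly(k)):
                for j, y in enumerate(g[n - k]):
                    poly[i + j] += weight * x * y
        g.append(poly)
    return g
```

For $n = 4$, only the $7$ connected graphs with $5$ or $6$ edges are complex, so `sum(no_complex(4)[4])` is $64 - 7 = 57$.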
2.2 Exact expression for directed acyclic graphs
In order to obtain the asymptotic number of DAGs, we need a decomposition different from (1). For comparison, in expression (4) the first summand is asymptotically dominant for subcritical graphs, while inside the critical window all the summands of (4) give contributions of the same asymptotic order.
The number of $(n,m)$-DAGs is equal to
The number of pairs of graphs, each on $n$ vertices, having a total of $m$ edges, is $[w^m](1+w)^{2\binom n2} = \binom{n(n-1)}{m}$. Working as in the previous proof leads to
which looks and behaves (when $m/n$ stays smaller than or close to $1$) like the expression for the number of $(n,m)$-DAGs from the last lemma. This motivates the following intuition. Typically, the two graphs should share the edges more or less equally. Thus, when $m/n$ is close to $1$, each of them has close to $n/2$ edges, so both exhibit critical graph structure. For a smaller ratio $m/n$, both behave like subcritical graphs, containing only trees and unicycles. This heuristic explanation of the critical density for DAGs guides our analysis in the rest of the paper.
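One concrete way to see a digraph as a pair of graphs is to orient by vertex labels: the arcs $u \to v$ with $u < v$ and those with $u > v$ form two simple graphs on the same vertex set, and this correspondence is a bijection. It explains why pairs of graphs with $m$ edges in total are as many as $(n,m)$-digraphs, and it lets us quantify how evenly the edges are shared. The Python sketch is our own illustration (`split`, `merge` and `split_distribution` are hypothetical names, not from the paper); the number of edges of the first graph follows a hypergeometric law that is symmetric around $m/2$.

```python
from fractions import Fraction
from math import comb


def split(arcs):
    """Split a digraph on vertices 0..n-1 into (forward graph, backward graph).

    An arc u -> v with u < v becomes the edge (u, v) of the first graph; an arc
    with u > v becomes the edge (v, u) of the second. Edges are stored as (min, max).
    """
    forward = frozenset((u, v) for u, v in arcs if u < v)
    backward = frozenset((v, u) for u, v in arcs if u > v)
    return forward, backward


def merge(forward, backward):
    """Inverse of split: rebuild the arc set from the two graphs."""
    return frozenset(forward) | frozenset((v, u) for u, v in backward)


def split_distribution(n, m):
    """P(first graph has j edges), j = 0..m, for a uniform random (n, m)-digraph.

    Vandermonde: sum_j C(N, j) C(N, m - j) = C(2N, m) with N = C(n, 2),
    i.e. pairs of graphs with m edges in total are as many as (n, m)-digraphs.
    """
    N = comb(n, 2)
    total = comb(2 * N, m)
    return [Fraction(comb(N, j) * comb(N, m - j), total) for j in range(m + 1)]
```

For $n = m = 10$, the most likely split is $5 + 5$, in line with the heuristic above.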
2.3 Exact expression for elementary digraphs
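As shown in our previous paper [de2019symbolic], and as also pointed out in a different form in [robinson1973counting], the graphic generating function of the family of digraphs whose strong components belong to a given set $\mathcal S$ with EGF $S(z,w)$ is given by
$$\widehat{D}_{\mathcal S}(z,w) = \frac{1}{e^{-S(z,w)} \odot \widehat{\mathrm{Set}}(z,w)},$$
where $\odot$ is the exponential Hadamard product, characterized by $\frac{z^n}{n!} \odot \frac{z^k}{k!} = \mathbb{1}_{n=k}\frac{z^n}{n!}$ and bilinearity, and $\widehat{\mathrm{Set}}(z,w) = \sum_{n \ge 0} \frac{z^n}{(1+w)^{\binom n2} n!}$ is the GGF of sets of isolated vertices. In particular, for elementary digraphs, i.e. the digraphs whose strong components are isolated vertices or cycles (of length at least $2$, since 2-cycles are allowed in our model), the EGF of $\mathcal S$ is given by
$$S(z,w) = z + \log\frac{1}{1-zw} - zw.$$
To expand the Hadamard product, we develop the exponential $e^{-S(z,w)} = (1-zw)\,e^{-z(1-w)}$ and apply the Hadamard product term by term. We obtain a very simple expression, namely
$$\widehat{D}_{\mathcal S}(z,w) = \left( \sum_{n \ge 0} \Big((w-1)^n - nw(w-1)^{n-1}\Big) \frac{z^n}{(1+w)^{\binom n2} n!} \right)^{-1},$$
where the second term in the parentheses vanishes for $n = 0$.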
The following lemma is a heavier version of this expression. Its visual complexity comes partly from the choice of simple digraphs instead of multidigraphs; however, during the asymptotic analysis, most of the decorations corresponding to simple digraphs disappear.
The number of elementary digraphs is equal to
Let us denote . Using the already mentioned representation
Next, the change of variables yields
The proof is finished by extracting the coefficient . ∎
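The generating-function description of elementary digraphs can be cross-checked by brute force on small vertex sets. Assume that the GGF of elementary digraphs is the reciprocal of $\sum_{n\ge0} c_n(w) \frac{z^n}{(1+w)^{\binom n2}n!}$ with $c_n(w) = (w-1)^n - nw(w-1)^{n-1} = n!\,[z^n]\,(1-zw)e^{-z(1-w)}$, which is what the allowed strong components (isolated vertices and cycles of length at least $2$) give. Coefficient extraction then yields the recurrence $\sum_{k=0}^{n}\binom nk (1+w)^{k(n-k)} E_k(w)\,c_{n-k}(w) = 0$ for $n \ge 1$, where $E_n(w)$ counts elementary digraphs on $n$ vertices by edges. The Python sketch compares this with exhaustive enumeration, using the fact that a strong component with $k \ge 2$ vertices is a cycle exactly when it has $k$ arcs. It is an independent sanity check, not the authors' code, and all helper names are hypothetical.

```python
from math import comb


def pmul(p, q):
    """Product of integer polynomials in w (coefficient lists)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def ppow(p, e):
    out = [1]
    for _ in range(e):
        out = pmul(out, p)
    return out


def padd(p, q, scale=1):
    """p + scale * q."""
    out = list(p) + [0] * max(0, len(q) - len(p))
    for i, y in enumerate(q):
        out[i] += scale * y
    return out


def trim(p):
    """Drop trailing zero coefficients (keep at least one)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def c_poly(n):
    """c_n(w) = (w - 1)^n - n w (w - 1)^(n - 1) = n! [z^n] (1 - z w) exp(-z (1 - w))."""
    if n == 0:
        return [1]
    return padd(ppow([-1, 1], n), [0] + ppow([-1, 1], n - 1), scale=-n)


def elementary_by_formula(n_max):
    """E_n(w) from sum_{k=0}^{n} C(n,k) (1+w)^(k(n-k)) E_k(w) c_{n-k}(w) = 0, n >= 1."""
    e = [[1]]
    for n in range(1, n_max + 1):
        acc = [0]
        for k in range(n):
            term = pmul(pmul(ppow([1, 1], k * (n - k)), e[k]), c_poly(n - k))
            acc = padd(acc, term, scale=-comb(n, k))
        e.append(acc)
    return e


def is_elementary(n, arcs):
    """True if every strong component is a single vertex or a directed cycle."""
    reach = [[u == v for v in range(n)] for u in range(n)]
    for u, v in arcs:
        reach[u][v] = True
    for k in range(n):  # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    for u in range(n):
        comp = {v for v in range(n) if reach[u][v] and reach[v][u]}
        inner = sum(1 for a, b in arcs if a in comp and b in comp)
        # a strongly connected digraph on k >= 2 vertices with exactly k arcs is a cycle
        if len(comp) > 1 and inner != len(comp):
            return False
    return True


def elementary_by_brute_force(n):
    """Count elementary digraphs on n vertices (no loops, 2-cycles allowed) by arcs."""
    all_arcs = [(u, v) for u in range(n) for v in range(n) if u != v]
    counts = [0] * (len(all_arcs) + 1)
    for mask in range(1 << len(all_arcs)):
        arcs = [a for i, a in enumerate(all_arcs) if mask >> i & 1]
        if is_elementary(n, arcs):
            counts[len(arcs)] += 1
    return counts
```

For $n = 3$ both give $1 + 6w + 15w^2 + 20w^3 + 6w^4$; the $16$ missing digraphs with $4$ to $6$ arcs are exactly those that are strongly connected on all three vertices without being a $3$-cycle.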
3 Asymptotic analysis
3.1 Bivariate semi-large powers lemma
The typical structure of critical random graphs can be obtained by applying the semi-large powers theorem [flajolet2009analytic, Theorem IX.16, Case (ii)]. Since DAGs behave like a superposition of two graphs (see remark 4), we design a bivariate variant of this theorem.
Consider two integers and going to infinity, such that with either staying in a bounded real interval, or while ; let the function be analytic on the open torus of radii and continuous on its closure, and let and be two real values, then the following asymptotics holds as
where the function is defined as
Proof of lemma 6.
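The first step is to represent the coefficient extraction operation from (7) as a double complex integral, using Cauchy's integral formula, and to approximate this double integral by a product of two complex integrals. We start with the Puiseux expansions of the EGFs of rooted labeled trees $T(z)$ and unrooted labeled trees $U(z)$ near their dominant singularity $z = e^{-1}$:
$$T(z) = 1 - \sqrt{2}\sqrt{1-ez} + \frac23 (1-ez) + O\big((1-ez)^{3/2}\big), \qquad U(z) = \frac12 - (1-ez) + \frac{2\sqrt2}{3}(1-ez)^{3/2} + O\big((1-ez)^2\big). \qquad (8)$$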
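The Puiseux expansions (8) of $T$ and $U$ can be checked numerically. In this Python sketch (our own, with hypothetical helper names), $T(z)$ is computed by bisection from $T e^{-T} = z$ on $[0,1]$, where $T e^{-T}$ is increasing, and the error of each truncated expansion is compared with $\delta^{3/2}$, where $\delta = 1 - ez$.

```python
import math


def tree_T(z):
    """Rooted-tree EGF T(z) for 0 <= z <= 1/e: the root of T * exp(-T) = z in [0, 1].

    T * exp(-T) is increasing on [0, 1], so plain bisection is enough.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def puiseux_T(delta):
    """Truncated expansion of T with delta = 1 - e z."""
    return 1 - math.sqrt(2 * delta) + 2 * delta / 3


def puiseux_U(delta):
    """Truncated expansion of U = T - T^2/2 with delta = 1 - e z."""
    return 0.5 - delta + 2 * math.sqrt(2) / 3 * delta ** 1.5
```

Bisection is used rather than the fixed-point iteration $T \leftarrow z e^{T}$ because the latter converges very slowly close to the singularity.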
Applying Cauchy's integral formula, we rewrite the coefficient extraction (7) in the form
A further step is to inject , , and , where . By using expansion (8) in order to approximate the terms and , we rewrite the answer in the form
After removal of the negligible terms, a product of integrals is obtained
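Each of the integrals can be evaluated as in [flajolet2009analytic, Theorem IX.16, Case (ii)]: a change of variables is applied, and the integral is expressed as an infinite sum using Hankel's contour formula for the Gamma function,
$$\frac{1}{\Gamma(s)} = \frac{1}{2\pi i}\int_{\mathcal H} e^{t}\, t^{-s}\, dt,$$
where $\mathcal H$ starts at $-\infty$ below the negative real axis, winds counterclockwise around $0$, and returns to $-\infty$ above it.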
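As a standalone numerical sanity check of Hankel's contour formula for $1/\Gamma(s)$ (our illustration, not part of the proof), the Python sketch below integrates $e^t t^{-s}$ along a concrete Hankel contour: two rays along the negative real axis joined by a circle around the origin, with composite Simpson quadrature. The radius, cutoff and number of panels are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule for a complex-valued f on [a, b] with n (even) panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def inv_gamma_hankel(s, radius=1.0, cutoff=40.0, n=20000):
    """Approximate 1/Gamma(s) = (1/(2 pi i)) * integral over H of e^t t^(-s) dt.

    H comes from -cutoff below the negative real axis (log t = ln x - i pi),
    goes counterclockwise around |t| = radius, and returns above the axis
    (log t = ln x + i pi); the tail beyond -cutoff is neglected.
    """
    lower = simpson(lambda x: math.exp(-x) * cmath.exp(-s * (math.log(x) - 1j * math.pi)),
                    radius, cutoff, n)
    upper = simpson(lambda x: math.exp(-x) * cmath.exp(-s * (math.log(x) + 1j * math.pi)),
                    radius, cutoff, n)
    circle = simpson(lambda th: cmath.exp(radius * cmath.exp(1j * th))
                     * cmath.exp(-s * (math.log(radius) + 1j * th))
                     * 1j * radius * cmath.exp(1j * th),
                     -math.pi, math.pi, n)
    # the lower ray is traversed towards the origin, the upper one away from it
    return (lower + circle - upper) / (2j * math.pi)
```

The result agrees with `1 / math.gamma(s)` for non-integer $s$ and is close to $0$ at non-positive integers, where $1/\Gamma$ vanishes.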
3.2 Asymptotic analysis of directed acyclic graphs
where is given by proposition 2. This notation will be used throughout the next two sections.
When and either stays in a bounded real interval, or while as ,
In particular, for the sparse case ,
In order to apply lemma 6 (bivariate semi-large powers), we develop the coefficient operator in lemma 3 using the approximation of from proposition 2 and drop the terms that give negligible contribution:
Then we apply lemma 6 and the approximation to obtain
The power of in the sum is , and the sum over of is equal to and converges to . Finally, the sums over and are decoupled and we obtain
The sum over admits a closed-form expression (see remark 7 and [janson1993birth, Section 14]). Applying Stirling's formula, we can rescale the asymptotic number of DAGs by the total number of digraphs:
This gives the main statement. To obtain the sparse case, we need to use the fact that when , the first summand of the sum over is dominating, and therefore, this sum is asymptotically equivalent to (see [janson1993birth, Equation (10.3)]). ∎
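For comparison with the asymptotic statement, the exact probability $\mathrm{DAG}_{n,m}/\binom{n(n-1)}{m}$ that a uniformly random $(n,m)$-digraph is acyclic can be computed for moderate $n$ from Robinson's recurrence (inclusion-exclusion over sets of vertices without incoming arcs), truncated at $w^m$. This Python sketch is our own, with hypothetical function names, and uses exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def dag_counts_upto(n, m_max):
    """[DAG_{n,m} for m = 0..m_max] via Robinson's recurrence, truncated at w^m_max."""
    a = [[1] + [0] * m_max]  # size 0: the empty DAG
    for size in range(1, n + 1):
        total = [0] * (m_max + 1)
        for k in range(1, size + 1):
            sign = comb(size, k) * (1 if k % 2 else -1)
            e = k * (size - k)  # free arcs from the k source vertices to the rest
            prev = a[size - k]
            for i in range(min(e, m_max) + 1):
                b = comb(e, i)
                for j in range(m_max + 1 - i):
                    total[i + j] += sign * b * prev[j]
        a.append(total)
    return a[n]


def acyclic_probability(n, m):
    """Exact probability that a uniform (n, m)-digraph (no loops, 2-cycles allowed) is acyclic."""
    return Fraction(dag_counts_upto(n, m)[m], comb(n * (n - 1), m))
```

For instance, `acyclic_probability(3, 2)` is `Fraction(4, 5)`: among the $15$ pairs of arcs on $3$ vertices, only the $3$ two-cycles are cyclic.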
3.3 Asymptotic analysis of elementary digraphs
When and either stays in a bounded real interval, or while as ,
where the coefficients are given by
In particular, when , ,
The key ingredient is the exact expression from lemma 5. As in the proof of theorem 8, we can drop the terms that give negligible contributions and develop the coefficient operator accordingly. The key difference between the proofs is the form of the denominator: after taking out a common multiple (ignoring higher powers in variable ), the denominator can be again regarded as a formal power series in . In order to obtain the asymptotics, the transformed expression should be developed, then lemma 6 (bivariate semi-large powers) is applied, and finally the sums are decoupled. For the sum corresponding to variable , we apply again the hypergeometric summation formula from [janson1993birth]. In order to settle the subcritical case , we apply the asymptotic approximation of from remark 7. ∎
Curiously enough, the coefficient in the subcritical probability can be given the same interpretation as a similar coefficient arising in the probability that a random graph does not contain a complex component: namely the compensation factor of the simplest cubic forbidden multigraph.
We are grateful to Olivier Bodini, Naina Ralaivaosaona, Vonjy Rasendrahasina, Vlady Ravelomanana, and Stephan Wagner for fruitful discussions.