Asymptotic enumeration of digraphs and bipartite graphs by degree sequence

06/29/2020 ∙ by Anita Liebenau, et al. ∙ UNSW ∙ Monash University

We provide asymptotic formulae for the numbers of bipartite graphs with given degree sequence, and of loopless digraphs with given in- and out-degree sequences, for a wide range of parameters. Our results cover medium range densities and close the gaps between the results known for the sparse and dense ranges. In the case of bipartite graphs, these results were proved by Greenhill, McKay and Wang in 2006 and by Canfield, Greenhill and McKay in 2008, respectively. Our method also essentially covers the sparse range, for which much less was known in the case of loopless digraphs. For the range of densities which our results cover, they imply that the degree sequence of a random bipartite graph with m edges is accurately modelled by a sequence of independent binomial random variables, conditional upon the sum of variables in each part being equal to m. A similar model also holds for loopless digraphs.

1 Introduction

Enumeration of discrete structures with local constraints has attracted the interest of many researchers and has applications in various areas such as coding theory, statistics and neurostatistical analysis. Exact formulae are often hard to derive or infeasible to compute. Asymptotic formulae are therefore sought and often provide sufficient information for the aforementioned applications. In this paper we find such formulae for bipartite graphs with given degree sequence, or loopless digraphs with given in- and out-degree sequences. Our results imply that the degree sequence of a random digraph or bipartite graph with edges is close to a sequence of independent binomial random variables, conditional upon the sum of degrees in each part being equal to .

We frame all our arguments in terms of bipartite graphs: as noted below, digraphs are equivalent to “balanced” bipartite graphs. Thus, if loops are not forbidden, the digraph enumeration problem is the same as the bipartite one. The loopless case for digraphs is equivalent to bipartite graphs with a forbidden perfect matching. Our results on counting bipartite graphs with a given degree sequence imply equivalent results on counting 0-1 matrices with given row and column sums. Similarly, counting (loopless) digraphs is equivalent to counting square 0-1 matrices with given row and column sums where the entries on the diagonal are required to be 0.
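The correspondence with 0-1 matrices can be checked by brute force on very small instances. The following is a minimal sketch, not the method of the paper: the function name `count_binary_matrices` and its `zero_diagonal` flag are hypothetical names chosen for this illustration, and the exhaustive search is only feasible for tiny matrices.

```python
from itertools import product


def count_binary_matrices(row_sums, col_sums, zero_diagonal=False):
    """Count 0-1 matrices with the given row and column sums (brute force).

    With zero_diagonal=False this counts bipartite graphs with the given
    degree sequence; with zero_diagonal=True (square case only) it counts
    loopless digraphs with the given out-degrees (rows) and in-degrees (columns).
    """
    m, n = len(row_sums), len(col_sums)
    if zero_diagonal and m != n:
        raise ValueError("zero_diagonal requires a square matrix")
    if sum(row_sums) != sum(col_sums):
        return 0  # row totals and column totals must agree

    def rows_with_sum(i):
        # All 0-1 rows of length n with row_sums[i] ones,
        # skipping rows with a 1 on the diagonal if that is forbidden.
        for row in product((0, 1), repeat=n):
            if sum(row) == row_sums[i] and not (zero_diagonal and row[i] == 1):
                yield row

    def extend(i, remaining):
        # remaining[j] is the number of ones still needed in column j.
        if i == m:
            return int(all(r == 0 for r in remaining))
        total = 0
        for row in rows_with_sum(i):
            new_remaining = [r - x for r, x in zip(remaining, row)]
            if all(v >= 0 for v in new_remaining):
                total += extend(i + 1, new_remaining)
        return total

    return extend(0, list(col_sums))
```

For example, with all row and column sums equal to 1 on a 3 by 3 matrix, the unrestricted count is the number of permutation matrices, while the zero-diagonal count is the number of derangements of three elements.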

Our results are obtained via the method of degree switchings and contraction mappings recently introduced by the authors in [11] to count the number of “nearly” regular graphs of a given degree sequence for medium-range densities, and a wider range of degree sequences for low densities. The basic structure of the argument is very similar in the present case, but it needs significant modifications to account for the fact that we are dealing with bipartite graphs and certain edges are not allowed.

1.1 Enumeration results

The formulae in [11] are stated in terms of a relationship between the degree sequence of the Erdős-Rényi random graph and a sequence of independent binomial random variables. We shall do the same here for appropriate bipartite random graphs and suitable independent binomials. We first introduce appropriate graph theoretic notation. Let be integers and let and . We use and as the two parts of the vertex set of a bipartite graph , i.e. a graph with bipartition . Such a graph is said to have degree sequence if vertex has degree for all , and has degree for all . (Our convention is to denote elements of by and elements of by .) We let denote the degree sequence of . When , we use the fact that a digraph on vertices with out-degree sequence and in-degree sequence corresponds to a bipartite graph with degree sequence , the equivalence obtained by directing all edges from to . For use in the digraph case, if we define , and for we define . The digraph contains a loop if and only if the bipartite graph has an edge joining to .

The following probability spaces play an important role in this paper. Let

denote the bipartite graph chosen uniformly at random among all bipartite graphs with bipartition and with edges. In the case when , conditioning on the event that none of those edges is of the form yields a model of random directed graphs without loops which we call . We define and to be the corresponding probability spaces of degree sequences of or of , respectively. Let

be the probability space of vectors of length

where the first elements are distributed as and the next are distributed as . Furthermore, let be the restriction of to the event , where is the sum of the first elements of the vector, and the sum of the other elements. Similarly, define to be the probability space of random vectors of length , every component being independently distributed as . Finally, let be the restriction of to the event , where is defined as above with . Note that if , then

(1)

which we note are both independent of .

Our main result for degree sequences of “medium density” states essentially that for certain sequences , the probability is asymptotically equal to , where and in the bipartite case, and in the digraph case, and where is a correction factor which we define next. For asymptotics in this paper, we take ; the restrictions on will also ensure that .

With and as above, let be a sequence of length . We set and use and to denote the vectors consisting of the first , and of the last , entries of respectively. Thus, . We also let and denote average of the components of , and of , respectively, that is and . Then we set

and, in the digraph case,

We unify our analysis of the two cases, bipartite graphs and digraphs, by introducing the indicator variable which is in the digraph case (in which case is assumed) and in the bipartite case (in which case terms containing as a factor may be undefined). This significantly simplifies notation and permits us to emphasise the similarities between the two cases. Define . (This will denote the relative edge density of a bipartite graph or a digraph with degree sequence .) We then set

(2)

for a sequence of length , where , , . We can now state our main result.

Theorem 1.1.

For a sufficiently small constant , the following holds. Let . Let , and be integers that satisfy

and for all fixed , . Let be the set of sequences with and of lengths and respectively, satisfying , and for all and all , where and . Either set and (the bipartite case), or set and and restrict to (the digraph case). Then uniformly for all ,

(3)

Recall that in this paper, asymptotic statements refer to . The condition , however, together with the trivial upper bound implies that as well. We prove this theorem in Section 4.

Remark 1.2.

In view of (1.1) and the fact that , the formula in Theorem 1.1 is equivalent to the assertion that the number of bipartite graphs with degree sequence is

where is the error term from (3). Similarly, (1.1) and the fact that gives an asymptotic formula for the number of directed graphs with given degree sequence of in- and out-degrees.

Our corresponding result for the sparse case is the following. Although it is not new in the bipartite case (see below), it completes the full range of densities (in a sense, for instance, regarding regular digraphs) in the digraph case. For a sequence as above, define , and similarly .

Theorem 1.3.

Let , let , and be integers such that , and set , and . Let be a set of sequences such that and have length and , respectively, and such that and uniformly over . Either set and (the bipartite case), or set and and restrict to (the digraph case). Then uniformly for ,

We prove this theorem in Section 3 before tackling the more involved case of medium range densities. We note at this point that if we restrict to the set of sequences where for all and then and so the condition is always true (as tends to infinity), and the condition is implied by as . Sequences failing these conditions therefore contain entries 0, which are less interesting since the formula is then often implied by considering only the non-zero entries. One could also apply our method to reach further into this very sparse case, but given these considerations, it is possibly not warranted, and we do not attempt to do so here. Similarly, further examination of our argument should yield results covering cases with wider disparities between and .

There have been many contributions to this topic in the past. Finding (asymptotic) formulae for the number of bipartite graphs with a given degree sequence goes back to Read’s thesis [20] and has attracted wider interest since the 1970s, including [2, 3, 4, 6, 7, 13, 15, 17, 19, 21]. In particular, the sparse case is best covered by Greenhill, McKay and Wang [10], who proved an asymptotic formula for the number of bipartite graphs of a given sequence , provided that and , and their result covers the bipartite version of Theorem 1.3, in terms of both the density range and the size of the error terms. This is supplemented by formulae for the number of dense bipartite graphs with specified degree sequences by Canfield, Greenhill, and McKay [5] that apply as long as and are not too far apart. In fact, in [5] it was found that the formulae for the sparse and the dense case can be unified to produce the formula in Theorem 1.1, which was implicitly conjectured in [5] to hold for the cases in between. This conjecture is essentially verified by Theorem 1.1 for a wide range of parameters and .

A special case is that of so-called semi-regular bipartite graphs, in which all vertices on one side of the bipartition have degree , say, and all vertices on the other side have degree . So let denote the constant vector of length in which every entry is , and denote the constant vector of length in which every entry is In 1977, Good and Crook [8] suggested that the number of bipartite graphs with degree sequence is roughly when . Some of the references mentioned above, in particular [15] and [6], verify that this formula is correct up to a constant factor, for particular ranges of , , and , by showing that the number is

(4)

This asymptotic assertion is immediately equivalent to . Consequently, Theorem 1.1 verifies (4) for a new range of parameters in the moderately dense case.

For digraphs without loops, there are far fewer corresponding results. For the dense case, i.e. when the number of edges is , a result by Greenhill and McKay [9] implies an asymptotic formula. Barvinok [1] provides upper and lower bounds which are coarser but apply to a wider range of in- and out-degree sequences. The only result we are aware of that explicitly enumerates loopless digraphs by degree sequence in the sparse case is by Bender [3], which only applies for bounded degrees. However, it is clear that the standard techniques used previously for sparse graph enumeration could be used to increase the density and obtain results more in line with the existing ones for bipartite graphs.

1.2 Models for the degree sequences of random graphs

In 1997, McKay and Wormald [16] showed that if a certain enumeration formula holds for the number of graphs of a given degree sequence then the degree sequences of the random graph models and can be modelled by certain binomial-based models. The model for showed that the degree sequence was distributed almost the same as a sequence of independent binomial random variables, subject to having even sum, but with a slight twist that introduces dependency. It was also shown there that for properties of the degree sequence satisfying some quite general conditions, this conditioning and dependency make no significant difference, and hence those properties are essentially the same as for a sequence of independent binomials.

At that time, the existing formulae for the sparse and the dense case supplied that relationship of the models. Recently, the enumeration results of [11] for the medium range provide the missing formulae for the gap range of densities, establishing a conjecture from [16]. A natural supposition since [16] appeared was that the degree sequences of random bipartite graphs and digraphs satisfy similar properties. This was an implicit conjecture of McKay and Skerman [14], who adapted some of the arguments in [16] to show that the existing enumeration results for dense bipartite graphs and directed graphs imply a binomial-based model of the degree sequences of such graphs. This is quite analogous to the model in the graph case, except that it contains an extra complicating conditioning required because the sum of degrees of the vertices in each part must be equal. McKay and Skerman point out that, once the enumeration formulae are proved in the missing ranges, one would expect the model results to follow. Our enumeration results stated above provide what is necessary to immediately establish the relevant conjecture in the case of and , as described below, provided that and are not too disparate. For their binomial random graph siblings and , in which edges are selected independently with probability , one would expect that arguments similar to those in [14], in conjunction with our results, will now suffice.

Let and be two sequences of probability spaces with the same underlying set for each . Suppose that whenever a sequence of events satisfies in either model, it is true that , where by we mean that as . Then we call and asymptotically quite equivalent (a.q.e.). We use to mean a function going to infinity as , possibly different in all instances.

Theorem 1.4.

  (a) The probability spaces and are a.q.e. provided that ;

  (b) The probability spaces and are a.q.e. provided that and at least one of the following holds:

    (i) and for some fixed and we have and ;

    (ii) and for some fixed we have ;

    (iii) and .

We prove this theorem in Section 4. We note that the assertion for (a) for the range is covered by McKay and Skerman [14, Theorem 1(d)]. For the bipartite case [14, Theorem 1(c)] covers the range and . Theorem 1.4(b)(i) applies for a slightly larger range of and , at least for large-ish . The last condition in (b)(i) is equivalent to

for some where . Thus, may be as large as for sufficiently large density .

Finally, we note that when for fixed then all values of are covered by Theorem 1.4 (swapping and in (b)(i) if necessary) and using [14, Theorem 1(d)] for the dense cases of both (a) and (b).

1.3 Edge probabilities.

As a by-product of our proof of Theorem 1.1 in Section 4, we obtain asymptotic formulae for the edge probabilities in a random bipartite graph with a given degree sequence, and of a random digraph with a given sequence of out- and in-degrees.

Theorem 1.5.

Let , , , and be as in Theorem 1.1 and let or . Let and , with in the digraph case. Then uniformly for , the probability that is an edge of , conditional on the event that , is

where and .

We prove this theorem in Section 4.

2 Preliminaries

As we indicated in the introduction, the argument in this paper derives from that in [11], whose notation and structure we will follow quite closely. Differences occur though to account for the fact that we are dealing with certain forbidden edges. Naturally, we resort to notation used in [11] and add notation that is special to the bipartite case. We then state several intermediate results from [11].

2.1 Notation

Our graphs are simple, that is, they have no loops or multiple edges. We write to mean that , if for some constant , and if . We use to mean a function going to infinity, possibly different in all instances. Also denotes the set of -subsets of the set , and is often of the form , which denotes . In this paper multiplication by juxtaposition has precedence over “”, so for example .

Let be an integer and let . Assume that is specified; we call this the set of allowable pairs. Note that as usual we regard the edge joining vertices and as the unordered pair , and denote this edge by following standard graph theoretic notation. A sequence is called -realisable if there is a graph on vertex set such that vertex has degree and all edges of are allowable pairs. In this case, we say realises over . In standard terminology, if is -realisable, it is graphical. Let be the set of all graphs that realise over . The graph case when is dealt with in [11]. In this paper, we are particularly interested in the following two special cases of .

  • Bipartite graph case.
    Let be integers and set . Set . Then is the set of all bipartite graphs on vertex set that realise the degree sequence with one part being and the other part .

  • Digraph case.
    Assume that is even and let be an integer such that . Set . Then corresponds to the set of all bipartite graphs on vertex set that realise the degree sequence with one part being and the other part that do not contain any edge of a predefined matching, or equivalently, corresponds to the set of all digraphs on vertex set that have no loops and that realise the out-degree sequence and in-degree sequence . Recall that for and for so that edges of the form are forbidden.
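To make the two sets of allowable pairs concrete, here is a small brute-force sketch. It rests on labelled assumptions that are not taken from the text: vertices are numbered from 0, the first part is `0..m-1` and the second part is `m..m+n-1`, and in the digraph case the forbidden partner of vertex `a` in the first part is taken to be `a + n`. The function names are hypothetical.

```python
from itertools import combinations


def bipartite_pairs(m, n):
    """Allowable pairs in the bipartite case: every pair across the two parts.

    Assumed labelling: first part is 0..m-1, second part is m..m+n-1.
    """
    return {(a, b) for a in range(m) for b in range(m, m + n)}


def digraph_pairs(n):
    """Allowable pairs in the digraph case: as above with both parts of size n,
    but excluding the assumed matching pairs (a, a + n), which encode loops."""
    return {(a, b) for a in range(n) for b in range(n, 2 * n) if b != a + n}


def realisations(d, allowable):
    """List all graphs using only allowable pairs whose degree sequence is d.

    Exhaustive search over edge sets, so only suitable for tiny examples.
    """
    if sum(d) % 2 != 0:
        return []  # the degree sum of any graph is even
    num_edges = sum(d) // 2
    found = []
    for edges in combinations(sorted(allowable), num_edges):
        deg = [0] * len(d)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg == list(d):
            found.append(frozenset(edges))
    return found
```

With two vertices in each part and all degrees equal to 1, `realisations([1, 1, 1, 1], bipartite_pairs(2, 2))` returns the two perfect matchings, while `realisations([1, 1, 1, 1], digraph_pairs(2))` returns only the one that avoids the forbidden matching.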

Let , be integers and suppose that is a sequence of length . Recall the definitions of , , , , , , and of from the introduction. We also use or to denote , in line with the notation for maximum degree of a graph. With understood (to be either or in this paper) we write for the quantity and note that this agrees with the definition of given just above (2) in the introduction. Throughout this paper we use to denote the elementary unit vector with 1 in its coordinate indexed by . We say is balanced if . Clearly being balanced is necessary for to be -realisable in either of the cases or . Furthermore, we say that is -heavy if , and we call it -heavy if .

Finally, we use to denote a quantity between and inclusively.

2.2 Cardinalities, probabilities and ratios

We first quote a simple result by which we leverage absolute estimates of probabilities from comparisons of related probabilities.

Lemma 2.1 (Lemma 2.1 in [11]).

Let and be probability spaces with the same underlying set . Let be a graph with vertex set such that for all . Suppose that such that , and such that for every edge of ,

where the constant implicit in is absolute. Let be an upper bound on the diameter of and assume . Then for each we have

with again a bound uniform for all .

Using the lemma calls for analysing the ratios of probabilities both in the “true” probability space (which will be the degree sequence of or of ) and in an “ideal” probability space by which we are approximating the true space. This leads to computing ratios of closely related instances of the expression on the right hand side of (3). Let be a sequence where and are of length and , respectively, let , and assume that is -heavy. Note first that the following are immediate from (1.1):

Similarly, straight from the definition of in (2) we have

where , , , which are, in this case, equal to , , and , respectively (recalling that is slightly different in the two cases of and ), and where, we recall, is the indicator variable for the digraph case. Therefore, denoting by the function , we get a “combined goal ratio” in the two cases which is

(5)

where .

To analyse the ratios of such nearby sequences in the “true” probability space note that, with the above notation, in Theorem 1.1 is just where is the random graph space or . Let us introduce some more notation. Let , i.e. a subset of the allowable edges. We write and for the number of graphs that contain, or do not contain, the edge set , respectively. (When and similar notation is used, the set should be clear by context.) We abbreviate to if (i.e. contains the single edge ), and put . Additionally, for a vertex , we set , and, with understood, we use for the set of such that .

We pause for a notational comment. In this paper, a subscript

is always interpreted as an ordered pair

rather than an edge (and similarly for triples). This is irrelevant for since the two ordered pairs signify the same edge, but the distinction is important with other notation such as the following. For vertices , if is a sequence such that is -realisable, we define

(6)

and note that this is exactly . Estimating those “true” ratios will be tightly linked to estimating the following. For , let

which is the probability that the edges in are present in a graph that is drawn uniformly at random from . Of particular interest are the probability of a single edge and a path , for which we simplify the notation to

(7)

The following is [11, Lemma 2.2], used to switch between degree sequences of differing total degree.

Lemma 2.2.

Let and let be a sequence of length . Then

In Lemma 2.3 in [11] we bound the probability of an edge of a random graph in in the graph case. A similar switching argument is used to obtain corresponding bounds in the bipartite and digraph cases. Recall that by we denote , and that .

Lemma 2.3.

Let be or and let be an -realisable sequence. Then for any we have

Proof.

Assume without loss of generality that which forces in both the digraph and bipartite cases. For each bipartite graph with degree sequence and an edge joining and , we can perform a switching (of a type often used previously in graphical enumeration) by removing both and another randomly chosen edge (with , and ), and inserting the edges and , provided that no multiple edges are formed. Note that the way we choose and no loops can occur this way. In the digraph case, we should also make sure that and that , since the pairs and are not allowable. The number of such switchings that can be applied to with the vertices of each edge ordered, is at least

since there are ways to choose and , whereas the number of such choices that are ineligible is at most the number of choices with being a neighbour of (which automatically rules out ) or , or similarly for . On the other hand, for each graph in which is not an edge, the number of ways that it is created by performing such a switching backwards is at most . Counting the set of all possible switchings over all such graphs and two different ways shows that the ratio of the number of graphs with to the number without is at most

Hence , and the lemma follows in both cases. ∎
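The switching operation used in this proof can be sketched in code. This is an illustrative sketch under stated assumptions rather than the authors' exact construction: a bipartite graph is represented as a set of pairs `(a, b)` with `a` in the first part and `b` in the second, the names `switchings` and `degrees` are hypothetical, and the digraph restriction is modelled by an optional `forbidden` set of non-allowable pairs.

```python
def switchings(edges, edge, forbidden=frozenset()):
    """Yield every graph obtained from `edges` by switching away `edge`.

    A switching removes `edge` = (a, v) and a second edge (a2, b2) with
    a2 != a and b2 != v, then inserts (a, b2) and (a2, v). It is only
    valid if no multiple edge is created and no inserted pair is forbidden.
    """
    if edge not in edges:
        raise ValueError("edge must be present in the graph")
    a, v = edge
    for a2, b2 in edges:
        if a2 == a or b2 == v:
            continue  # second edge must avoid both endpoints of `edge`
        new1, new2 = (a, b2), (a2, v)
        if new1 in edges or new2 in edges:
            continue  # would create a multiple edge
        if new1 in forbidden or new2 in forbidden:
            continue  # pair is not allowable (digraph case)
        yield (edges - {edge, (a2, b2)}) | {new1, new2}


def degrees(edges):
    """Return a dict mapping each non-isolated vertex to its degree."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Every graph produced this way has the same degree sequence as the input and no longer contains the chosen edge, which is what allows the proof to compare the number of graphs containing the edge with the number not containing it.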

2.3 Proof structure

We recall the template of the method introduced in [11]. We follow this template in both the sparse and dense cases.

Step 1. Obtain an estimate of the ratio between the numbers of graphs of related degree sequences, using a forthcoming proposition. This step is the crux of the whole argument.

Step 2. By making suitable definitions, we cause this ratio to appear as the expression for some probability space on an underlying set in an application of Lemma 2.1. There, is the set of degree sequences, with probabilities in determined by the random graph under consideration, and the graph in the lemma has a suitable vertex set of such sequences. Each edge of is in general a pair of degree sequences and of the form occurring in the definition of . Having defined , we may call any two such degree sequences adjacent.

Step 3. Another probability space is defined on , by taking a probability space

directly from a joint binomial distribution, together with a function

that varies quite slowly, and defining probabilities in by the equation .

Step 4. Using sharp concentration results, show that in both of the probability spaces and (where, by , we mean approximately equal to, with some specific error bound in each case). As part of this, we show that . At this point, we may specify for the application of Lemma 2.1.

Step 5. Apply Lemma 2.1 and the conclusions of the previous steps to deduce . Upon estimating the errors in the approximations, which includes bounding the diameter of the graph , we obtain an estimate for the probability of the random graph having degree sequence in terms of a known quantity.

2.4 Realisability

As in the graph case in [11], before estimating how many (bipartite) graphs have degree sequence , for preparation we need to know that there is at least one such graph for various . Mirsky [18, p. 205] gives a necessary and sufficient condition for the existence of a non-negative integer matrix with row and column sums in specified intervals. For the case that those sums are specified precisely, the statement is the following.

Theorem 2.4 (Corollary of Mirsky [18]).

Let , , be integers for all , such that . Then there exists an integer matrix with row sums and column sums such that for all such and if and only if, for all and ,

We use this to show existence of bipartite graphs with given degrees and forbidden edges for the cases of interest, avoiding maximum generality in order to keep it simple. In order to apply this to digraphs, one would set and regard the edges as directed from the first part to the second. For loopless digraphs, we merely forbid all edges of the form . Recall that, with and understood, we set and for convenience.

Lemma 2.5.

Given a constant , the following holds for sufficiently large and sufficiently small. Let and be integers for all , , with . Also let be a set of unordered pairs, representing forbidden edges, with no more than pairs in containing any . Let and . Then there exists a bipartite graph with bipartition with degrees for and for , and containing no edge in the forbidden set , provided that either of the following holds.

  (a) We have , as well as and where and .

  (b) We have and .

Proof.

We will apply Theorem 2.4 with if is a forbidden edge, and otherwise, and with and . Note that for all subsets , , where and . We will show that for all and , with and , we have

(8)

Equivalently, . Note that with the previous observation and Theorem 2.4, this implies that there is a matrix which is the adjacency matrix of the desired bipartite graph.

For (a), suppose first that . Then using and , we find that the right hand side of (8) is at most , and (8) follows. A symmetric argument works if . So we may assume that neither of these occur. Then