Fast uniform generation of random graphs with given degree sequences

05/09/2019 ∙ by Andrii Arman, et al. ∙ University of Waterloo ∙ Monash University

In this paper we provide an algorithm that generates a graph with given degree sequence uniformly at random. Provided that Δ^4=O(m), where Δ is the maximal degree and m is the number of edges, the algorithm runs in expected time O(m). Our algorithm significantly improves the previously most efficient uniform sampler, which runs in expected time O(m^2Δ^2) for the same family of degree sequences. Our method uses a novel ingredient which progressively relaxes restrictions on an object being generated uniformly at random, and we use this to give fast algorithms for uniform sampling of graphs with other degree sequences as well. Using the same method, we also obtain algorithms with expected run time which is (i) linear for power-law degree sequences in cases where the previous best was O(n^4.081), and (ii) O(nd+d^4) for d-regular graphs when d=o(√n), where the previous best was O(nd^3).


1 Introduction

Sampling discrete objects from a specified probability distribution is a classical problem in computer science, both in theory and for practical applications. Uniform generation of random graphs with a specified degree sequence is one such problem that has frequently been studied. In this paper we consider only the task of generating simple graphs, i.e. graphs with no loops or multiple edges. An early algorithm was given by Tinhofer [tinhofer79], but with unknown run time. A simple rejection-based uniform generation algorithm is usually implicit in methods for asymptotically enumerating graphs with a specified degree sequence, for example in the papers of Békéssy [bekessy1972], Bender and Canfield [bender1978] and Bollobás [bollobas1980]. The run time of this algorithm is linear in n but exponential in the square of the average degree. Hence it only works in practice when degrees are small.

A big increase in the permitted degrees of the vertices was achieved by McKay and Wormald [mckay90], and around the same time Jerrum and Sinclair [jerrum90] found an approximately uniform sampler using Markov Chain Monte Carlo (MCMC) methods. McKay and Wormald used the configuration model introduced in [bollobas1980] to generate a random (but not uniformly random) multigraph with a given degree sequence. Instead of repeatedly rejecting until finding a simple graph, McKay and Wormald used a switching operation to switch away multiple edges, reaching a simple graph in the end. The algorithm is rather efficient when the degrees are not too large. In particular, for d-regular graphs it runs in expected time O(nd^3) when d=O(n^(1/3)). (Here and in the following we assume n is the number of vertices.) Jerrum and Sinclair’s Markov chain mixes in time polynomial in n provided that the degree sequence satisfies a condition phrased in terms of the numbers of graphs of given degree sequences. In particular, the mixing time is polynomial in the d-regular case for any function d=d(n).

These two benchmark research papers led the study into two different research lines. More switching-based algorithms for exactly uniform generation were given which deal with new degree sequences permitting vertices of higher degrees. The regular case was treated by Gao and Wormald [gao17] for d=o(√n) with time complexity again O(nd^3), and very non-regular but still quite sparse degree sequences (such as power law) were considered by the same authors [gao18]. Various MCMC-based algorithms have been investigated for generating graphs with a distribution that is only approximately uniform, e.g. algorithms by Cooper, Dyer and Greenhill [cooper07], Greenhill [greenhill14], and Kannan, Tetali and Vempala [kannan99]. These algorithms can cope with a much bigger family of degree sequences than the switching-based algorithms. That these do not produce the exactly uniform distribution might be irrelevant for practical purposes, if it were not for the fact that the theoretically provable mixing bounds are too big. For instance, see the bound on the mixing time given in [cooper07] for the regular case. We note that there have also been switching-based approximate samplers that run fast (in linear or sub-quadratic time); for instance, see the papers of Bayati, Kim and Saberi [bayati10], Kim and Vu [kim03], Steger and Wormald [steger99] and Zhao [zhao13]. For those algorithms, the bounds on error in the output distribution are functions of n which tend to 0 as n grows, but cannot be reduced for any particular n by running the algorithm longer. In this way they differ from the MCMC-based algorithms, which are fully-polynomial almost uniform generators in the sense of [jerrum90].

The goal of this paper is to introduce a new technique for exactly uniform generation. Using it to modify switching-based algorithms, we can obtain vastly reduced run times. In particular, we obtain a linear-time, i.e. O(m), algorithm that works for the same family of degree sequences as the algorithm in [mckay90]. We first review the salient features of the latter.

The algorithm first generates an initial random multigraph in time that is linear in m. The initial pairing contains no loops of multiplicity at least two, no multiple edges of multiplicity at least three, and has a sublinear number of loops and double edges. The algorithm then uses an operation called d-switching to sequentially “switch away” all the double edges (loops are treated similarly so we ignore them at present). Provided that a multigraph was uniform in the class of graphs with double edges, the result of applying a random d-switching to is a random multigraph that is slightly non-uniformly distributed in a class of multigraphs with double edges. The following rejection scheme is used to equalise probabilities. Let be the number of ways that a d-switching can be performed on and be the number of d-switchings that can create . Assume that and are uniform upper and lower bounds for and respectively over all multigraphs with double edges. If a switching that converts some multigraph to a multigraph is selected by the algorithm, then the switching is accepted with probability , and rejected otherwise. If the switching is accepted, it is applied to the multigraph, whereas rejection requires re-starting the algorithm from scratch. Computing takes time, which dominates the time complexity of [mckay90].

The algorithm presented in this paper is obtained from the algorithm in [mckay90] by modifying the time-consuming rejection scheme. First, it was observed in [mckay90] that the rejection can be separated into two distinct steps, which are given the explicit names f- and b-rejection in [gao17]. The f-rejection step rejects the selected switching with probability , and the b-rejection step rejects it with probability . It is easy to see that the overall probability of accepting the switching is the same as specified originally above. By a slick observation, there is essentially no computation cost for computing the probability of f-rejection. (See the explanations in Section 4.4). The modification in the present paper is to further separate b-rejections into a sequence of sub-rejections by a scheme we will call incremental relaxation. This scheme will still maintain uniformity of the multigraphs created.

The basic idea of incremental relaxation, as used in the present paper, can be described as follows. Let be a (small) graph with each edge designated as positive or negative. We say that an -anchoring of a graph is an injection that maps every positive edge of to an edge of , and every negative edge to a non-edge of . (This is a generalisation of rooting at a subgraph, which usually corresponds to the case that has positive edges only.)

Now assume that an -anchored graph is chosen uniformly at random (u.a.r.), i.e. each such ordered pair with in some given set , and , an -anchoring of , is equally likely. We can convert this to a random graph by finding the number of -anchorings of , and accepting with probability where is a lower bound on the number of -anchorings of any element . However, computing corresponds to computing as described above and can be time-consuming. The key idea of our new method is that we incrementally relax the constraints imposed on by , so that rejection is split into a sequence of sub-rejections. Set and let denote the restriction of to . With this definition, for each , is an -anchoring of . Thus determines some subset (increasing with ) of the constraints on corresponding to the edges of , and given that is uniformly random, we can obtain a uniformly random anchoring by applying a similar rejection strategy, but using only the number of ways that can be extended to an -anchoring of . This procedure of incremental relaxation of constraints can be highly advantageous if for each , can be computed much faster than . In this way, a sequence of uniformly random objects is obtained, involving anchorings at ever-smaller subgraphs of , until the empty subgraph is reached, corresponding to obtaining u.a.r.

To see that this idea applies to the problem at hand, we observe that the existence of a d-switching (defined in Section 4.2) from to forces to include a set of edges (the positive edges, forming two paths of length 2, in a copy of a certain graph ), and to exclude a set (the negative edges, forming a matching, in ). So comes accompanied by an -anchoring. (Refer to the right side of Figure 2 for a drawing of .) To apply incremental relaxation we first compute the number of ways to complete such an anchoring given the first 2-path and use that to obtain a random 2-path-anchored graph, and then relax the 2-path anchoring in a similar manner. The details of applying this scheme to d-switchings are given in Section 4.2.

In Section 3 we present the incremental relaxation technique in a more general setting, avoiding injections but instead employing more arbitrary sets of constraints. We apply the incremental relaxation scheme in detail in the case Δ^4=O(m) (e.g. d=O(n^(1/3)) in the regular degree case) in Sections 4 – 4.4. The switchings we use are exactly the same as those in [mckay90]. When the incremental relaxation scheme is combined with the new techniques introduced in [gao17, gao18], it allows us to obtain fast uniform samplers of graphs for the family of degree sequences permitted in [gao17, gao18]. In particular, we obtain a linear-time algorithm to generate graphs with power-law degrees, and a sub-quadratic-time algorithm to generate d-regular graphs when d=o(√n). We will discuss these algorithms in Sections 5 and 6.

2 Main results

Let be specified where is even. Let . Our first result is that our algorithm INC-GEN uniformly generates a random graph with degree sequence and runs in linear time provided that is “moderately sparse”. The description of INC-GEN is given in Section 4. The proof of the uniformity will be presented in Section 4.3, and the time complexity is bounded in Section 4.4.

Theorem 1.

Algorithm INC-GEN uniformly generates a random graph with degree sequence . If Δ^4=O(m) then the expected run time of INC-GEN is O(m).

Our second algorithm, INC-REG, described in Section 5, is an almost-linear-time algorithm to generate random regular graphs. The run time is O(nd+d^4) when d=o(√n). This improves the run time of the uniform sampler in [gao17].

Theorem 2.

Algorithm INC-REG uniformly generates a random d-regular graph. If d=o(√n) then the expected run time of INC-REG is O(nd+d^4).

Our third algorithm, INC-POWERLAW, described in Section 6, is a linear-time algorithm to generate random graphs with a power-law degree sequence. A degree sequence is said to be power-law distribution-bounded with parameter , if the minimum component in is at least 1, and there is a constant independent of such that the number of components that are at least is at most for all . Note that the family of power-law distribution-bounded degree sequences covers the family of degree sequences arising from i.i.d. copies of a power-law random variable. Uniform generation of graphs with power-law distribution-bounded degree sequences with parameter was studied in [gao18], where a uniform sampler was described with expected run time O(n^4.081). This was the first known uniform sampler for this family of degree sequences. With our new rejection scheme, we improve the time complexity to linear.

Theorem 3.

Let be a power-law distribution-bounded degree sequence with parameter . Algorithm INC-POWERLAW uniformly generates a random graph with degree sequence , and the expected run time of INC-POWERLAW is O(n).

Algorithms INC-GEN and INC-REG can easily be modified if represents a bipartite graph’s degree sequence. As an example, we present algorithm INC-BIPARTITE in Section 7 as the bipartite version of INC-GEN.

Theorem 4.

Algorithm INC-BIPARTITE uniformly generates a random graph with bipartite degree sequence . If then the expected run time of INC-BIPARTITE is .

3 Uniform generation by incremental relaxation

We provide here a general description of the relaxation procedure, so it can be applied in different setups. Let and be given, where is a finite set and is a positive integer. We are also given , for , where each is a multiset consisting of subsets of . Let denote the Cartesian product, and let be any subset of such that each satisfies . Given , define for each . For each set and set .

For any and , define ; i.e.  is the prefix of .

Later in our applications of relaxation, we will let be a set of multigraphs. Each element of can be identified with a multigraph that contains a specified substructure (determined by the -s) on a specified set of vertices. In terms of the notation introduced in Section 1, elements of will correspond to -anchorings of multigraphs for some graph and some sequence . Permitting multiple copies of elements in is useful in the case where two distinct constraints may correspond to the same subset of . This happens in our applications due to the symmetry of the substructures in .

Next we define a procedure Loosen, which takes an as input, and outputs an with a certain probability and otherwise ‘rejects’ it and terminates. Our Relaxation Lemma (Lemma 5 below) shows that if is uniformly distributed in then the output of Loosen is uniformly distributed in .

For and , let be the number of such that . In other words, is the number of ways to extend to an element of . Let be a lower bound on over all , and assume that for all , . For with we define the following procedure.

procedure Loosen:
Output with probability , and reject otherwise.

Procedure Relax is defined for . It repeatedly calls Loosen until reaching a . We say that procedure Relax performs incremental relaxation on .

procedure Relax:
;
while  do
       ;
       .
end while
Output .
Lemma 5 (Relaxation Lemma).

Assume that and . Provided that is chosen uniformly at random, the output of Loosen is uniform in assuming no rejection.

Proof.  Let . For any , the probability that Loosen outputs is equal to

where denotes the event that the input of Loosen is . The second probability above is the conditional probability that no rejection occurs in Loosen, given . By our assumption, the first probability above is always equal to . By the definition of Loosen, the second probability above is equal to . By definition, is exactly the number of , such that , so the sum has exactly terms, each of which is equal to . Hence, the probability for Loosen to output is equal to , for every .    

Recalling that , the Relaxation Lemma immediately yields the following corollary for the uniformity of Procedure Relax.

Corollary 6.

Assume that for all , , and assume is chosen uniformly at random. Then the output of Relax is uniform in , if there is no rejection.

The description of Relax as repeated calls of Loosen is useful for analysing the algorithm, but for practical implementations we refer to the following corollary.

Corollary 7.

Procedure Relax, applied to , outputs with probability
, and ends in rejection otherwise.

In practice, we predefine the numbers . Once the numbers are computed, the b-rejection can be performed in one step using Corollary 7, and there is no need to perform Relax with its iterated calls to Loosen. As mentioned in Section 1, these numbers can be much faster to compute than the number of -anchorings of , which would be required using the scheme in [mckay90]. We also reiterate that, unlike the scheme in [mckay90], the rejection probability depends on the anchoring imposed by , as well as .
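The relaxation scheme of this section can be illustrated on a toy example. The Python sketch below is not the authors' implementation: it assumes a hypothetical finite family of k-tuples whose length-i prefixes play the role of the partially relaxed objects, and the function names `prefix_counts`, `relax_acceptance` and `output_distribution` are invented for illustration. The lower bounds are taken to be exact minima, although any valid lower bound would serve. Using exact rational arithmetic, it multiplies the per-step acceptance probabilities of Loosen into a single acceptance probability, in the spirit of Corollary 7, and then computes the output distribution for a uniformly random input.

```python
from collections import Counter
from fractions import Fraction


def prefix_counts(family, i):
    """Map each length-i prefix to the number of length-(i+1) prefixes extending it."""
    longer = {c[:i + 1] for c in family}
    return Counter(p[:i] for p in longer)


def relax_acceptance(c, family):
    """Probability that no rejection occurs while relaxing c down to its length-1 prefix."""
    prob = Fraction(1)
    for i in range(len(c) - 1, 0, -1):
        counts = prefix_counts(family, i)
        lower = min(counts.values())            # lower bound on the extension count
        prob *= Fraction(lower, counts[c[:i]])  # one Loosen step: accept with lower / count
    return prob


def output_distribution(family):
    """Exact probability of each length-1 output when the input is uniform on the family."""
    out = Counter()
    for c in family:
        out[c[:1]] += Fraction(1, len(family)) * relax_acceptance(c, family)
    return dict(out)
```

For the family {(a,x,p), (a,x,q), (a,y,p), (b,x,p)}, the two possible outputs (a) and (b) each arise with probability 1/4, even though (a) has three extensions and (b) only one; the remaining probability mass is rejection.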

4 Algorithm Inc-Gen

In this section we provide a description of INC-GEN. Let be given. We will use the configuration model [bollobas1980] to generate a random pairing, defined as follows. For every , represent vertex as a bin containing exactly points. Take a uniformly random perfect matching over the set of points in the bins. Call the resulting matching a pairing and call each edge in a pair. Finally identify the bins as vertices, and represent each pair in as an edge. This produces a multigraph from , denoted by . If a set of pairs in form a multiple edge or loop in then this set of pairs is called a multiple edge in as well, with the same multiplicity as it has in . A loop is a pair with both ends contained in the same bin/vertex. If there is a set containing more than one pair with all ends contained in the same vertex, then this set of pairs form a multiple loop. We always use loop to refer to a single loop with multiplicity equal to one. We call a multiple edge with multiplicity 2 or 3 a double or triple edge respectively. Let denote the set of all pairings with degree sequence . Define

(1)

if and define otherwise. Let denote the set of pairings in where there are no multiple edges with multiplicity at least 3, and no multiple loops with multiplicity at least 2, and the number of loops and double edges are at most and respectively. The following result was proved in [mckay90].

Lemma 8.

Let be a uniformly random pairing in . There exists a constant such that for all sufficiently large .

Proof. We first note that if , then since is large enough and , we have . So we only need to consider the case when and are defined by (1).

If then the claim follows by [mckay90, Lemmas 2 and ]. If then contains triple edges in expectation, whereas the expected number of other types of multiple edges in the pairing is bounded by . In the case that the expected number of triple edges is asymptotically a positive constant, the standard method of moments can be used to show that the numbers of triple edges, double edges and loops are asymptotically independent Poisson variables. This implies our assertion. See also the discussion of this case in the proof of [mckay90, Theorem 3].

The first step of our algorithm is to use the configuration model to generate a uniformly random pairing . Proceed if . Otherwise, reject and restart the algorithm. This type of rejection is called initial rejection. By Lemma 8, this initial rejection stage takes only O(1) rounds in expectation before successfully producing a multigraph with at most double edges and at most loops. Then the algorithm calls two procedures, NoLoops and NoDoubles. Each of these is composed of a sequence of switching steps. In each switching step, a loop (in NoLoops) or a double edge (in NoDoubles) will be removed using the corresponding switching operation in the procedure.

Algorithm INC-GEN:
Generate a uniformly random pairing .
Reject if (initial rejection) and otherwise set ;
NoLoops();
NoDoubles().

Various types of rejections may occur in procedures NoLoops and NoDoubles. In all cases, if a rejection occurs then the algorithm restarts from the first step.
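The first two steps of INC-GEN (pairing generation and initial rejection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the thresholds `max_loops` and `max_doubles` are placeholder parameters standing in for the bounds defined in (1), which are not reproduced here.

```python
import random
from collections import Counter


def random_pairing(degrees, rng):
    """Uniform perfect matching on the points; each pair is recorded by the bins of its ends."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    # Vertex v is a bin holding degrees[v] points.
    points = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(points)  # pairing off a uniform shuffle gives a uniform perfect matching
    return [(points[i], points[i + 1]) for i in range(0, len(points), 2)]


def passes_initial_check(pairs, max_loops, max_doubles):
    """No multiple loops, no edges of multiplicity >= 3, few loops and double edges."""
    loops = Counter(u for u, v in pairs if u == v)
    edges = Counter((min(u, v), max(u, v)) for u, v in pairs if u != v)
    if any(c >= 2 for c in loops.values()) or any(c >= 3 for c in edges.values()):
        return False
    doubles = sum(1 for c in edges.values() if c == 2)
    return len(loops) <= max_loops and doubles <= max_doubles


def initial_pairing(degrees, max_loops, max_doubles, rng):
    """Repeat until a pairing survives initial rejection (thresholds are assumptions)."""
    while True:
        pairs = random_pairing(degrees, rng)
        if passes_initial_check(pairs, max_loops, max_doubles):
            return pairs
```

A call such as `initial_pairing([2, 2, 2, 2], 4, 2, random.Random(1))` returns a list of four pairs in which every vertex appears exactly twice as an endpoint.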

Let and be the set of multigraphs with degree sequence , loops, double edges and no other types of multiple edges. The following lemma guarantees uniformity of the multigraph obtained after initial rejection.

Lemma 9.

Let be a uniformly random pairing in . Let where and . Conditional on the number of loops and double edges in being and , is uniformly distributed over .

Proof. This follows from the simple observation that every pairing in appears with the same probability, and every multigraph in corresponds to exactly distinct pairings.   

Note that if , then , and so INC-GEN never calls NoLoops or NoDoubles. By Lemma 9, the output of INC-GEN is uniformly distributed in . Also, by Lemma 8, INC-GEN restarts a constant number of times in expectation before outputting a graph. Hence, in this case we have proved Theorem 1. For the rest of this section we assume .

In the next subsection we define the procedure NoLoops. This procedure uses the same switchings as in [mckay90] (but applied to multigraphs rather than pairings) to reduce the number of loops to 0.

4.1 NoLoops

Definition 10 (ℓ-switching).

For a graph , choose five distinct vertices such that

  • there is a loop on .

  • and are single edges;

  • there are no edges between and , and , and .

An ℓ-switching replaces the loop on and edges , , by edges , and .

See Figure 1 for an illustration of an ℓ-switching. Note that this switching is the same as the one used in [mckay90], except that it is performed on graphs, not pairings.

Figure 1: ℓ-switching.
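The loop-removing switching of Definition 10 can be sketched in Python as below. The vertex labels did not survive in the text above, so the labelling used here is an assumption chosen to match the stated pattern (a loop, two single edges, three non-adjacent pairs, three new edges): the loop sits on `v1`, the single edges are `v2v3` and `v4v5`, and the new edges are `v1v2`, `v1v5` and `v3v4`. The multigraph representation (a `Counter` from sorted vertex pairs to multiplicities) and the function names are likewise hypothetical.

```python
from collections import Counter


def edge(u, v):
    """Canonical key for an undirected edge; (v, v) denotes a loop on v."""
    return (u, v) if u <= v else (v, u)


def degree(graph, v):
    """Degree of v in the multigraph; a loop contributes 2."""
    return sum(mult * ((a == v) + (b == v)) for (a, b), mult in graph.items())


def l_switching(graph, v1, v2, v3, v4, v5):
    """Remove the loop on v1 and edges v2v3, v4v5; add edges v1v2, v1v5, v3v4."""
    if len({v1, v2, v3, v4, v5}) != 5:
        raise ValueError("the five vertices must be distinct")
    removed = [edge(v1, v1), edge(v2, v3), edge(v4, v5)]
    added = [edge(v1, v2), edge(v1, v5), edge(v3, v4)]
    # The removed pairs must be a single loop / single edges, the added pairs non-edges.
    if any(graph[e] != 1 for e in removed) or any(graph[e] != 0 for e in added):
        raise ValueError("switching is not valid on this multigraph")
    result = Counter(graph)
    for e in removed:
        del result[e]
    for e in added:
        result[e] = 1
    return result
```

Under this labelling every vertex keeps its degree, and the number of loops drops by one while no new multiple edge is created.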

Let be the number of -switchings that can be performed on . We will specify a parameter such that

In each switching step, a uniformly random switching converting to some is selected. An f-rejection occurs with probability . We will next describe how to use incremental relaxation to do b-rejection. If is neither f-rejected nor b-rejected, then will be performed in this switching step.

We first give some notation. In a multigraph, a (simple) ordered edge is an ordered pair of vertices such that is a (simple) edge in the multigraph. Similarly, a (simple) ordered -path is an ordered set of vertices such that forms a (simple) -path in the multigraph.

Define to be the number of simple ordered 2-paths in such that there is no loop on . For a simple ordered 2-path in define to be the number of simple ordered edges in that are vertex disjoint from and such that and are non-edges. For let and be lower bounds on and respectively over all and all simple ordered 2-paths in . Positive constants and will be defined in Section 4.1.1. Any switching that can be used to create a fixed multigraph from multigraphs in can be identified with the ordered set of vertices whose adjacencies were changed by . Set and .

Informally, each iteration of NoLoops starts with a multigraph and chooses a random -switching that converts to some . In terms of the notation defined in Section 1, each such switching can be viewed as an -anchoring of , where is a graph on the right side of Figure 1 (with positive signs on solid edges, and negative signs on dashed edges). NoLoops then performs f-rejection, after which every pair (denoting an -anchoring of ), where and is an -switching that creates , arises with the same probability. After that NoLoops sequentially relaxes constraints enforced by -anchoring of by performing a b-rejection. The following is the formal description of NoLoops.

procedure NoLoops:
while  has a loop do
       let be such that ;
       obtain from by performing a random -switching on ;
       f-rejection: restart with probability ;
       ;
       b-rejection: restart with probability ;
       ;
      
end while

In Section 4.3 we show that if is distributed uniformly at random in , the output of NoLoops(G) is uniform in . We do this by showing that the quantities and defined above coincide with the quantities and in an application of Corollary 7.

4.1.1 Parameters in NoLoops

Define

Recall that we assumed and so and are positive constants. The following lemma establishes the necessary bounds on , and .

Lemma 11.

Let with and . For any simple ordered 2-path in , we have

For forward -switchings

This completes the description of NoLoops.

4.2 NoDoubles

After NoLoops is finished, we have a multigraph . Next we describe how to reduce the number of double edges in .

Definition 12 (d-switching).

For a graph , choose six distinct vertices such that

  • there is a double edge between and .

  • , , are single edges;

  • the following are non-edges: , , , .

A d-switching replaces the double edge between and edges , , by edges , , , .

See Figure 2 for an illustration.

Figure 2: d-switching.
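A corresponding sketch for the d-switching of Definition 12 is given below. As before, the labelling is an assumption consistent with the description in Section 1 (the new edges form two 2-paths, and the removed edges form a matching): the double edge joins `v1` and `v2`, the single edges are `v3v4` and `v5v6`, and the new edges are `v1v3`, `v1v5`, `v2v4` and `v2v6`. The `Counter` representation and the function names are hypothetical.

```python
from collections import Counter


def key(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def d_switching(graph, v1, v2, v3, v4, v5, v6):
    """Remove the double edge v1v2 and edges v3v4, v5v6; add v1v3, v1v5, v2v4, v2v6."""
    if len({v1, v2, v3, v4, v5, v6}) != 6:
        raise ValueError("the six vertices must be distinct")
    singles = [key(v3, v4), key(v5, v6)]
    new_edges = [key(v1, v3), key(v1, v5), key(v2, v4), key(v2, v6)]
    if graph[key(v1, v2)] != 2:
        raise ValueError("v1v2 must be a double edge")
    if any(graph[e] != 1 for e in singles) or any(graph[e] != 0 for e in new_edges):
        raise ValueError("switching is not valid on this multigraph")
    result = Counter(graph)
    del result[key(v1, v2)]  # both copies of the double edge disappear
    for e in singles:
        del result[e]
    for e in new_edges:
        result[e] = 1
    return result
```

Four edges (counted with multiplicity) are removed and four are added, so the degree sequence is unchanged and the number of double edges decreases by one.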

For a graph , we use notation for the number of ways to perform a d-switching on . We will specify such that

In each switching step, a uniformly random switching converting to some is selected. An f-rejection occurs with probability .

The incremental relaxation scheme for b-rejection is analogous to that in NoLoops. Define to be the number of simple ordered 2-paths in . For a simple ordered 2-path in define to be the number of simple ordered 2-paths that are vertex disjoint from such that , and are non-edges.

For let and be positive lower bounds (to be specified in Section 4.2.1) on and over all and simple ordered 2-paths in . For a d-switching let be the vertices whose adjacencies were changed by . Set and .

procedure NoDoubles:
while  has a double edge do
       let be such that ;
       obtain from by performing a random -switching on ;
       f-rejection: restart with probability ;
       ;
       b-rejection: restart with probability ;
       ;
      
end while

As in the case of NoLoops, in Section 4.3 we show that the desired uniformity property holds for NoDoubles.

4.2.1 Parameters for NoDoubles

Define

Note that and are positive constants, as in Section 4.1.1.

Lemma 13.

Let . Then for any simple ordered 2-path in we have

4.3 Uniformity

Theorem 14.

INC-GEN generates graphs with degree sequence uniformly at random.

Proof. We start the proof by showing that b-rejection in both NoLoops and NoDoubles can be performed as Relax for an appropriate choice of . We deal here with NoDoubles only, as the issues with NoLoops are identical.

Let be the set of d-switchings that convert a multigraph in to some multigraph in . Recall that a switching can be identified with an ordered set of vertices whose adjacencies were changed by , and , .

Let and let be distinct vertices. Using the notation to denote a multiset, and to denote the set of simple edges in , define

Recall that

We now show that

Indeed, for a given simple ordered 2-path in , the number of simple ordered 2-paths such that , and are non-edges is equal to and is at least one according to Lemma 13. So for every pair with there exists a simple ordered 2-path , such that , which establishes the desired claim for .

If is a switching from to , we have that and so . So every pair , where switching creates , can be identified with an element , hence we can apply Relax to . In this setup, the quantities and (as in Section 3) are equal to and respectively. (Recall the definitions for and in Section 4.2.) It remains to note that we can set for where .

According to Corollary 7, Relax outputs with probability

which is exactly equal to the probability that is not b-rejected in NoDoubles.

Hence b-rejection in NoDoubles is just an effective implementation of Relax