Uniform generation of spanning regular subgraphs of a dense graph

07/03/2018 ∙ by Pu Gao, et al. ∙ UNSW 0

Let H_n be a graph on n vertices and let H_n denote the complement of H_n. Suppose that Δ = Δ(n) is the maximum degree of H_n. We analyse three algorithms for sampling d-regular subgraphs (d-factors) of H_n. This is equivalent to uniformly sampling d-regular graphs which avoid a set E(H_n) of forbidden edges. Here d=d(n) is a positive integer which may depend on n. Two of these algorithms produce a uniformly random d-factor of H_n in expected runtime which is linear in n and low-degree polynomial in d and Δ. The first algorithm applies when (d+Δ)dΔ = o(n). This improves on an earlier algorithm by the first author, which required constant d and at most a linear number of edges in H_n. The second algorithm applies when H_n is regular and d^2+Δ^2 = o(n), adapting an approach developed by the first author together with Wormald. The third algorithm is a simplification of the second, and produces an approximately uniform d-factor of H_n in time O(dn). Here the output distribution differs from uniform by o(1) in total variation distance, provided that d^2+Δ^2 = o(n).

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Enumeration and uniform generation of graphs of given degrees has been an active research area for four decades, with many applications. Research on graph enumeration started in 1978 when Bender and Canfield [2] obtained the asymptotic number of graphs with bounded degrees. The constraint of bounded degrees was slightly relaxed by Bollobás [3] to the order of . A further relaxation on the degree constraints was obtained by McKay [14] to , and by McKay and Wormald [16] to , for the -regular graph case. On the other hand, asymptotic enumeration for dense -regular graphs was performed in 1990 by McKay and Wormald [17] for , leaving a gap in which no result was known: namely, when lies between and . This gap was filled recently by Liebenau and Wormald [13].

Uniform generation of graphs with given degrees has an equally long history, and is closely related to asymptotic enumeration. Tinhofer [19]

is among the first who investigated algorithmic heuristics, and realised that efficient uniform generation, or approximately uniform generation, is difficult. Graph enumeration arguments can sometimes be adapted to provide algorithms for uniform generation of graphs with given degrees. The proofs in 

[2, 3] immediately yield a simple rejection algorithm, which uniformly generates graphs of degrees at most in expected polynomial time (polynomial as a function of ). The switching arguments in [14, 16] were adapted to give a polynomial-time (in expectation) algorithm [15] for uniformly generating random -regular graphs when . However, overcoming the barrier was extremely challenging, with no progress for more than two decades. A major breakthrough was obtained by Wormald and the first author [8, 9], by modifying the McKay-Wormald algorithm [15] to a more flexible form, which allows the use and classification of different types and classes of switchings. This new technique greatly extended the range of degree sequences for which uniform generation is possible. The first application [8, 9] of the new technique gave an algorithm for uniform generation of -regular graphs with expected runtime when . The second application [10] gave an algorithm for uniform generation of graphs with power-law degree sequences with exponent slightly below 3, with expected runtime

with high probability.

Although uniform generation of random regular graphs has been challenging and remains open for of order at least , various approximate samplers have been developed and proven to run in polynomial time. Jerrum and Sinclair gave a MCMC-based scheme [11]

which generates all regular graphs approximately uniformly in polynomial time, although no explicit bound on the runtime was given. Kannan, Tetali and Vempala used another Markov chain to sample random regular bipartite graphs. They proved that the mixing time is polynomial without giving an explicit bound. Later this chain was extended by Cooper, Dyer and Greenhill 

[4, 5] to generate random regular graphs, with mixing time bounded by . There are asymptotically approximate samplers [1, 8, 9, 12, 18, 20] which generate -regular graphs very fast, typically with linear or up to quadratic runtime, and with an output of total variation distance from the uniform.

Compared with generating random -regular graphs with vertices, it is naturally more challenging to generate random -factors of a given graph with vertices. We call the host graph, and we call , the complement of , the forbidden graph. Let denote the maximum degree of .

There has not been much research in the direction of uniform generation of -factors. The first author gave a rejection algorithm in [7] which has an expected linear runtime when and contains at most a linear number of edges. Here is not necessarily regular, and the maximum degree of can be linear. We are not aware of any other algorithms which have explicit polynomial bounds on the runtime and vanishing bounds on the approximation error. For approximate sampling, the Jerrum–Sinclair algorithm [11, Section 4] generates an approximately uniform -factor of in polynomial time, as long as . It seems unlikely that the approach of Cooper et al [4] can be adapted to the setting of sampling -factors, due to the complexity of the analysis. Erdős et al. [6] analysed a Markov chain algorithm which uniformly generates bipartite graphs with a given half-regular degree sequence, avoiding a set of edges which is the union of a 1-factor and a star. Here “half-regular” means that the degrees on one side of the bipartition are all the same, with the possible exception of the centre of the star.

The aim of this paper is to develop efficient algorithms that sample -factors of uniformly or approximately uniformly. We will describe and analyse three different algorithms, which we call FactorEasy, FactorUniform and FactorApprox. Our main focus is FactorUniform, which is an algorithm for uniformly generating -factors of an -regular host graph , when and are not too large. For smaller and , the simpler algorithm FactorEasy is more efficient, and does not require to be regular (here is the maximum degree of ). Finally, FactorApprox is a linear-time algorithm for generating -factors of asymptotically approximately uniformly, under the same conditions as FactorUniform.

Our results are stated formally below. All asymptotics are as the number of vertices tends to infinity, along even integers if

is odd. Throughout,

and are positive integers which may depend on .

Theorem 1.1.

Let be a graph on vertices such that the maximum degree of is . The algorithm FactorEasy uniformly generates a -factor of . If then FactorEasy runs in time in expectation.

Theorem 1.2.

Let be an -regular graph on vertices. The algorithm FactorUniform uniformly generates a -factor of . If then FactorUniform runs in time a.a.s., and in time in expectation, where

Theorem 1.3.

Let be an -regular graph on vertices. Assume that . The algorithm FactorApprox approximately generates a uniformly random -factor of in time in expectation. The distribution of the output of FactorApprox differs from uniform by in total variation distance.

Remark: For simplicity we considered regular spanning subgraphs in this paper, and the host graph is assumed regular for FactorUniform and FactorApprox. However, all of these algorithms are flexible and can be modified to cope with more general degree sequences for the spanning subgraph and for the host graph. For instance, the McKay-Wormald algorithm [15] can uniformly generate a subgraph of with a given degree sequence where the maximum degree is not too large. We believe that FactorEasy can be easily modified for general degree sequences, by calling [15], and then slighly modifying the analysis, with essentially the same switching. To cope with denser irregular subgraphs or sparser irregular host graphs, FactorUniform and FactorApprox can be modified accordingly, by possibly introducing new types of switchings. We do not pursue this here.

In Section 2, we describe the common framework of FactorEasy and FactorUniform, and define some key parameters that will appear in these two algorithms and in FactorApprox. More detail on the structure of the paper can be found at the end of Section 3.1.

2 The framework for uniform generation

The new approach of Gao and Wormald [8, 9] gives a common framework for Algorithms FactorEasy and FactorUniform, which we will now describe. The Gao–Wormald scheme reduces to the McKay–Wormald algorithm [15] by setting certain parameters to some trivial values. Our algorithm FactorEasy is indeed an adaptation of the simpler McKay-Wormald algorithm, whereas FactorUniform uses the full power of [8, 9], which allows it to cope with a larger range of and .

It is convenient to think of the host graph as being defined by a 2-colouring of the complete graph with the colours red and black, where edges of are coloured red, while edges of are coloured black. Then our aim is to uniformly sample -factors of which contain no red (forbidden) edges.

Both algorithms FactorEasy and FactorUniform begin by generating a uniformly random -regular graph on . For the range of which we consider, this can be done using the Gao–Wormald algorithm REG [8, 9]. Typically this initial graph will contain some red edges. Let denote the set of all -regular graphs containing precisely red edges. The set is called a strata. For some positive integer parameter , which we must define for each algorithm, let

If the initial graph does not belong to then the algorithm will reject and restart. Otherwise, the initial graph belongs to and so it does not contain too many red edges. Then the algorithm will perform a sequence of switching operations (which we must define), starting from , until it reaches a -regular graph with no red edges. At each switching step there is a chance of a rejection: if a rejection occurs then the algorithm will restart. This rejection scheme must also be defined, for each algorithm.

In FactorEasy, only one type of switching (Type I) is used. Each such switching reduces the number of red edges by exactly one. As soon as FactorEasy reaches a -regular graph with no red edges, it outputs that graph, provided that no rejection has occurred. When is of order or greater, the probability of a rejection occurring in FactorEasy is very close to 1 and FactorEasy becomes inefficient.

FactorUniform reduces the probability of rejection by permitting some switchings that are invalid in FactorEasy, as well as introducing other types of switchings. Switchings which typically reduce the number of red edges by exactly one will still be the most frequently applied switchings, although in FactorUniform we relax these switchings slightly so that certain operations which were forbidden in FactorEasy will be permitted in FactorUniform. The other types of switchings do not necessarily reduce the number of red edges. Rather counterintuitively, some switchings will create more red edges. The new types of switchings are introduced for the same reason as the use of the rejection scheme: to remedy the distortion of the distribution which arises by merely applying Type I switchings.

We will specify parameters , for and , where is the set of the types of switchings to be applied in FactorUniform. In each step of FactorUniform, ignoring rejections that may occur with a small probability, a switching type is chosen with probability if the current graph is in . Then, given , a random switching of type is performed. As mentioned before, the Type I switchings are the most common: in fact, we set close to 1 for each . If the current graph lies in then a Type I switching applied to that graph will simply output the graph, but any other type of switching will not. Hence FactorUniform does not always immediately produce output as soon as it reaches a graph in , unlike FactorEasy.

As a preparation, we compute the expected number of red edges in a random -regular subgraph of .

Lemma 2.1.

Let be a uniformly random -regular graph on . The expected number of red edges in is .

Proof.

Let be a red edge in (that is, an edge in ). We know that is incident with edges in , and by symmetry each of the edges incident with in is equally likely to be in . Thus, the probability that is . There are exactly red edges in . By linearity of expectation, the expected number of red edges in is . ∎

3 The algorithm FactorEasy

Define

(3.1)
(3.2)

The following is a direct corollary of Lemma 2.1, using Markov’s inequality.

Corollary 3.1.

With probability at least , a uniformly random -regular graph on contains at most red edges.

We now define the switching operation which we use in FactorEasy, called a 3-edge-switching. To define a 3-edge-switching from the current graph , choose a sequence of vertices such that is a red edge in , and are edges in (with repetitions allowed), and the choice satisfies the following conditions:

  • and are black edges in ,

  • , and are all non-present in , and are all black in ;

  • The vertices are distinct, except that is permitted.

We say that the 6-tuple is valid if it satisfies these conditions. Given a valid 6-tuple , the 3-edge-switching determined by deletes the three edges , , and replaces them with the edges , , , producing a new graph . This switching operation is denoted by , and is illustrated in Figure 1.

Figure 1: A 3-edge switching

Each 3-edge switching reduces the number of red edges by exactly one. The inverse operation, obtained by reversing the arrow in Figure 1, is called an inverse 3-edge switching. The 6-tuple is valid for the inverse 3-edge switching if exactly one new red edge is introduced, no multiple edges are introduced, and all vertices are distinct except possibly .

Define

Let denote the number of valid 6-tuples which determine a 3-edge switchings that can be applied to , and let denote the number of valid 6-tuples which determine a inverse 3-edge switchings that can be applied to . In the following lemma, we show that is approximately and is approximately for .

Lemma 3.2.

Suppose that . Let . For all we have

Hence

The proof of Lemma 3.2 is deferred to Section 3.2.

3.1 Algorithm FactorEasy: definition and analysis

First, FactorEasy calls the Gao–Wormald algorithm REG [8, 9] to generate a uniformly random -regular graph on . If contains more than red edges then FactorEasy restarts. Otherwise, FactorEasy iteratively performs a sequence of switching steps. Let be the graph obtained after switching steps, and assume . The -th switching step is composed of the following substeps:

  1. If then output .

  2. If then uniformly at random choose a red edge in and choose two further edges in (of any colour), with repetition allowed. Randomly label the two endvertices of the red edge as and , and the endvertices of the other two edges as and , and and respectively. If is a valid 6-tuple then let be the 3-edge switching induced by this 6-tuple, as in Figure 1. Otherwise (when the 6-tuple is not valid), perform an f-rejection.

  3. If no f-rejection is performed then perform a b-rejection with probability

  4. If no b-rejection is performed then set .

As mentioned before, if any rejection occurs then FactorEasy restarts.

Proof of Theorem 1.1.

First we prove uniformity by induction. Recall that is a uniformly random -regular graph in if no initial rejection occurs. If then clearly

is uniformly distributed over

.

Next we prove that if is uniformly distributed over then is uniformly distributed over , assuming that no rejection occurs. For every and , let denote the set of valid 6-tuples such that is a 3-edge-switching, and let

(3.3)

Given , note that is exactly the number of choices of a 6-tuple of vertices , with repetition allowed, such that is a red edge in , and and are edges of . Thus, the probability that is converted to without f-rejection in step is equal to

The probability that no b-rejection occurs is equal to . Write , which is invariant over all graphs by the inductive hypothesis. Then

using the fact that the union in (3.3) is disjoint. Hence does not depend on , which implies that is uniformly distributed over if no rejection occurs.

Next, we prove that if then FactorEasy runs in time in expectation. The runtime for generating a random -regular graph on using the Gao–Wormald algorithm [9] is in expectation. By Lemma 3.2, the probability of an f-rejection or a b-rejection in each step is . By definition of and , the algorithm FactorEasy performs at most switching steps. Thus, the overall probability of any f-rejection or b-rejection is at most

which is because . It follows from this and from Corollary 3.1, that in expectation FactorEasy restarts times. It only remains to show that in each run of FactorEasy without rejection, the time complexity is . We will prove this part in Section 6. ∎

We close this section by proving the following lemma, which follows easily from Lemma 3.2. This result, which will be useful later, only requires a rather weak condition on and .

Lemma 3.3.

Assume that . Then for any ,

Proof.

We have

where is the expected value of for a random and is the expected value of for a random . The result follows by Lemma 3.2. ∎

When is no longer negligible compared to , the probability that an f-rejection or b-rejection occurs in FactorEasy before reaching is very close to 1. In this case, FactorEasy becomes very inefficient as it has to restart many times. In Section 5 we define FactorUniform, which will use 4-edge switchings instead of 3-edge switchings, giving more room for performing valid operations. In addition, various new ideas will be incorporated into the design of FactorUniform to achieve uniformity in the output and efficiency when . Since the uniform sampler FactorUniform can be treated as an extension of the approximate sampler FactorApprox, we will introduce FactorApprox first, in Section 4 below. All runtime analysis is deferred to Section 6, where the proofs of our main theorems are completed.

For the algorithms FactorUniform and FactorApprox, we restrict the host graph to be regular (specifically, -regular), to simplify the analysis. However, we believe that FactorUniform (and FactorApprox) can be modified to work for irregular host graphs .

Before moving on to these more advanced algorithms, we first complete the analysis of FactorEasy by proving Lemma 3.2.

3.2 Proof of Lemma 3.2

For the upper bound of , note that there are exactly ways to choose , and then at most ways to choose and at most ways to choose .

For the lower bound of , there are ways to choose . Given that, there are at least ways to choose such that is a black edge in , and , , and are all distinct. Then there are at least ways to choose such that is a black edge in and

(as is allowed to coincide with ). This gives at least choices of , but some of these choices are not valid: specifically, we must subtract the number of choices such that one of , or is either an edge in (either black or red), or is a red non-edge (that is, an edge in ).

  • There are at most choices such that or or is an edge in ;

  • There are at most choices such that or or is a red non-edge.

Subtracting these yields the desired lower bound for .

Next we consider . For the upper bound, there are at most ways to choose , since is an upper bound on the number of red non-edges in . Then there are at most ways to choose , at most ways to choose and at most ways to choose . This yields the required upper bound.

For the lower bound, there are exactly ways to choose , as the number of red non-edges in is exactly . Then there are exactly ways to choose and such that and are edges in (not necessarily black). Note here that is permitted. The number of ways to choose such that is a black edge in , and is at least . This gives at least choices for , but some of these choices are not valid: specifically, we must subtract the number of choices where one of , is either an edge in or a red non-edge, or one of , is a red edge.

  • There are at most choices such that or is either an edge in or a red non-edge;

  • There are at most choices such that or is a red edge.

Subtracting these gives the stated lower bound for .

Finally, the last statement of the Lemma 3.2 follows immediately from the above bounds by (3.1) and noting that . ∎

4 The approximate algorithm: FactorApprox

We now assume that is -regular, which implies that . By Lemma 2.1, the expected number of red edges in a random -factor of is asymptotic to . This leads to the following definition and corollary.

Define

(4.1)
(4.2)
Corollary 4.1.

With probability at least , a uniformly random -regular graph on contains at most red edges.

Gao and Wormald [8, 9] gave an algorithm called REG* for generating regular graphs asymptotically approximately uniformly in runtime . FactorApprox uses REG* to generate a random -regular graph on . When , the distribution of the output of REG* differs from the uniform by in total variation distance. This implies the following result, using Corollary 4.1

Corollary 4.2.

With probability at least , the output of REG* contains at most red edges.

Both FactorUniform and FactorApprox use 4-edge-switchings, which we now define. To define a 4-edge-switching from the current graph , choose a sequence of 8 vertices such that is a red edge in and , , are other edges in , and such that

  • , , , are not present in ;

  • none of is red in ;

  • no two of the eight vertices are equal except for possibly ;

  • Either none of , , and is red;
    or exactly one of , , and is red;
    or both and are red, and both and are black;
    or both and are black, and both and are red.

The sequence of vertices is said to be valid if it satisfies the above properties. Given a valid 8-tuple of vertices, the switching operation deletes the four edges , , , , and replaces them with the edges , , and , as in Figure 2. The resulting graph is denoted by . This operation is called a 4-edge-switching. We say that the 4-edge-switching is valid if it arises from a valid 8-tuple.

Under the switching, the colour of each edge or non-edge stays the same, but edges become non-edges and vice-versa. In Figure 2, a solid line indicates an edge of and a dashed line represents an edge of ; that is, a non-edge in . We label each edge (or non-edge) by the allowed colour, where ‘b’, ‘r’ and ‘b/r’ denote ‘black’, ‘red’, and ‘black or red’, respectively.

r

b/r

b/r

b

b

b/r

b/r

b

r

b/r

b/r

b

b

b/r

b/r

b

Figure 2: A 4-edge-switching

The structure of FactorApprox is similar to that of FactorEasy, except that there is no rejection after the first step.

First, FactorApprox uses REG* to repeatedly generate a random -regular graph on the vertex set until the resulting graph belongs to . Next, FactorApprox repeatedly applies random 4-edge switchings from the current graph, until a graph without red edges is produced. Finally, FactorApprox outputs this graph, which is a -factor of .

To choose a uniformly random 4-edge switching from a current graph , we can uniformly at random choose a red edge and label its end vertices by and . Then uniformly choose three edges of (repetition allowed) and label their endvertices. If the resulting 8-tuple is valid then perform the corresponding 4-edge-switching to produce the new graph . Otherwise, repeat until a valid 4-edge-switching is obtained.


Efficiency. The runtime of REG* is and by Corollary 4.2, a constant number of attempts will be sufficient to generate a random -regular graph with at most red edges in expectation. Now a given graph , consider choosing to be a uniformly random red edge of and choosing each of , , to be a uniformly random edge of (with repetition allowed). It is easy to see that with high probability, this random choice of edges , , , defines a valid 4-edge switching which reduces the number of red edges by exactly one. (The argument is very similar to the proof of Lemma 5.11, which can be found in the appendix.) Hence, the cost of time in performing one 4-edge switching is , and FactorApprox consists of performing switching steps in expectation and with high probability. Since , it follows that the runtime of FactorApprox is .


Correctness. It only remains to show that FactorApprox outputs a random -factor of whose distribution differs from the uniform by in total variation distance. We will prove this in Section 6.

5 The exactly uniform sampler: FactorUniform

In this section, our aim is to define and analyse an algorithm for uniform generation of -factors of , which is efficient for larger values of and than FactorEasy. Specifically, we assume that . As in Section 4, we assume that is -regular and define and as in (4.1), (4.2).

The analysis of FactorEasy given in Section 3.1 shows that the probability of an f-rejection or a b-rejection depends on the gap between the upper and lower bounds of and . To obtain an algorithm which is still efficient for larger values of and , we must reduce the variation of and among , for some family of switchings. We use several techniques to reduce this variation:

  • We use 4-edge-switchings instead of 3-edge switchings.

  • We will perform more careful counting than the analysis of Lemma 3.2.

  • We will occasionally allow switchings which create new red edges.

  • We will introduce other types of switchings to “boost” the probability of graphs which are otherwise not created sufficiently often. These types of switchings are called boosters.

The last two of these techniques follow the approach of [8, 9]. In particular, every switching introduced will have a type and a class. The number of switchings of type that can be performed on a given graph is denoted by , and the number of switchings of class that can be applied to other graphs to produce is denoted by . We will also need parameters and which satisfy

Further details of the structure of algorithm FactorUniform will be explained in Section 5.3.

5.1 Type I switchings

FactorUniform will mainly use the 4-edge-switching shown in Figure 2, which replaces four edges by four new edges. We will call this the Type I switching, as we will define other types of switchings for use in FactorUniform later.

Suppose that a Type I switching transforms a graph into a graph . Then the initial graph can be in different strata, depending on the colour of , , and . If these edges are all black then we say that the Type I switching is in Class A. In this case, has exactly one more red edge than . Other Type I switchings are categorised into different classes, as shown in Table 1.

class action the switching
A

r

r
B1

r

r

r

r
B2

r

r

r

r
C

r

r

r

r

r

r
Table 1: The different classes of Type I switchings

Each row of Table 1, other than the first row, defines two new classes of switching, depending on how the vertices are labelled. For example, the second row defines Classes B1+ and B1, which we refer to collectively as B1. Every class with a name ending “+” arises from using the vertex labelling shown on the left of Figure 3, while those classes ending in “” arise from using the vertex labelling shown on the right of Figure 3. For example, if (respectively, ) is red but not present in , and all the other edges and non-edges involved in the switching, except for , are black, then this switching is in Class B1+ (respectively, B1) and both and belong to .

Figure 3: The vertex labellings for “” and “” classes, respectively

Next we bound the number of Type I switchings which can be performed in an arbitrary . Define

(5.1)

The proof of the following lemma is similar to that of Lemma 3.2, and is deferred to the appendix.

Lemma 5.1.

Suppose that . Then for any ,

(5.2)
(5.3)
Remark 5.2.

If we did not allow the creation of structures in classes B1, B2 and C then the f-rejection probability would be too big. For instance, suppose that we did not allow Class B1+ (and that the other Class B and C switchings are permitted). Then for most graphs in , the typical number of forward switchings would be approximately

(This is approximately with the term subtracted, which is the typical number of 8-tuples corresponding to a Class B1+ switching.) However, consider an extreme “worst case”, when and divides , and the -factor is composed of two -regular graphs, one with only red edges and the other with only black edges. Then there are no red edges in that are incident with a red dashed edge, so no Class B1+ switchings need to be ruled out (as none are possible). In this case, the number of forward switchings in is

Hence, differs from by for most graphs , causing an f-rejection probability of in a single step. But then the overall rejection probability will be too big, since there will be up to steps, and may not be under the conditions of Theorem 1.2. To reduce the probability of an f-rejection we must allow these Class B1+ switchings to proceed (and Class B1 switchings too, by symmetry). Similar arguments explain the introduction of Classes B2 and C.

5.2 New switching types and counting inverse switchings

For each we count the inverse switchings of Class .


5.2.1 Class A

Class A switchings are all of Type I. In this section we will obtain a lower bound for and an upper bound for the average of over all . The following lemma will be useful.

Lemma 5.3.

Assume that and . Let be a -factor chosen uniformly at random from . Then the expected number of red 2-paths in is and the expected number of pairs of red edges in such that is either a red edge in or a red dashed edge is .

Proof.

Let be a red 2-path in . We now bound the probability that is contained in a random . Let denote the set of graphs in which contain and let denote the set of graphs in which do not contain any of and . Consider the switching as shown in Figure 4.

Figure 4: A switching for red 2-paths

Such a switching switches a graph in to . It is easy to see that the number of forward switchings is at least , whereas the number of inverse switchings is at most . Hence,