# The giant component of the directed configuration model revisited

We prove a law of large numbers for the order and size of the largest strongly connected component in the directed configuration model. Our result extends previous work by Cooper and Frieze.


## 1 Introduction and notations

An scc (strongly connected component) in a digraph (directed graph) is a maximal sub-digraph in which there exists a directed path from every node to every other node. In this short note, we analyse the size of the giant component, i.e., the largest scc, in the directed configuration model. This is a continuation of our previous work [4], which studied the diameter of the model.

We briefly introduce the model and our assumptions. For further discussions and references, see [4]. Let $[n]\coloneqq\{1,\dots,n\}$ be a set of nodes. Let $\mathbf{d}_n=((d_1^-,d_1^+),\dots,(d_n^-,d_n^+))$ be a bi-degree sequence with $\sum_{i\in[n]}d_i^-=\sum_{i\in[n]}d_i^+\eqqcolon m_n$. The directed configuration model, $\mathrm{DCM}(\mathbf{d}_n)$, is the random directed multigraph on $[n]$ generated by giving $d_i^-$ in half-edges (heads) and $d_i^+$ out half-edges (tails) to node $i$, and then pairing the heads and tails uniformly at random.

Let $D_n=(D_n^-,D_n^+)$ be the degrees (numbers of heads and tails) of a node chosen uniformly at random from $[n]$. Let $n_{k,\ell}$ be the number of occurrences of $(k,\ell)$ in $\mathbf{d}_n$, so that $\mathbb{P}\{D_n=(k,\ell)\}=n_{k,\ell}/n$. Consider a sequence of bi-degree sequences $(\mathbf{d}_n)_{n\ge1}$. Throughout the paper, we will assume the following condition is satisfied:

###### Condition 1.1.

There exists a discrete probability distribution $D=(D^-,D^+)$ on $\mathbb{Z}_{\ge0}^2$ with $\mathbb{P}\{D=(k,\ell)\}\eqqcolon\lambda_{k,\ell}$ such that

1. $D_n$ converges to $D$ in distribution: for every $(k,\ell)\in\mathbb{Z}_{\ge0}^2$, $\lim_{n\to\infty}n_{k,\ell}/n=\lambda_{k,\ell}$;

2. $D_n$ converges to $D$ in expectation and the expectation is finite:

$$\lim_{n\to\infty}\mathbb{E}[D_n^-]=\lim_{n\to\infty}\mathbb{E}[D_n^+]=\mathbb{E}[D^-]=\mathbb{E}[D^+]\eqqcolon\lambda\in(0,\infty); \qquad (1.1)$$

3. $D_n$ converges to $D$ in second moment and the second moments are finite: for $i,j\ge0$ with $i+j=2$,

$$\lim_{n\to\infty}\mathbb{E}[(D_n^-)^i(D_n^+)^j]=\mathbb{E}[(D^-)^i(D^+)^j]<\infty. \qquad (1.2)$$

To state the main result, some parameters of the distribution $D$ are needed. Let

$$\nu\coloneqq\frac{\mathbb{E}[D^-D^+]}{\lambda}<\infty, \qquad (1.3)$$

where the inequality follows from conditions (ii) and (iii). Let $f(z,w)\coloneqq\mathbb{E}[z^{D^-}w^{D^+}]$ be the bivariate generating function of $D$. Let $s^-$ and $s^+$ be the survival probabilities of the branching processes whose offspring distributions have the generating functions $\frac{1}{\lambda}\frac{\partial f}{\partial w}(z,1)$ and $\frac{1}{\lambda}\frac{\partial f}{\partial z}(1,w)$ respectively. In other words, $\rho_-\coloneqq1-s^-$ and $\rho_+\coloneqq1-s^+$ are, respectively, the smallest positive solutions to the equations

$$z=\frac{1}{\lambda}\frac{\partial f}{\partial w}(z,1),\qquad w=\frac{1}{\lambda}\frac{\partial f}{\partial z}(1,w). \qquad (1.4)$$
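As a concrete illustration (ours, not taken from the paper), the solutions of (1.4) can be computed by fixed-point iteration. The sketch below assumes, for simplicity, that $D^-$ and $D^+$ are independent Poisson($\lambda$) variables, in which case both equations in (1.4) reduce to $z=e^{\lambda(z-1)}$ and $\nu=\lambda$.

```python
import math

def smallest_fixed_point(g, tol=1e-12, max_iter=10_000):
    """Iterate z <- g(z) from 0; for an offspring pgf this converges
    monotonically to the smallest nonnegative solution of z = g(z)."""
    z = 0.0
    for _ in range(max_iter):
        z_next = g(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Hypothetical example: D^- and D^+ independent Poisson(lam).  Then both
# equations in (1.4) reduce to z = exp(lam * (z - 1)) and nu = lam.
lam = 2.0
rho = smallest_fixed_point(lambda z: math.exp(lam * (z - 1.0)))
s = 1.0 - rho  # survival probability s^- = s^+
```

For $\lambda=2$ this gives $\rho\approx0.203$, so roughly $80\%$ of the branching processes survive.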

Let $G_n$ be the largest scc in $\mathrm{DCM}(\mathbf{d}_n)$. (If there is more than one such scc, we choose an arbitrary one among them as $G_n$.) Let $v(G_n)$ be the number of nodes in $G_n$ and let $e(G_n)$ be the number of edges in $G_n$. Our main result is the following theorem on $G_n$:

###### Theorem 1.2.

Suppose that $\mathbf{d}_n$ satisfies Condition 1.1. If $\nu>1$, then

$$\frac{v(G_n)}{n}\to\eta<\infty, \qquad (1.5)$$

$$\frac{e(G_n)}{n}\to\lambda s^-s^+<\infty, \qquad (1.6)$$

in expectation, in second moment and in probability, where

$$\eta\coloneqq\sum_{i,j\ge0}\lambda_{i,j}(1-\rho_-^i)(1-\rho_+^j)=1+f(\rho_-,\rho_+)-f(\rho_-,1)-f(1,\rho_+). \qquad (1.7)$$

If $\nu\le1$, then for all sequences $(a_n)_{n\ge1}$ with $a_n\to\infty$,

$$\frac{v(G_n)}{a_n}\to0, \qquad (1.8)$$

in expectation and in probability.
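The convergence in Theorem 1.2 can be observed empirically. The following sketch (an illustration we add here, not code from the paper) samples $\mathrm{DCM}(\mathbf{d}_n)$ by pairing heads and tails uniformly at random, and computes the order of the largest scc with an iterative Kosaraju algorithm; for large $n$, the fraction `frac` should approach $\eta$ from (1.7).

```python
import random
from collections import defaultdict

def sample_dcm(in_deg, out_deg, rng):
    """Sample DCM(d_n): pair heads and tails uniformly at random.
    Returns the multigraph as a list of directed edges (u, v)."""
    tails = [u for u in range(len(out_deg)) for _ in range(out_deg[u])]
    heads = [v for v in range(len(in_deg)) for _ in range(in_deg[v])]
    assert len(tails) == len(heads)
    rng.shuffle(heads)
    return list(zip(tails, heads))

def largest_scc_size(n, edges):
    """Order of the largest strongly connected component (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # First pass: record nodes in order of increasing finish time.
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(fwd[v])))
                    break
            else:
                order.append(u)
                stack.pop()
    # Second pass: flood-fill the reversed graph in reverse finish order.
    comp, best = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s], size, stack = s, 0, [s]
        while stack:
            u = stack.pop()
            size += 1
            for v in rev[u]:
                if comp[v] == -1:
                    comp[v] = s
                    stack.append(v)
        best = max(best, size)
    return best

rng = random.Random(0)
n = 200
edges = sample_dcm([2] * n, [2] * n, rng)  # all bi-degrees (2, 2), nu = 2 > 1
frac = largest_scc_size(n, edges) / n      # compare with eta from (1.7)
```

For the all-$(2,2)$ sequence, $\nu=2>1$ and every node has in- and out-degree at least one, so (1.7) gives $\eta=1$ and `frac` should be close to $1$ for large $n$.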

###### Remark 1.3.

Under Condition 1.1, the probability that $\mathrm{DCM}(\mathbf{d}_n)$ is simple is bounded away from $0$, see [2, 10]. Thus Theorem 1.2 also holds for a uniform random simple digraph with bi-degree sequence $\mathbf{d}_n$.

The two cases $\nu\le1$ and $\nu>1$ are often referred to as the subcritical and the supercritical regimes. In the supercritical case, $\eta>0$ and $s^-s^+>0$. In other words, whp (with high probability), the size of the largest scc is bounded in the subcritical case and linear in the supercritical case.

Equation (1.5) in Theorem 1.2 was first proved by Cooper and Frieze [5] under stronger conditions on the degree sequence. Graf [9, Theorem 4.1] extended the existence of an scc of linear order under a uniform convergence assumption; Condition 1.1 is strictly weaker, see [4, Corollary 2.4]. In the subcritical case, the results in [5, 9] only show that whp the largest scc has sublinear order, instead of the bounded order given by (1.8).

The paper is organized as follows. In Section 2, we study the probability of certain events for branching processes. In Section 3, we recall the graph exploration process defined in [4] and extend it. Section 4 studies the probability that a set of half-edges reaches a large number of other half-edges. Section 5 shows that the number of nodes which can reach, and can be reached from, many nodes is concentrated around its mean. Then, in Section 6, we show that these nodes form the giant. Finally, in Section 7, we give an application of Theorem 1.2 to binomial random digraphs.

## 2 Branching processes

Let $\xi$ be a random variable on $\mathbb{Z}_{\ge0}$ and let $\xi_1,\xi_2,\dots$ be iid (independent and identically distributed) copies of $\xi$. Let $f_\xi$ be the generating function of $\xi$ and let $\nu_\xi\coloneqq\mathbb{E}[\xi]$. Let $(X_t)_{t\ge0}$ be a branching process with offspring distribution $\xi$, started with $X_0=1$. If $X_t>0$ for all $t\ge0$, then the branching process is said to survive; otherwise, it is said to become extinct. The following results are well known in branching process theory (see, e.g., [14, Theorem 3.1] and [1, Theorem I.10.3], respectively):

###### Lemma 2.1.

Let $\rho_\xi$ be the smallest nonnegative solution of $z=f_\xi(z)$. The survival probability of $(X_t)_{t\ge0}$ is

$$s_\xi\coloneqq\mathbb{P}\Big\{\bigcap_{t\ge1}[X_t>0]\Big\}=1-\rho_\xi. \qquad (2.1)$$

Moreover, $s_\xi>0$ if and only if $\nu_\xi>1$.

###### Lemma 2.2.

Assume that $\nu_\xi\in(1,\infty)$. Then there exists a sequence $(c_t)_{t\ge0}$ for which $c_{t+1}/c_t\to\nu_\xi$, such that $X_t/c_t\to W$ almost surely, where $W$ is a non-negative random variable for which $\mathbb{P}\{W>0\}=s_\xi$ and which is continuously distributed on $(0,\infty)$.
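Lemma 2.2 can be illustrated numerically: for an offspring law with finite variance one may take $c_t=\nu_\xi^t$, and $X_t/\nu_\xi^t$ is then a nonnegative martingale with mean exactly $1$. The sketch below uses a toy offspring law ($\mathbb{P}\{\xi=2\}=3/4$, $\mathbb{P}\{\xi=0\}=1/4$, our choice, not the paper's), for which $\nu_\xi=3/2$.

```python
import random

def generation_sizes(t_max, rng, p=0.75):
    """Generation sizes X_0, ..., X_{t_max} of a branching process with
    offspring 2 (w.p. p) or 0 (w.p. 1 - p), started from X_0 = 1."""
    xs = [1]
    for _ in range(t_max):
        # Sum of X_t iid offspring: 2 * Binomial(X_t, p).
        xs.append(2 * sum(1 for _ in range(xs[-1]) if rng.random() < p))
    return xs

# nu_xi = 2 * p = 3/2; X_t / nu_xi^t is a nonnegative martingale with
# mean exactly 1, stabilising to the limit W of Lemma 2.2.
rng = random.Random(5)
nu_xi, t, trials = 1.5, 10, 4000
ws = [generation_sizes(t, rng)[-1] / nu_xi ** t for _ in range(trials)]
mean_w = sum(ws) / trials  # should be close to 1
```

A large share of the `ws` values is exactly $0$ (the extinct trajectories), in line with $\mathbb{P}\{W>0\}=s_\xi$.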

The main result of this section is the following:

###### Lemma 2.3.

Let $(X_t)_{t\ge0}$ be a branching process with offspring distribution $\xi$ with $\nu_\xi\in(1,\infty)$. Let

$$T_\omega\coloneqq\inf\{t:X_t\ge\omega\}. \qquad (2.2)$$

Then for all $\varepsilon>0$, as $\omega\to\infty$,

$$\mathbb{P}\{T_\omega\le(1+\varepsilon)\log_{\nu_\xi}\omega\}\to s_\xi. \qquad (2.3)$$
###### Proof.

Let $t_1\coloneqq(1+\varepsilon)\log_{\nu_\xi}\omega$. It suffices to show that $\mathbb{P}\{T_\omega>t_1\}\to1-s_\xi$. We split this probability into

$$\mathbb{P}\{T_\omega>t_1\}=\mathbb{P}\{[T_\omega>t_1]\cap[X_{t_1}=0]\}+\mathbb{P}\{[T_\omega>t_1]\cap[X_{t_1}\in(0,\omega)]\}\eqqcolon I_1+I_2. \qquad (2.4)$$

By a known bound on the time a supercritical branching process can spend in $(0,\omega)$, there exists a constant $\hat{C}$ (depending only on $\xi$) such that

$$I_2=\mathbb{P}\Big\{\bigcap_{i=0}^{t_1}[X_i\in(0,\omega)]\Big\}\le\hat{C}\,\nu_\xi^{-\big((1+\varepsilon)\log_{\nu_\xi}\omega-(1+o(1))\log_{\nu_\xi}\omega-1\big)}\le\hat{C}\,\nu_\xi^{-(\varepsilon/2)\log_{\nu_\xi}\omega}=o(1). \qquad (2.5)$$

Let $Y_t\coloneqq\sum_{i=0}^tX_i$. Let $E$ denote the event that $(X_t)_{t\ge0}$ becomes extinct, i.e., that $X_t=0$ for some $t\ge1$, and let $q_\xi\coloneqq\mathbb{P}\{E\}=1-s_\xi$. If $q_\xi=0$, then $I_1\le\mathbb{P}\{X_{t_1}=0\}\le q_\xi=0$ and we are done. Thus we can assume that $q_\xi>0$. Then

$$I_1\le\mathbb{P}\{[Y_{t_1}\le(1+t_1)\omega]\cap[X_{t_1}=0]\}\le\mathbb{P}\{Y_{t_1}\le(1+t_1)\omega\mid E\}\,\mathbb{P}\{E\}\to\mathbb{P}\{E\}=q_\xi, \qquad (2.6)$$

since a branching process conditioned on becoming extinct has a finite total progeny.

For a lower bound on $I_1$, note that $[Y_{t_1}<\omega]$ implies $[T_\omega>t_1]$. Thus,

$$I_1\ge\mathbb{P}\{[Y_{t_1}<\omega]\cap[X_{t_1}=0]\}=\mathbb{P}\{Y_{t_1}<\omega\}-\mathbb{P}\{[Y_{t_1}<\omega]\cap[X_{t_1}>0]\}. \qquad (2.7)$$

Note that

$$\mathbb{P}\{Y_{t_1}<\omega\}\ge\mathbb{P}\{Y_{t_1}<\omega\mid E\}\,\mathbb{P}\{E\}\to\mathbb{P}\{E\}=q_\xi. \qquad (2.8)$$

By Lemma 2.2, there exists a sequence $(c_t)_{t\ge0}$ with $c_{t+1}/c_t\to\nu_\xi$ such that, almost surely,

$$\frac{X_t}{c_t}\to W, \qquad (2.9)$$

where $W$ is a non-negative random variable for which $\mathbb{P}\{W>0\}=s_\xi$ and which has a continuous distribution on $(0,\infty)$. Since $c_{t_1}=\nu_\xi^{(1+o(1))t_1}=\omega^{(1+\varepsilon)(1+o(1))}$, we have $\omega/c_{t_1}\to0$. Therefore, for all $\delta>0$,

$$\mathbb{P}\{[Y_{t_1}<\omega]\cap[X_{t_1}>0]\}\le\mathbb{P}\{[X_{t_1}<\omega]\cap[X_{t_1}>0]\}\le\mathbb{P}\{0<W\le\delta\}+o(1), \qquad (2.10)$$

as $\omega\to\infty$. Since $\delta$ is arbitrary and $W$ is continuously distributed on $(0,\infty)$, we have

$$\mathbb{P}\{Y_{t_1}<\omega\mid X_{t_1}>0\}\to0. \qquad (2.11)$$

Putting (2.11) and (2.8) into (2.7) gives the desired lower bound. ∎

Lemma 2.3 can be generalized to multiple iid branching processes as follows:

###### Corollary 2.4.

Let $(X_{1,t})_{t\ge0},\dots,(X_{x,t})_{t\ge0}$ be $x$ independent branching processes with offspring distribution $\xi$. Assume that $\nu_\xi\in(1,\infty)$. Let

$$T^{(x)}_\omega\coloneqq\inf\Big\{t:\sum_{i=1}^xX_{i,t}\ge\omega\Big\}. \qquad (2.12)$$

Then for all $\varepsilon>0$, as $\omega\to\infty$,

$$\mathbb{P}\{T^{(x)}_\omega\le(1+\varepsilon)\log_{\nu_\xi}\omega\}\to1-(1-s_\xi)^x. \qquad (2.13)$$
###### Proof.

Let $t_1\coloneqq(1+\varepsilon)\log_{\nu_\xi}\omega$. For $i\in[x]$, let $T_{i,\omega}\coloneqq\inf\{t:X_{i,t}\ge\omega\}$. By Lemma 2.3,

$$\mathbb{P}\{T^{(x)}_\omega>t_1\}\le\mathbb{P}\Big\{\bigcap_{i=1}^x[T_{i,\omega}>t_1]\Big\}=\prod_{i=1}^x\mathbb{P}\{T_{i,\omega}>t_1\}\to(1-s_\xi)^x, \qquad (2.14)$$

and

$$\mathbb{P}\{T^{(x)}_\omega>t_1\}\ge\mathbb{P}\Big\{\bigcap_{i=1}^x[T_{i,\omega/x}>t_1]\Big\}=\prod_{i=1}^x\mathbb{P}\{T_{i,\omega/x}>t_1\}\to(1-s_\xi)^x. \qquad (2.15)\ \square$$
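Lemma 2.3 lends itself to a quick simulation (our illustration; the concrete offspring law $\mathbb{P}\{\xi=2\}=3/4$, $\mathbb{P}\{\xi=0\}=1/4$ is chosen only for convenience). For this law, $f_\xi(z)=\frac14+\frac34z^2$, $\nu_\xi=\frac32$ and $\rho_\xi=\frac13$, so the empirical frequency of $[T_\omega\le(1+\varepsilon)\log_{\nu_\xi}\omega]$ should be close to $s_\xi=\frac23$.

```python
import math
import random

def hitting_time(omega, t_max, rng, p=0.75):
    """One branching process with offspring 2 (w.p. p) or 0 (w.p. 1 - p),
    started from X_0 = 1.  Returns the first generation t <= t_max with
    X_t >= omega, or None if the process dies out or runs out of time."""
    x = 1
    for t in range(1, t_max + 1):
        x = 2 * sum(1 for _ in range(x) if rng.random() < p)
        if x >= omega:
            return t
        if x == 0:
            return None
    return None

# Extinction probability solves rho = 1/4 + (3/4) rho^2, i.e. rho = 1/3.
rng = random.Random(2024)
omega, eps, nu_xi = 500, 1.0, 1.5
t1 = int((1 + eps) * math.log(omega, nu_xi))
trials = 400
hits = sum(hitting_time(omega, t1, rng) is not None for _ in range(trials))
frac = hits / trials  # should be close to s_xi = 2/3
```

The estimate is slightly below $s_\xi$ for finite $\omega$, as (2.3) is only a limit statement.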

## 3 Exploring the graph

We extend the Breadth First Search (BFS) graph exploration process of $\mathrm{DCM}(\mathbf{d}_n)$ defined in [4].

For $W\subseteq[n]$, let $E^\pm(W)$ be the set of heads/tails incident to the nodes in $W$. Let $E^\pm\coloneqq E^\pm([n])$. For a set of half-edges $X$, let $V(X)$ be the set of nodes incident to $X$. Let $H$ be a partial pairing of half-edges in $\mathrm{DCM}(\mathbf{d}_n)$. Let $P^\pm(H)$ be the set of heads/tails which are paired in $H$. Let $P(H)\coloneqq P^-(H)\cup P^+(H)$. Let $F^\pm(H)$ be the unpaired heads/tails which are incident to $V(P(H))$. Let $E_H$ denote the event that $H$ is part of $\mathrm{DCM}(\mathbf{d}_n)$. We will explore the graph conditioning on $E_H$.

We start from an arbitrary set $X^+$ of unpaired tails. In this process, we create random pairings of half-edges one by one and keep each half-edge in exactly one of four states — active, paired, fatal or undiscovered. Let $A^\pm_t$, $P^\pm_t$, $F^\pm_t$ and $U^\pm_t$ denote the sets of heads/tails in the four states respectively after the $t$-th pairing of half-edges. Initially, let

$$A^+_0=X^+,\quad A^-_0=E^-(V(X^+)),\quad P^\pm_0=P^\pm(H),\quad F^\pm_0=F^\pm(H),\quad U^\pm_0=E^\pm\setminus(A^\pm_0\cup P^\pm_0\cup F^\pm_0). \qquad (3.1)$$

Then set $t=0$ and proceed as follows:

1. Let $e^+$ be one of the tails which became active earliest in $A^+_t$.

2. Pair $e^+$ with a head $e^-$ chosen uniformly at random from the unpaired heads $E^-\setminus P^-_t$.

3. If $e^-\in F^-_t$, then terminate; if $e^-\in A^-_t$, then no new half-edges become active; and if $e^-\in U^-_t$, then the half-edges in $E^\pm(v)\setminus\{e^-\}$ become active, where $v=V(e^-)$.

4. If no active tails remain, terminate; otherwise, mark $e^+$ and $e^-$ as paired, increase $t$ by one, and go to step (i).

Let $F_{X^+}(0)$ be a forest of $|X^+|$ isolated nodes, one for each tail in $X^+$. Given $F_{X^+}(t)$, the forest $F_{X^+}(t+1)$ is constructed as follows: if the $(t+1)$-th pairing discovers a new node $v$, then construct $F_{X^+}(t+1)$ from $F_{X^+}(t)$ by adding $d_v^+$ child nodes to the node representing $e^+$, each of which represents a tail in $E^+(v)$; otherwise, let $F_{X^+}(t+1)=F_{X^+}(t)$. While $F_{X^+}(t)$ is an unlabelled forest, its nodes correspond to the tails discovered by the exploration. So we can assign a label paired or active to each node of $F_{X^+}(t)$.

Given half-edges $e$ and $e'$, the distance $\mathrm{dist}(e,e')$ is the length of the shortest path from $V(e)$ to $V(e')$ which starts with the edge containing $e$ and ends with the edge containing $e'$.

If $t$ is the last step at which a tail at distance $h$ from $X^+$ is paired, then $F_{X^+}(t)$ satisfies: (i) the height is $h+1$; (ii) the set of active nodes is the $(h+1)$-th level. We call a rooted forest incomplete if it satisfies (i)–(ii). We let $p(F)$ denote the number of paired nodes in an incomplete forest $F$.
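The exploration above can be sketched in code. The following is a simplified illustration (ours, not the paper's): it reveals the uniform pairing one head at a time by the principle of deferred decisions, and it omits the fatal state and the conditioning on $E_H$, keeping only active, paired and undiscovered half-edges.

```python
import random
from collections import deque

def explore(out_deg, in_deg, start, rng):
    """BFS exploration of the directed configuration model from the tails
    of node `start`, revealing the uniform pairing one head at a time.
    Tails are 'active' (discovered, unpaired) until paired; the 'fatal'
    state used for conditioning on a partial pairing H is omitted here."""
    n = len(out_deg)
    heads = [v for v in range(n) for _ in range(in_deg[v])]
    rng.shuffle(heads)                        # pop() then draws a uniform head
    active = deque([start] * out_deg[start])  # active tails, by owner node
    discovered = {start}
    edges = []
    while active:
        u = active.popleft()                  # tail that became active earliest
        v = heads.pop()                       # uniform random unpaired head
        edges.append((u, v))
        if v not in discovered:               # new node: its tails become active
            discovered.add(v)
            active.extend([v] * out_deg[v])
    return discovered, edges
```

The shuffled list of heads makes each `pop()` a uniform draw from the remaining unpaired heads, which is exactly the uniform pairing of the model restricted to the revealed edges.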

### 3.1 Size biased distributions

We recall some notation from [4]. The in- and out-size-biased distributions of $D_n$ and $D$ are defined by

$$\mathbb{P}\{(D_n)_{\mathrm{in}}=(k-1,\ell)\}=\frac{k\,n_{k,\ell}}{m_n},\qquad \mathbb{P}\{(D_n)_{\mathrm{out}}=(k,\ell-1)\}=\frac{\ell\,n_{k,\ell}}{m_n}, \qquad (3.2)$$

$$\mathbb{P}\{D_{\mathrm{in}}=(k-1,\ell)\}=\frac{k\,\lambda_{k,\ell}}{\lambda},\qquad \mathbb{P}\{D_{\mathrm{out}}=(k,\ell-1)\}=\frac{\ell\,\lambda_{k,\ell}}{\lambda}. \qquad (3.3)$$

Then, by (i) and (ii) of Condition 1.1, $(D_n)_{\mathrm{in}}\to D_{\mathrm{in}}$ and $(D_n)_{\mathrm{out}}\to D_{\mathrm{out}}$ in distribution, and by (iii) of Condition 1.1,

$$\lim_{n\to\infty}\mathbb{E}[(D_n)^+_{\mathrm{in}}]=\lim_{n\to\infty}\mathbb{E}[(D_n)^-_{\mathrm{out}}]=\mathbb{E}[D^+_{\mathrm{in}}]=\mathbb{E}[D^-_{\mathrm{out}}]=\frac{\mathbb{E}[D^+D^-]}{\lambda}=\nu. \qquad (3.4)$$

Let $s^+_n$, $s^-_n$, $s^+$ and $s^-$ be the survival probabilities of the branching processes with offspring distributions $(D_n)^+_{\mathrm{in}}$, $(D_n)^-_{\mathrm{out}}$, $D^+_{\mathrm{in}}$ and $D^-_{\mathrm{out}}$ respectively. Then, as we have shown in [4], $s^\pm_n\to s^\pm$.

### 3.2 Coupling with branching processes

Consider the probability distribution $Q_n$ on $\mathbb{Z}_{\ge0}$ which satisfies, for all $\ell\ge0$,

$$\mathbb{P}\{Q_n=\ell\}=q_{n,\ell}\coloneqq\sum_{k\ge1}\frac{k\,n_{k,\ell}}{m_n}. \qquad (3.5)$$

In other words, $Q_n$ is the distribution of $(D_n)^+_{\mathrm{in}}$. In [4, Section 3], it has been shown that $Q_n$ converges to $D^+_{\mathrm{in}}$ in distribution and in expectation; in particular, by (3.4), $\mathbb{E}[Q_n]\to\nu$. Also in [4], we showed that the exploration process starting from one tail can be approximated by a branching process with offspring distribution $Q_n$. Similarly, the extended exploration process starting from $X^+$ can be approximated by $|X^+|$ independent branching processes with offspring distribution $Q_n$.
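The size-biased distributions and the parameter $\nu$ are easy to compute from a finite bi-degree sequence. The helper below (an illustration we add, with hypothetical function names) mirrors (3.2) and (3.4).

```python
from collections import Counter

def in_size_biased(bidegrees):
    """Empirical in-size-biased distribution, as in (3.2): pick a head
    uniformly at random; return the law of (k - 1, l) where (k, l) is
    the bi-degree of the node the head is attached to."""
    m = sum(k for k, _ in bidegrees)  # m_n = total number of heads
    dist = Counter()
    for k, l in bidegrees:
        if k > 0:
            dist[(k - 1, l)] += k / m
    return dict(dist)

def nu(bidegrees):
    """nu = E[D^- D^+] / lambda, computed from the bi-degree sequence."""
    n = len(bidegrees)
    lam = sum(k for k, _ in bidegrees) / n
    return sum(k * l for k, l in bidegrees) / n / lam
```

For instance, for the toy sequence $(1,1),(1,1),(3,3),(3,3)$ one gets $\nu=2.5$, and the mean of the $+$ coordinate of the in-size-biased law equals $\nu$, matching (3.4).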

For fixed $\beta>0$, consider the distributions $Q^\downarrow_n(\beta)$ and $Q^\uparrow_n(\beta)$ defined by

$$\mathbb{P}\{Q^\downarrow_n=\ell\}=q^\downarrow_{n,\ell}\coloneqq\begin{cases}c^\downarrow q_{n,\ell}&\text{if }q_{n,\ell}\ge n^{-2\beta}\text{ and }\ell\le n^\beta,\\0&\text{otherwise,}\end{cases} \qquad (3.6)$$

$$\mathbb{P}\{Q^\uparrow_n=\ell\}=q^\uparrow_{n,\ell}\coloneqq\begin{cases}c^\uparrow q_{n,\ell}&\ell\ge1,\\c^\uparrow q_{n,0}+n^{-1/2+2\beta}&\ell=0,\end{cases} \qquad (3.7)$$

where $c^\downarrow$ and $c^\uparrow$ are normalising constants.

Let $\mathrm{GW}^{(x)}_\xi$ be a forest of $x$ independent Galton-Watson trees with offspring distribution $\xi$. Let $F$ be an incomplete forest with $x$ trees. Let $\mathrm{GW}^{(x)}_\xi\cong F$ denote the event that, for every $i\in[x]$, the $i$-th tree of $F$ is a root subtree of the $i$-th Galton-Watson tree, and that all paired nodes of $F$ have the same degree in both forests.

The following lemma is a straightforward extension of [4, Lemma 5.3] and we omit its proof:

###### Lemma 3.1.

Let $\beta>0$, let $H$ be a partial pairing and let $X^+$ be a set of unpaired tails with $|X^+|=x$, subject to the same size restrictions as in [4, Lemma 5.3]. For every incomplete forest $F$ with $x$ trees, we have

$$(1+o(1))\,\mathbb{P}\big\{\mathrm{GW}^{(x)}_{Q^\downarrow_n(\beta)}\cong F\big\}\le\mathbb{P}\big\{F_{X^+}(p(F))=F\;\big|\;E_H\big\}\le(1+o(1))\,\mathbb{P}\big\{\mathrm{GW}^{(x)}_{Q^\uparrow_n(\beta)}\cong F\big\}. \qquad (3.8)$$

## 4 Expansion probability

Let $N^\pm_t(X^\pm)$ and $N^\pm_{\le t}(X^\pm)$ be the sets of heads/tails at distance exactly $t$ and at most $t$ from $X^\pm$, respectively. From now on, let

$$\omega\coloneqq\log^6n,\qquad t_0\coloneqq\log_\nu\omega. \qquad (4.1)$$

Let $t_\omega(X^\pm)$ be the expansion time of $X^\pm$, defined as

$$t_\omega(X^\pm)\coloneqq\inf\{t\ge1:|N^\pm_t(X^\pm)|\ge\omega\}. \qquad (4.2)$$

For brevity, we write $t_\omega\coloneqq t_\omega(X^\pm)$ when $X^\pm$ is clear from the context.

Given a partial pairing $H$ and sets of unpaired half-edges $X^+$ and $X^-$, we consider the following two events:

$$\mathcal{A}_1(X^\pm,\varepsilon)\coloneqq[t_\omega(X^\pm)\le(1+\varepsilon)t_0],\qquad \mathcal{A}_2(X^\pm,H)\coloneqq[N^\pm_{\le t_\omega(X^\pm)}(X^\pm)\cap F^\pm(H)=\emptyset]. \qquad (4.3)$$

The first lemma in this section shows that the probability that both these events happen is close to the survival probability of a branching process.

###### Lemma 4.1.

Assume that $\nu>1$. Fix $\varepsilon>0$, $x\ge1$ and $\beta>0$. Then uniformly for all choices of partial pairing $H$ and set $X^\pm$ with $|X^\pm|=x$ satisfying the assumptions of Lemma 3.1, we have, as $n\to\infty$,

$$\mathbb{P}\{\mathcal{A}_1(X^\pm,\varepsilon)\cap\mathcal{A}_2(X^\pm,H)\mid E_H\}=(1+o(1))(1-\rho_\pm^x). \qquad (4.4)$$
###### Proof.

Let $t_1\coloneqq(1+\varepsilon)t_0$. Let $\mathcal{F}_{x,t,\omega}$ be the class of incomplete forests with $x$ trees and height $t$ such that only the last level has at least $\omega$ nodes. For $F\in\mathcal{F}_{x,t,\omega}$ with $t\le t_1$, we have $p(F)\le x\omega t$. Let $T^\uparrow_\omega$ be the expansion time, as in (2.12), of $x$ iid branching processes with offspring distribution $Q^\uparrow_n(\beta)$, and let $s^{\uparrow+}_n$ be the survival probability of each of them. Since $Q^\uparrow_n(\beta)\to D^+_{\mathrm{in}}$ in distribution, we have $s^{\uparrow+}_n\to s^+$.

By Corollary 2.4 and Lemma 3.1, the LHS of (4.4) is

$$\begin{aligned}\sum_{t=1}^{t_1}\sum_{j=t-1}^{\lfloor x\omega t\rfloor}\sum_{\substack{F\in\mathcal{F}_{x,t,\omega}\\p(F)=j}}\mathbb{P}\{F_{X^+}(j)=F\mid E_H\}&\le(1+o(1))\sum_{t=1}^{t_1}\sum_{j=t-1}^{\lfloor x\omega t\rfloor}\sum_{\substack{F\in\mathcal{F}_{x,t,\omega}\\p(F)=j}}\mathbb{P}\big\{\mathrm{GW}^{(x)}_{Q^\uparrow_n(\beta)}\cong F\big\}\\&=(1+o(1))\,\mathbb{P}\{T^\uparrow_\omega\le t_1\}\\&=(1+o(1))\big(1-(1-s^{\uparrow+}_n)^x\big)\\&=(1+o(1))(1-\rho_+^x),\end{aligned} \qquad (4.5)$$

where we used that $s^{\uparrow+}_n\to s^+$ implies $1-(1-s^{\uparrow+}_n)^x=(1+o(1))(1-\rho_+^x)$. The lower bound follows from a similar argument with $Q^\downarrow_n(\beta)$ in place of $Q^\uparrow_n(\beta)$. ∎

Our next lemma shows that when is small, and are unlikely to be too close. We omit the proof since it follows from an easy adaptation of the proof in [4, Proposition 7.2].

###### Lemma 4.2.

Assume that $\nu>1$. Fix $\varepsilon>0$ and $x\ge1$. Then uniformly for all choices of partial pairing $H$ and sets $X^\pm$ with $|X^\pm|\le x$ and with $H$ satisfying the assumptions of Lemma 3.1, we have

$$\mathbb{P}\Big\{\mathrm{dist}(X^+,X^-)\le\Big(\frac12-\varepsilon\Big)\log_\nu n\;\Big|\;E_H\Big\}=o(n^{-\varepsilon/2}). \qquad (4.6)$$

The previous lemma allows us to remove the event $\mathcal{A}_2(X^\pm,H)$ from Lemma 4.1.

###### Lemma 4.3.

Assume that $\nu>1$. Fix $\varepsilon>0$ and $x_+,x_-\ge1$. Then uniformly for all choices of partial pairing $H$ and sets $X^\pm$ with $|X^\pm|=x_\pm$, as in Lemma 4.1, we have, as $n\to\infty$,

$$\mathbb{P}\{\mathcal{A}_1(X^\pm,\varepsilon)\mid E_H\}=(1+o(1))(1-\rho_\pm^{x_\pm}), \qquad (4.7)$$

$$\mathbb{P}\{\mathcal{A}_1(X^+,\varepsilon)\cap\mathcal{A}_1(X^-,\varepsilon)\mid E_H\}=(1+o(1))(1-\rho_-^{x_-})(1-\rho_+^{x_+}). \qquad (4.8)$$
###### Proof.

We will prove it for $X^+$; a similar argument works for $X^-$. Let

$$E_1=\mathcal{A}_1(X^+,\varepsilon),\qquad E_2=\mathcal{A}_2(X^+,H),\qquad E_3=\mathcal{A}_1(X^-,\varepsilon). \qquad (4.9)$$

Note that whether $E_1\cap E_2$ happens is determined by the pairings revealed by the exploration process started from $X^+$.

By Lemma 4.1, the LHS of (4.7) equals

$$\mathbb{P}\{E_1\mid E_H\}=\mathbb{P}\{E_1\cap E_2\mid E_H\}+\mathbb{P}\{E_1\cap E_2^c\mid E_H\}=(1+o(1))(1-\rho_+^{x_+})+\mathbb{P}\{E_1\cap E_2^c\mid E_H\}. \qquad (4.10)$$

On the event $E_1\cap E_2^c$, the exploration meets a half-edge of $F^+(H)$ within distance $t_\omega(X^+)\le(1+\varepsilon)t_0\le4t_0$. Since $t_0=\log_\nu\omega=O(\log\log n)$, we have $4t_0\le(\frac12-\delta)\log_\nu n$ for any fixed $\delta\in(0,\frac12)$ and $n$ large enough. By Lemma 4.2,

$$\mathbb{P}\{E_1\cap E_2^c\mid E_H\}\le\mathbb{P}\{\mathrm{dist}(X^+,F^+(H))\le4t_0\mid E_H\}\le\mathbb{P}\Big\{\mathrm{dist}(X^+,F^+(H))\le\Big(\frac12-\delta\Big)\log_\nu n\;\Big|\;E_H\Big\}=o(1). \qquad (4.11)$$

Let $\mathcal{H}$ be the set of all possible partial pairings revealed by the exploration from $X^+$ on which $E_1\cap E_2$ happens. Then $H'\in\mathcal{H}$ implies that $H\cup H'$ is a partial pairing satisfying the assumptions of Lemma 4.1. Using Lemma 4.1 again, we have

$$\begin{aligned}\mathbb{P}\{E_1\cap E_3\mid E_H\}&=\sum_{H'\in\mathcal{H}}\mathbb{P}\{E_3\mid E_{H\cup H'}\}\,\mathbb{P}\{E_{H\cup H'}\mid E_H\} \qquad (4.12)\\&=(1+o(1))(1-\rho_-^{x_-})\,\mathbb{P}\{E_1\mid E_H\}\\&=(1+o(1))(1-\rho_-^{x_-})(1-\rho_+^{x_+}). \qquad (4.13)\end{aligned}$$

∎

Unsurprisingly, Lemma 4.3 can be extended to a fixed number of pairs of head-sets and tail-sets:

###### Lemma 4.4.

Assume that $\nu>1$. Fix $\varepsilon>0$ and $i\ge1$. Then uniformly for all disjoint sets of tails $X^+_1,\dots,X^+_i$ and disjoint sets of heads $X^-_1,\dots,X^-_i$ with $|X^\pm_j|=x^\pm_j$ for $j\in[i]$, we have, as $n\to\infty$,

$$\mathbb{P}\Big\{\bigcap_{j=1}^i\big[\mathcal{A}_1(X^+_j,\varepsilon)\cap\mathcal{A}_1(X^-_j,\varepsilon)\big]\Big\}=(1+o(1))\prod_{j=1}^i\big(1-\rho_-^{x^-_j}\big)\big(1-\rho_+^{x^+_j}\big).$$