Percolation Threshold Results on Erdős–Rényi Graphs: an Empirical Process Approach

12/05/2017
by Michael Kane, et al.
Yale University

In this paper we define a directed percolation process over Erdős–Rényi graphs and derive weak limit results for the resulting stochastic process.


1 Introduction

Random graphs and discrete random processes provide a general approach to discovering properties and characteristics of random graphs and randomized algorithms. The approach generally works by defining an algorithm on a random graph or a randomized algorithm. Expected changes at each step of the process are then used to propose a limiting differential equation, and a large deviation theorem is used to show that the process and the differential equation are close in some sense. In this way a connection is established between the process's stochastic behavior and the dynamics of a deterministic, asymptotic approximation by a differential equation. This approach is generally referred to as stochastic approximation and provides a powerful tool for understanding the asymptotic behavior of a large class of processes defined on random graphs. However, little work has been done in the area of random graph research to investigate the weak limit behavior of these processes before the asymptotic behavior overwhelms the random component of the process. This context is particularly relevant to researchers studying news propagation in social networks, sensor networks, and epidemiological outbreaks. In each of these applications, investigators may deal with graphs containing tens to hundreds of vertices and be interested not only in expected behavior over time but also in error estimates.

This paper investigates the connectivity of graphs, with emphasis on Erdős–Rényi graphs, near the percolation threshold when the number of vertices is not asymptotically large. More precisely, we define a simple algorithm for simulating directed percolations on a graph in Section 2. Section 3 provides an overview of the two fundamental techniques required for our investigation: stochastic approximation and the functional martingale central limit theorem. In Section 4, these tools are applied to the directed percolation algorithm to show that the behavior of the process converges to an ordinary differential equation plus a stretched-out Brownian motion. This result allows us to re-examine many of the classical random graph results [2] involving the evolution of random graphs near the percolation threshold. Furthermore, because the process can be modeled as a function of a stretched-out Brownian motion, we can draw on the stochastic calculus literature to derive new results for random graphs. For example, in Section 5 this new representation is used to find the percolation threshold for the graph by deriving the distribution of the stopping time for the algorithm to percolate over the largest component.

2 The Directed Percolation Algorithm

The percolation algorithm investigated in this paper is defined in Algorithm 1. The algorithm starts on a graph with all vertices labelled “not visited”. At time zero one vertex is labelled “visited not transmitted”. The algorithm proceeds by selecting one vertex labelled “visited not transmitted”. This vertex is relabelled “visited transmitted”, and its neighbors that are labelled “not visited” are relabelled “visited not transmitted”. If the algorithm progresses to a point where all vertices are labelled either “not visited” or “visited transmitted”, then the algorithm is reseeded by selecting a binomial number of vertices labelled “not visited” and relabelling them “visited not transmitted”. The algorithm continues until all vertices are labelled “visited transmitted”.

let G be a graph with n vertices
label all vertices in G ‘‘not visited’’
pick one vertex uniformly at random and label it ‘‘visited not transmitted’’
while not all vertices are labelled “visited transmitted” do
       let N be the set of vertices labelled ‘‘not visited’’
       let V be the set of vertices labelled ‘‘visited not transmitted’’
       let T be the set of vertices labelled ‘‘visited transmitted’’
       if V is not empty then
             pick a vertex v uniformly at random from V
             label v ‘‘visited transmitted’’
             label the neighbors of v in N ‘‘visited not transmitted’’
       else
             draw b ~ Binomial(|N|, p)
             pick b vertices uniformly at random from N and label them ‘‘visited not transmitted’’
       end if
end while
Algorithm 1 The directed percolation algorithm
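
To make Algorithm 1 concrete, the following sketch (our own construction; the integer labels 0, 1, and 2 stand for “not visited”, “visited not transmitted”, and “visited transmitted”) simulates it in Python on a G(n, p) graph sampled up front rather than by deferred decisions:

import random

def directed_percolation(n, p, seed=None):
    """Simulate Algorithm 1 on a sampled Erdos-Renyi G(n, p) graph.

    Vertex labels: 0 = "not visited", 1 = "visited not transmitted",
    2 = "visited transmitted". Returns the number of loop iterations.
    """
    rng = random.Random(seed)
    # Sample the graph up front (deferred decisions would avoid this).
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    label = [0] * n
    label[rng.randrange(n)] = 1            # initial seed vertex
    steps = 0
    while any(lab != 2 for lab in label):
        steps += 1
        vnt = [v for v in range(n) if label[v] == 1]
        if vnt:
            v = rng.choice(vnt)            # a "visited not transmitted" vertex
            label[v] = 2                   # it becomes "visited transmitted"
            for u in adj[v]:               # its "not visited" neighbors
                if label[u] == 0:
                    label[u] = 1           # become "visited not transmitted"
        else:
            # Reseed: a Binomial(|not visited|, p) number of new seeds.
            nv = [v for v in range(n) if label[v] == 0]
            k = sum(rng.random() < p for _ in nv)
            for v in rng.sample(nv, k):
                label[v] = 1
    return steps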

One important characteristic of the algorithm is that edges do not need to be revealed until the algorithm needs to relabel the “not visited” vertices adjacent to the selected “visited not transmitted” vertex. This scenario is referred to as the method of deferred decisions [7] and, for Erdős–Rényi graphs, it induces a binomial conditional distribution on the number of vertices whose label changes from “not visited” to “visited not transmitted” at each step of the algorithm.

Theorem 2.1.

When the percolation algorithm described in Algorithm 1 is run on an Erdős–Rényi graph with $n$ vertices, the number of vertices going from the “not visited” to the “visited not transmitted” labelling at iteration $t$ is distributed as $\mathrm{Binomial}(|N_t|, p)$, where $N_t$ is the set of vertices labelled “not visited” at iteration $t$ and $p$ is the probability any two vertices are connected.

Proof.

Let $N_t$ be the set of vertices labelled “not visited” at iteration $t$. If there is at least one vertex marked “visited not transmitted” then one of those vertices will be selected for transmission. The edges between that vertex and its adjacent “not visited” vertices are unrevealed. However, the probability that it is connected to any one of the “not visited” vertices is $p$, and therefore the number of “not visited” vertices it is connected to, which is the same as the number of new vertices that will be labelled “visited not transmitted” in the next step of the algorithm, is distributed $\mathrm{Binomial}(|N_t|, p)$. If, on the other hand, there are no vertices marked “visited not transmitted” then, by definition of the algorithm, a $\mathrm{Binomial}(|N_t|, p)$ number of vertices labelled “not visited” will be relabelled “visited not transmitted” in the next step. ∎

The proof shows that at any step $t$ the number of new vertices that will be labelled “visited not transmitted” at $t+1$ is a binomial random variable depending only on the current number of “not visited” vertices and the connection probability. The aggregate number of vertices labelled “visited not transmitted” or “visited transmitted” is non-decreasing and grows according to this distribution.

The percolation algorithm on an Erdős–Rényi graph can be recast as an urn process with one urn holding balls corresponding to vertices labelled “not visited” and another holding balls corresponding to vertices labelled either “visited not transmitted” or “visited transmitted”. Initially, all balls are contained in the “not visited” urn. Let $Y_t$ be the number of balls in the “not visited” urn at time $t$, with $Y_0 = n$; then at each step a $\mathrm{Binomial}(Y_t, p)$ number of balls is drawn from the “not visited” urn and placed in the “visited” urn. This urn process is stochastically equivalent to the percolation process. A formal definition of the urn algorithm is given in Algorithm 2.

consider two urns labelled ‘‘not visited’’ and ‘‘visited’’
place n balls into the ‘‘not visited’’ urn
while there are balls in the “not visited” urn do
       let Y be the number of balls in the ‘‘not visited’’ urn
       draw b ~ Binomial(Y, p)
       move b balls from the ‘‘not visited’’ urn to the ‘‘visited’’ urn
end while
Algorithm 2 The urn model equivalent of the directed percolation algorithm
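
Algorithm 2 translates almost line for line into Python; a minimal sketch (the function name is ours, and NumPy's binomial sampler stands in for the draw step):

import numpy as np

def urn_process(n, p, seed=None):
    """Run Algorithm 2 once and return the urn-1 trajectory.

    trajectory[t] is the number of balls left in the "not visited"
    urn after t draws, starting from trajectory[0] == n.
    """
    rng = np.random.default_rng(seed)
    y, trajectory = n, [n]
    while y > 0:
        y -= rng.binomial(y, p)    # Binomial(Y_t, p) balls move to urn 2
        trajectory.append(y)
    return trajectory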

The urn model process provides a conceptually simpler tool for investigating the behavior of the directed percolation process. It also provides a means for investigating the behavior of the algorithm near the percolation threshold through the following theorem.

Theorem 2.2.

Consider the urn model process. The event where, at time $t$, the number of balls in the “visited” urn is less than $t$ is equivalent to exhausting the component where the algorithm started. That is, all vertices in the component are labelled “visited.”

Proof.

Consider the directed percolation process on a graph of size at least two. At step zero a “seed” vertex is selected. At the beginning of step one the seed vertex is chosen for transmission. If it has no neighbors, then the first component, which consisted of the seed vertex only, is exhausted. Otherwise, without loss of generality, assume that there is one adjacent vertex, labelled $v$. The seed vertex is no longer considered, and $v$ is labelled “visited not transmitted,” which is equivalent to moving one ball into the “visited” urn. Once again, if $v$ has no neighbors then, at time step two, the count of visited vertices is 1, since the seed vertex is not included and no new vertices are visited. In this case the count is less than the time step at time two, corresponding to the component consisting of the seed vertex and $v$ being visited. This process continues, with newly visited vertices corresponding to balls moving to the “visited” urn. The process stops when the graph component is exhausted, which occurs when the total number of visited vertices is less than the time step. ∎

Figure 1: Visualizing the empirical distribution of the number of balls in urn 1 over 100 runs near the percolation threshold

Figure 1 shows the number of balls in urn 1 when the urn process described in Algorithm 2 was simulated for 100 runs with $n$ and $p$ fixed near the percolation threshold. A diagonal line is also shown; any points to the right of the diagonal line correspond to simulated processes whose corresponding percolation process failed to spread to all vertices in the graph. For this simulation seven of the 100 runs failed.

The figure provides two important insights into the behavior of the process. First, the process slope is steep compared to the diagonal at the beginning of the process. For the urn process, the number of balls in urn 1 is initially large, resulting in a large number of balls moving to urn 2 at each time step. As the process continues there are fewer balls in urn 1, so fewer balls are moved to urn 2 at each time step and the slope decreases. For the graph process, this corresponds to each “visited not transmitted” vertex having a large number of neighbors early on. Second, the process variance increases while the slope is steep compared to the diagonal line and concentrates as the process levels off. By definition, the number of balls in urn 2 cannot be bigger than $n$. Each of the processes approaches $n$ quickly early on and then more slowly as the process nears the stopped state.

Further simulation results show that as $n$ increases the relative variation in the process decreases. That is, the process concentrates on its expected value at each time step. These expected values over time can be approximated using a differential equation. The next section provides the techniques for understanding this concentration phenomenon as well as for finding the corresponding differential equation.
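
This concentration can be checked by simulation; the sketch below (our construction, taking $p = c/n$ with $c = 1$ as an illustrative near-threshold choice) estimates the relative standard deviation of $Y_t/n$ at a fixed reparameterized time for growing $n$:

import numpy as np

def relative_spread(n, c=1.0, runs=200, seed=None):
    """Relative standard deviation of Y_t / n at x = t/n = 1/2."""
    rng = np.random.default_rng(seed)
    p, t_obs = c / n, n // 2
    finals = []
    for _ in range(runs):
        y = n
        for _ in range(t_obs):
            y -= rng.binomial(y, p)
        finals.append(y / n)
    finals = np.asarray(finals)
    return finals.std() / finals.mean()

# The relative spread should shrink roughly like n^{-1/2}.
for n in (100, 1000, 10000):
    print(n, relative_spread(n))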

3 Overview of the Method

3.1 Stochastic Approximation

3.1.1 Background

As Wormald [15] points out, “This idea of approximation has existed in connection with continuous processes… essentially since the invention of differential equations by Newton for approximation of the motion of bodies in mechanics.” However, the term stochastic approximation was originally coined by Robbins and Monro [12]. Their paper presented a method for finding the root of a monotone function under noisy conditions. Kurtz [8] developed this idea further. By imposing appropriate bounds on the difference between steps of the process, he showed that a reparameterized version of the process converges in distribution to the solution of a differential equation. This area of research has remained active, with Darling and Norris [1] publishing a recent paper providing new conditions for the convergence of stochastic processes to an ODE based on Grönwall's inequality [4], along with examples of random processes that lend themselves to these results.

Stochastic approximation techniques have also been applied to random graphs. Wormald [16] uses techniques similar to those provided by Kurtz to show that the degrees of the vertices in a random graph, where the maximum degree is bounded, converge as the total number of vertices gets large. Since then, Wormald has provided new results [15] handling the case where processes are not strictly Markovian. The same paper also provides a survey of graph processes that are amenable to the stochastic approximation approach, including the random greedy matching algorithm presented in [6] and the degree bounded graph process, which Ruciński and Wormald [13] used to answer a question originally posed by Erdős concerning the asymptotic behavior of a degree bounded graph. Readers interested in further applications of stochastic approximation to processes on random graphs are encouraged to consult Wormald [15] or the recent survey by Pemantle [10].

3.1.2 Overview

Consider a process $\{Y_t\}_{t \in T}$ where $T$ is the index set $\{0, 1, 2, \ldots\}$. Assume that the behavior of the process is determined by

$$Y_{t+1} = Y_t + H_{t+1} \qquad (3.1)$$

where $H_{t+1}$ is a random variable adapted to the natural filtration of the process up to time $t+1$, which will be denoted $\mathcal{F}_{t+1}$.

The urn process takes values in the integers from zero to $n$ and is defined over all non-negative integer times. To derive asymptotic results, the process is reparameterized, normalizing by $n$ over both the domain and the range. Furthermore, this reparameterized process is defined to be càdlàg so that its domain and range take values in the reals. Let $x = t/n$; then the new process can be defined by $Z^n_x = Y_{\lfloor xn \rfloor} / n$.

The reparameterized version of the process defined in Equation 3.1 can then be written as

$$Z^n_{x + 1/n} = Z^n_x + \frac{1}{n} H_{xn+1},$$

or, for notational simplicity,

$$Z_{x + 1/n} = Z_x + \frac{1}{n} H_{xn+1}.$$

If $\mathbb{E}[H_{xn+1} \mid \mathcal{F}_{xn}] = f(Z_x)$ then the reparameterized process can be further re-written as

$$Z_{x + 1/n} = Z_x + \frac{1}{n} f(Z_x) + \frac{1}{n} M_{xn+1},$$

where $M_{xn+1} = H_{xn+1} - \mathbb{E}[H_{xn+1} \mid \mathcal{F}_{xn}]$ is a centered, martingale increment. Now, let $z(x)$ be a deterministic analog of the process with $z(0) = Z_0$ and

$$z(x) = z(0) + \int_0^x f(z(s))\, ds. \qquad (3.2)$$

The difference between $Z_x$ and $z(x)$ at any value of $x$ over the domain can be written as

$$Z_x - z(x) = \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} \big( f(Z_{(s-1)/n}) - f(z((s-1)/n)) \big) + \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} M_s + O(1/n).$$

When the first difference is small, the difference between the process and the deterministic analogue is essentially the sum of the martingale increments:

$$Z_x - z(x) \approx \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} M_s. \qquad (3.3)$$

If the sum of the martingale increments converges to zero and the deterministic part can be approximated arbitrarily well by a differential equation, then the reparameterized process converges to the differential equation asymptotically.

3.2 Functional Martingale Central Limit Theorem

3.2.1 Background

3.2.2 Overview

In the classic stochastic approximation literature it is assumed that the martingale term in Equation 3.3 is asymptotically zero. However, Hall and Heyde [5] show that certain martingale increment processes, such as this one, which are defined over càdlàg sample paths, converge to a stretched-out Brownian motion: that is, a Brownian motion $B$ with a strictly increasing transformation $\tau$ applied to the time scale, $B \circ \tau$. Sufficient conditions for convergence to a stretched-out Brownian motion from Pollard [11] are given here for reference.
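
For intuition, a stretched-out Brownian motion $B \circ \tau$ can be simulated by giving a discretized Brownian motion independent Gaussian increments whose variances are the increments of $\tau$; a minimal sketch (the particular $\tau$ below is an illustrative choice, not one fixed by the theorem):

import numpy as np

def stretched_out_bm(tau, seed=None):
    """Sample B(tau[0]), B(tau[1]), ... for an increasing array tau,
    using independent N(0, tau-increment) steps."""
    rng = np.random.default_rng(seed)
    increments = np.diff(tau, prepend=0.0)
    return np.cumsum(rng.normal(0.0, np.sqrt(increments)))

# Example with the time change tau(x) = 1 - exp(-x).
x = np.linspace(0.01, 3.0, 300)
path = stretched_out_bm(1.0 - np.exp(-x))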

Theorem 3.1.

Let $\{X_n\}$ be a sequence of martingales, each adapted to its natural filtration, with $X_n(0) = 0$. Let $X_n$ have conditional variance process $V_n$. Let $\tau$ be a continuous, increasing function on $[0, \infty)$ with $\tau(0) = 0$. Let $D_n(t)$ be the maximum jump in a sample path up to time $t$:

$$D_n(t) = \sup_{s \le t} |X_n(s) - X_n(s-)|. \qquad (3.4)$$

Sufficient conditions for convergence in distribution of $X_n$ to a stretched-out Brownian motion $B \circ \tau$ are:

  1. $D_n(t) \to 0$ in probability

  2. $V_n(t) \to \tau(t)$ in probability for each fixed $t$

  3. $\mathbb{E}[D_n(t)^2]$ remains bounded for each fixed $t$ as $n \to \infty$

4 Applying Stochastic Approximation and the Functional Central Limit Theorem to the Urn Process

Returning to the urn process, let $n$ be the total number of balls in urn 1 and urn 2 at any time, let $p$ be the probability that any single ball in urn 1 is moved to urn 2, and let $Y_t$ be the number of balls in urn 1 at time $t$. Then

$$Y_{t+1} = Y_t - B_{t+1}, \qquad (4.1)$$

where $B_{t+1} \sim \mathrm{Binomial}(Y_t, p)$. Let $Z^n_{t/n} = Y_t / n$ be the reparameterized process with $Z^n_0 = 1$ and

$$Z^n_{(t+1)/n} = Z^n_{t/n} - \frac{1}{n} B_{t+1}, \qquad (4.2)$$

where $B_{t+1}$ is an $\mathcal{F}_{t+1}$-measurable random variable with conditional distribution $\mathrm{Binomial}(n Z^n_{t/n}, p)$. Equation 4.2 can then be written as

$$Z^n_{(t+1)/n} = Z^n_{t/n} (1 - p) - \frac{1}{n} M_{t+1}, \qquad (4.3)$$

where $M_{t+1} = B_{t+1} - n Z^n_{t/n}\, p$ and $M_{t+1}$ is a martingale increment.

4.1 Approximating the Process with a Differential Equation

Theorem 4.1.

If $p = c/n$ for a constant $c > 0$ and the sum of the martingale increments up to time $t = \lfloor xn \rfloor$ can be bound by $O(\sqrt{n})$ or less, then the process in Equation 4.3 can be written as

$$Z^n_x = e^{-cx} + O_P\big(n^{-1/2}\big).$$

Proof.

By definition $Z^n_{(t+1)/n} = Z^n_{t/n}(1-p) - \frac{1}{n} M_{t+1}$. Therefore $Z^n_{(t+2)/n}$ can be written as:

$$Z^n_{(t+2)/n} = Z^n_{t/n}(1-p)^2 - \frac{1-p}{n} M_{t+1} - \frac{1}{n} M_{t+2}.$$

Likewise, iterating down to $Z^n_0 = 1$, the process can be written as

$$Z^n_{t/n} = (1-p)^t - \frac{1}{n} \sum_{s=1}^{t} (1-p)^{t-s} M_s. \qquad (4.4)$$

The summation in 4.4 is a martingale. The absolute value of this summation is bound by the sum of the absolute values of each of the summands, which is a submartingale process. Therefore, the supremum of the martingale can be bound by

$$\mathbb{E}\left[ \sup_{t' \le t} \left| \sum_{s=1}^{t'} (1-p)^{t'-s} M_s \right| \right] \le 2 \left( \sum_{s=1}^{t} \mathbb{E}\big[ M_s^2 \big] \right)^{1/2} = O(\sqrt{n}), \qquad (4.5)$$

where 4.5 follows by the Doob inequality. The expected maximum of the martingale sum, scaled by $1/n$, is therefore converging to zero at a rate of $n^{-1/2}$.

The difference between $(1 - c/n)^{\lfloor xn \rfloor}$ and $e^{-cx}$ is of order $n^{-1}$.

To extend this to the reals it is sufficient to show that for any increment in the process, the difference between $Z^n$ at consecutive jump points is $O_P(1/n)$. ∎
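
The $n^{-1/2}$ rate can also be observed numerically; the sketch below (our construction, with $p = c/n$ and the candidate limit $e^{-cx}$) estimates the supremum deviation over single runs for growing $n$:

import numpy as np

def sup_deviation(n, c=1.0, seed=None):
    """Max deviation between Y_t / n and exp(-c t / n) over one run."""
    rng = np.random.default_rng(seed)
    p, y, worst = c / n, n, 0.0
    for t in range(1, n + 1):
        y -= rng.binomial(y, p)
        worst = max(worst, abs(y / n - np.exp(-c * t / n)))
    return worst

# The deviation should shrink roughly like n^{-1/2}.
for n in (100, 1000, 10000):
    print(n, np.mean([sup_deviation(n) for _ in range(20)]))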

4.2 Applying the Functional Martingale Central Limit Theorem

According to Equation 4.2, each increment of the urn process is a function of the last state of the process minus a martingale increment. Theorem 4.1 shows that if these martingale increments are not too big then, in the limit, a reparameterized version of the process converges to the solution of a differential equation. In this section it is shown that the martingale process converges to a stretched-out Brownian motion whose variance is decreasing in $n$.

Consider the process in Equation 4.1. The next urn count is equal to the current urn count minus a binomial number of balls. The binomial number is determined by the current number of balls in urn 1 and the probability that a ball is moved from urn 1 to urn 2. This time, let $p = c/n$ and decompose the process in the following way:

$$Y_{t+1} = Y_t (1-p) - (B_{t+1} - Y_t\, p) = Y_t (1-p) - M_{t+1}.$$

Now define a new process $W^n$ where

$$W^n_{t/n} = \frac{1}{\sqrt{n}} \sum_{s=1}^{t} M_s \qquad (4.6)$$

for integers $0 \le t \le n$. This process is a martingale with strictly increasing variance.

Theorem 4.2.

The process defined in Equation 4.6 converges to a stretched-out Brownian motion with variance $\tau(x) = 1 - e^{-cx}$ at time $x$.

Proof.

Condition 1 of Theorem 3.1 is satisfied by definition of the process: each jump of $W^n$ is of order $n^{-1/2}$ and vanishes in probability. Condition 2 can be derived using a conditioning argument:

$$V_n(x) = \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} \mathbb{E}\big[ M_s^2 \mid \mathcal{F}_{s-1} \big] = \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} Y_{s-1}\, p\, (1-p) \to \int_0^x c\, e^{-cs}\, ds = 1 - e^{-cx}.$$

And condition 3 can be derived by realizing that the squared largest jump is bound by the sum of all squared jumps in the process up to time $x$,

$$\mathbb{E}\big[ D_n(x)^2 \big] \le \frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} \mathbb{E}\big[ M_s^2 \big] \le c\, x,$$

which is bounded for each fixed $x$, while each individual jump is of order $n^{-1/2}$ and approaches zero as $n \to \infty$. ∎

Corollary 4.3.

If $M_s$ is the martingale increment from Equation 4.3 then

$$\frac{1}{n} \sum_{s=1}^{\lfloor xn \rfloor} M_s = \frac{1}{\sqrt{n}}\, W^n_x \rightsquigarrow \frac{1}{\sqrt{n}}\, B_{\tau(x)}, \qquad (4.7)$$

where $B_{\tau(x)}$ is a stretched-out Brownian motion with variance process $\tau(x) = 1 - e^{-cx}$.

Proof.

Recall that $W^n_x$ converges to a $B_{\tau(x)}$. The martingale process $\frac{1}{n} \sum_{s \le xn} M_s = n^{-1/2}\, W^n_x$ also converges to the stretched-out Brownian motion after rescaling.

The result follows by realizing that its variance process, $\tau(x)/n$, is a full order of magnitude smaller than $\tau(x)$, and as a result its standard deviation is half an order of magnitude ($n^{-1/2}$) smaller. ∎
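
Under the reading of Theorem 4.2 used above (variance process $\tau(x) = 1 - e^{-cx}$, which is our reconstruction of the elided display), the terminal variance of $W^n$ can be compared to $\tau(1)$ by simulation:

import numpy as np

def martingale_endpoint(n, c=1.0, seed=None):
    """W^n at x = 1: scaled sum of centered increments M_t = B_t - Y_{t-1} p."""
    rng = np.random.default_rng(seed)
    p, y, total = c / n, n, 0.0
    for _ in range(n):
        b = rng.binomial(y, p)
        total += (b - y * p) / np.sqrt(n)   # centered, scaled increment
        y -= b
    return total

c, n = 1.0, 2000
w = np.array([martingale_endpoint(n, c) for _ in range(500)])
print(w.var(), 1.0 - np.exp(-c))   # empirical variance vs tau(1)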

5 A General Boundary-Crossing Result for the Directed Percolation Algorithm

Theorem 5.1.

Let $\hat{Z}_x$ be the approximation of the process of interest:

$$\hat{Z}_x = e^{-cx} + \frac{1}{\sqrt{n}}\, B_{\tau(x)}, \qquad (5.1)$$

where $\tau(x) = 1 - e^{-cx}$ and $B_{\tau(x)}$ is a stretched-out Brownian motion with variance parameter $\tau(x)$. Let $a \in (0, 1)$ with $x_0$ satisfying $e^{-c x_0} = a$. Let $T_a$ be the first time $\hat{Z}_x = a$ for $x > 0$. The density of $T_a$ is

$$f_{T_a}(x) = \frac{c\, a\, \sqrt{n}}{\sqrt{2 \pi\, \tau(x_0)}}\, \exp\!\left( - \frac{n\, c^2 a^2 (x - x_0)^2}{2\, \tau(x_0)} \right). \qquad (5.2)$$
Proof.

Equation 5.1 can be expressed as

$$\hat{Z}_{x_0 + s} = e^{-c x_0} - c\, e^{-c x_0} s + R_d(s) + \frac{1}{\sqrt{n}}\, B_{\tau(x_0)} + R_r(s), \qquad (5.3)$$

where $R_d(s)$ can be thought of as the remainder of the deterministic portion of the process and $R_r(s)$ is the remainder of the random portion.

First, show that $R_d(s)$ is negligible by realizing that the higher-order terms of the Taylor series expansion of the deterministic part are $O(s^2)$, since the difference $s = x - x_0$ is small.

Next show that $R_r(s)$ is negligible by substituting $\tau(x_0 + s) = 1 - e^{-c(x_0 + s)}$ and showing that the exponential variance terms converge to the original $\tau(x_0)$ as $s \to 0$.

Finally, set $\hat{Z}_{x_0 + s}$ to $a$. Substituting into Equation 5.3, note that $e^{-c x_0} = a$ by assumption, and as a consequence of the choice for $x_0$,

$$c\, a\, s \approx \frac{1}{\sqrt{n}}\, B_{\tau(x_0)}. \qquad (5.4)$$

The result shows that $s = T_a - x_0$ is distributed as normal and is centered at zero, so that $T_a$ is centered at $x_0$ with standard deviation $\sqrt{\tau(x_0)} / (c\, a\, \sqrt{n})$. The proof follows by realizing that the hitting time $T_a$ has density equal to that of $x_0 + B_{\tau(x_0)} / (c\, a\, \sqrt{n})$. ∎
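
As a numerical check on this normal reading of Theorem 5.1 (mean $x_0$ and standard deviation $\sqrt{\tau(x_0)} / (c\, a\, \sqrt{n})$, which is our reconstruction of the elided display), the hitting time can be estimated by Monte Carlo:

import numpy as np

def hitting_time(n, c, a, seed=None):
    """First reparameterized time x = t/n at which Y_t / n <= a."""
    rng = np.random.default_rng(seed)
    p, y, t = c / n, n, 0
    while y / n > a:
        y -= rng.binomial(y, p)
        t += 1
    return t / n

n, c, a = 2000, 1.0, 0.5
x0 = -np.log(a) / c                        # solves e^{-c x0} = a
sd = np.sqrt(1.0 - np.exp(-c * x0)) / (c * a * np.sqrt(n))
samples = np.array([hitting_time(n, c, a) for _ in range(500)])
print(samples.mean(), x0)                  # centering check
print(samples.std(), sd)                   # spread check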

6 Applications: Finding the Probability that the Giant Component has been Exhausted

The results from Section 5 can be used to get the distribution of the time at which the giant component is exhausted in the directed percolation algorithm defined in Section 2.

Equation 5.1 shows that if $p = c/n$ and $n$ is large then the process is approximately equal to its stochastic approximation, $e^{-cx}$. Theorem 2.2 showed that the first time the number of “visited” vertices is less than the time step corresponds to exhausting the first component the algorithm percolates over. From these two results it follows that the first component is exhausted when

$$1 - e^{-cx} = x \qquad (6.1)$$

when $n$ is big. The result also shows that the process is asymptotically subcritical when $c \le 1$, since in that case the only solution to Equation 6.1 is $x = 0$. It should be noted that this result is consistent with the result from the original Erdős–Rényi paper [2] where, asymptotically, the ratio of the largest component to the total number of vertices is the positive solution of Equation 6.1 when $c > 1$ and zero when $c \le 1$.

When $n$ is not too large, the results from the previous section give the distribution of the first time the process crosses any pre-determined horizontal line. This result can be used to find the probability that the giant component has been exhausted in the directed percolation algorithm (Algorithm 1) at any time step. For a fixed $c$, this is accomplished by numerically solving for $x_0$ in Equation 6.1, calculating $\tau(x_0)$, and using Equation 5.2 to get the distribution.
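
The crossing point $x_0$ in Equation 6.1 has no closed form, but a standard root finder locates it; a minimal sketch (our function name, using SciPy's brentq with a bracket on which the objective changes sign):

import numpy as np
from scipy.optimize import brentq

def exhaustion_point(c):
    """Positive solution x0 of 1 - exp(-c x) = x for c > 1, else 0."""
    if c <= 1.0:
        return 0.0
    f = lambda x: 1.0 - np.exp(-c * x) - x
    return brentq(f, 1e-6, 1.0)    # f > 0 just above 0, f < 0 at x = 1

for c in (0.5, 1.5, 2.0):
    print(c, exhaustion_point(c))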

Figure 2: The solution $x_0$ for the function $1 - e^{-cx} = x$.

7 Conclusion

This paper presents weak limit results for a directed percolation algorithm on Erdős–Rényi graphs. The process concentrates on the solution of an ordinary differential equation, and it is shown that, in a pre-asymptotic setting, the process can be approximated by its asymptotic ODE plus a stretched-out Brownian motion. While many of the results presented are specific to the choice of the algorithm and the type of random graph, the underlying approach is more general. The derived results only require a Lipschitz condition on the conditional increments of the process along with control over the variance of the process. As a result, the techniques used here can be seen as a general approach to uncovering the characteristics of graphs, modeling outbreaks, studying news propagation in social networks, and so on, when the total number of vertices is relatively small.

References

  • [1] Darling R. W. R. and Norris J. R. (2008). Differential equation approximations for Markov chains. Probability Surveys 5 37–79.
  • [2] Erdős P. and Rényi A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences 5 17–61.
  • [3] Friedgut E. and Kalai G. (1996). Every monotone graph property has a sharp threshold. Proceedings of the American Mathematical Society 124(10) 2993–3002.
  • [4] Grönwall T. H. (1919). Note on the derivative with respect to a parameter of the solution of a system of differential equations. The Annals of Mathematics 20(4) 292–296.
  • [5] Hall P. and Heyde C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.
  • [6] Karp R. M. and Sipser M. (1981). Maximum matchings in sparse random graphs. Proceedings of the Twenty-Second Annual IEEE Symposium on Foundations of Computer Science 364–375.
  • [7] Knuth D. E., Motwani R. and Pittel B. (1990). Stable husbands. Random Structures and Algorithms 1 1–14.
  • [8] Kurtz T. G. (1970). Solutions of ordinary differential equations as limits of pure Markov jump processes. Journal of Applied Probability 7(1) 49–58.
  • [9] Kurtz T. G. and Protter P. (1991). Weak limit theorems for stochastic integrals and stochastic differential equations. The Annals of Probability 19(3) 1035–1070.
  • [10] Pemantle R. (2007). A survey of random processes with reinforcement. Probability Surveys 4 1–79.
  • [11] Pollard D. (1984). Convergence of Stochastic Processes. Springer, New York.
  • [12] Robbins H. and Monro S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics 22 400–407.
  • [13] Ruciński A. and Wormald N. C. (1992). Random graph processes with degree restrictions. Combinatorics, Probability and Computing 1 169–180.
  • [14] Steele J. M. (2001). Stochastic Calculus and Financial Applications. Springer, New York.
  • [15] Wormald N. C. (1999). The differential equation method for random graph processes and greedy algorithms. Lectures on Approximation and Randomized Algorithms (M. Karoński and H. J. Prömel, eds.) 73–155.
  • [16] Wormald N. C. (1995). Differential equations for random processes and random graphs. The Annals of Applied Probability 5(4) 1217–1235.