An Improved Algorithm for Incremental Cycle Detection and Topological Ordering in Sparse Graphs

10/08/2018 ∙ by Sayan Bhattacharya, et al. ∙ Microsoft ∙ University of Warwick

We consider the problem of incremental cycle detection and topological ordering in a directed graph $G = (V, E)$ with $|V| = n$ nodes. In this setting, initially the edge-set $E$ of the graph is empty. Subsequently, at each time-step an edge gets inserted into $G$. After every edge-insertion, we have to report if the current graph contains a cycle, and as long as the graph remains acyclic, we have to maintain a topological ordering of the node-set $V$. Let $m$ be the total number of edges that get inserted into $G$. We present a randomized algorithm for this problem with $\tilde{O}(m^{4/3})$ total expected update time.


1 Introduction

Consider an incremental directed graph $G = (V, E)$ with $|V| = n$ nodes. The edge-set $E$ is empty in the beginning. Subsequently, at each time step an edge gets inserted into $G$. After each such update (edge insertion), we have to report if the current graph contains a cycle, and as long as the graph remains acyclic, we have to maintain a topological ordering in $G$. The time taken to report the answer after an edge insertion is called the update time. We want to design an incremental algorithm for this problem with small total update time, which is defined as the sum of the update times over all the edge insertions. Recall that in the static setting there is an algorithm for cycle detection and topological ordering that runs in linear time. Thus, in the incremental setting, a naive approach would be to run this static algorithm from scratch after every edge-insertion in $G$. Let $m$ be the number of edges in the final graph. Then the naive incremental algorithm will have a total update time of $O(m \cdot (m + n))$. In contrast, we get the following result.

Theorem 1.1.

There is a randomized algorithm for incremental cycle detection with expected total update time of $\tilde{O}(m^{4/3})$.

1.1 Perspective

Cycle detection and topological ordering in directed graphs are fundamental, textbook problems. It is natural to ask what happens to the complexity of these problems when the input graph changes with time via a sequence of edge insertions. It comes as no surprise, therefore, that a long and influential line of work in the dynamic algorithms community, spanning a couple of decades, has focused on this question [BernsteinC18, HaeuplerKMST08, HaeuplerKMST12, BenderFGT16, BenderFG09, CohenFKR13, AjwaniF10, AjwaniFM08, KatrielB06, LiuC07, Marchetti-SpaccamelaNR96, PearceK06].

Note that the problem is trivial in the offline setting. Here, we get an empty graph $G = (V, E)$ and a sequence of edges $e_1, \ldots, e_m$ as input at one go. For each $t \in [1, m]$, let $G_t$ denote the status of $G$ after the first $t$ edges $e_1, \ldots, e_t$ have been inserted into $G$. We have to determine, for each $t$, if the graph $G_t$ contains a cycle. This offline version can easily be solved in $O(m \log m)$ time using binary search. In contrast, we are still far away from designing an algorithm for the actual, incremental version of the problem that has $\tilde{O}(m)$ total update time. (Footnote 1: Throughout this paper, we use the $\tilde{O}(\cdot)$ notation to hide polylog factors.) This is especially relevant, because at present we do not know of any technique in the conditional lower bounds literature [AbboudW14, HenzingerKNS15, KopelowitzPP16] that can prove a separation between the best possible total update time for an incremental problem and the best possible running time for the corresponding offline version. Thus, although it might be the case that there is no incremental algorithm for cycle detection and topological ordering with near-linear total update time, proving such a statement is beyond the scope of current techniques. With this observation in mind, we now review the current state of the art on the algorithmic front. We mention three results that are particularly relevant to this paper.
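The binary-search idea for the offline version can be sketched as follows. This is a minimal illustration, not code from the paper: the function names `has_cycle` and `first_cyclic_prefix` are hypothetical, nodes are assumed to be numbered 0..n-1, and the static check is Kahn's algorithm. It relies on the fact that once a prefix of the edge sequence contains a cycle, every longer prefix does too.

```python
from collections import deque


def has_cycle(n, edges):
    """Static check (Kahn's algorithm): True iff the graph on nodes 0..n-1 has a cycle."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for x, y in edges:
        successors[x].append(y)
        indegree[y] += 1
    queue = deque(x for x in range(n) if indegree[x] == 0)
    removed = 0
    while queue:
        x = queue.popleft()
        removed += 1
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)
    # Nodes that are never removed lie on, or are reachable from, a cycle.
    return removed < n


def first_cyclic_prefix(n, edges):
    """Smallest t such that the first t edges contain a cycle, or None if there is none."""
    if not has_cycle(n, edges):
        return None
    lo, hi = 1, len(edges)  # invariant: the prefix of length hi is cyclic
    while lo < hi:
        mid = (lo + hi) // 2
        if has_cycle(n, edges[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Every prefix shorter than the returned value is acyclic and every prefix at least that long is cyclic, so one binary search answers the question for all prefixes at once, using a logarithmic number of linear-time static checks.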

Result (1): There is an incremental algorithm with total update time of $\tilde{O}(n^2)$. This follows from the work of [BenderFGT16, BenderFG09, CohenFKR13]. So the problem is well understood for dense graphs where $m = \Theta(n^2)$.

Result (2): There is an incremental algorithm with total update time of $O(m \cdot \min(m^{1/2}, n^{2/3}))$. This follows from the work of [BenderFGT16, HaeuplerKMST08, HaeuplerKMST12, CohenFKR13].

Result (3): There is a randomized incremental algorithm with total expected update time of $\tilde{O}(m\sqrt{n})$. This follows from the very recent work of [BernsteinC18].

Significance of Theorem 1.1. We obtain a randomized incremental algorithm for cycle detection and topological ordering that has an expected total update time of $\tilde{O}(m^{4/3})$. Prior to this, all incremental algorithms for this problem had a total update time of $\Omega(n^{3/2})$ for sparse graphs with $m = O(n)$. Our algorithm breaks this barrier by achieving a bound of $\tilde{O}(n^{4/3})$ on sparse graphs. More generally, our total update time bound of $\tilde{O}(m^{4/3})$ outperforms the $\tilde{O}(m\sqrt{n})$ bound from result (3) as long as $m = o(n^{3/2})$. Note that if $m = \Omega(n^{3/2})$ then result (3) gets superseded by result (1). On the other hand, result (3) is no worse than result (2) for all values of $m$. (Footnote 2: Throughout this paper we assume that $m = \Omega(n)$. This is because if $m = o(n)$ then many nodes remain isolated (with zero degree) in the final graph, and we can ignore those isolated nodes while analyzing the total update time of the concerned algorithm.) (Footnote 3: It is easy to combine two incremental algorithms and get the "best of both worlds". For example, suppose that we want to combine results (1) and (3) to get a total update time of $\tilde{O}(\min(m\sqrt{n}, n^2))$, without knowing the value of $m$ in advance. Then we can initially start with the algorithm from result (3) and then switch to the algorithm from result (1) when $m$ becomes $\Omega(n^{3/2})$.) Thus, prior to our work result (3) gave the best known total update time when $m = O(n^{3/2})$, whereas result (1) gave the best known total update time when $m = \Omega(n^{3/2})$. We now improve upon the bound from result (3) in this "interesting" range of sparsity where $m = o(n^{3/2})$.

We are also able to break, for the first time in the literature, a barrier on the total update time of a certain type of algorithms that was identified by Haeupler et al. [HaeuplerKMST12]. Specifically, they defined an algorithm to be local iff it satisfies the following property. Suppose that currently the graph $G$ is acyclic, and the algorithm maintains a topological ordering $\prec$ on the node-set $V$ such that $x \prec y$ for every edge $(x, y) \in E$. In other words, every edge is a forward edge under $\prec$. At this point, a directed edge $(u, v)$ gets inserted into the graph $G$. Then the algorithm updates the topological ordering after this edge insertion only if $v \prec u$. Furthermore, if $v \prec u$, then the algorithm changes the positions of only those nodes in this topological ordering that lie in the affected region, meaning that a node $x$ changes its position only if $v \preceq x \preceq u$ just before the insertion of the edge. Haeupler et al. [HaeuplerKMST12] showed that any local algorithm for incremental cycle detection and topological ordering must necessarily have a total update time of $\Omega(n\sqrt{m})$. Interestingly, although the algorithms that lead to results (1) and (3) are not local, prior to our work no algorithm (local or not) was known in the literature that beats this lower bound for any nontrivial value of $m$. In sharp contrast, our algorithm (which is not local) has a total update time of $\tilde{O}(m^{4/3})$, and this beats the lower bound of Haeupler et al. [HaeuplerKMST12] when $m = o(n^{6/5})$.

Our Technique. We obtain our result by combining the framework of Bernstein and Chechik [BernsteinC18] with the balanced search procedure of Haeupler et al. [HaeuplerKMST12]. We first present a high level overview of the algorithm in [BernsteinC18]. Say that a node $x$ is an ancestor (resp. descendant) of another node $y$ iff there is a directed path from $x$ to $y$ (resp. from $y$ to $x$) in the current graph $G$. The algorithm in [BernsteinC18] is parameterized by an integer $\tau \in [1, n]$ whose value will be fixed later on. Initially, each node $x \in V$ is sampled with probability $\Theta(\log n / \tau)$. Bernstein and Chechik [BernsteinC18] maintain a partition of the node-set $V$ into subsets $\{V_{i,j}\}$, where a node belongs to a subset $V_{i,j}$ iff it has exactly $i$ ancestors and $j$ descendants among the sampled nodes. A total order $\prec^*$ is defined on the subsets $\{V_{i,j}\}$, where $V_{i,j} \prec^* V_{i',j'}$ iff either $i < i'$ or $\{i = i' \text{ and } j > j'\}$. Next, it is shown that this partition and the total order satisfy two important properties. (1) If $G$ contains a cycle, then w.h.p. all the nodes in that cycle belong to the same subset in the partition. (2) As long as $G$ remains acyclic, every edge $(x, y) \in E$ is either an internal edge or a forward edge w.r.t. the total order $\prec^*$; this means that the subset containing $x$ is either the same as or appears before the subset containing $y$. Intuitively, these two properties allow us to decompose the problem into smaller parts. All we need to do now is (a) maintain the subgraphs $\{G_{i,j}\}$ induced by the subsets $\{V_{i,j}\}$, and (b) maintain a topological ordering within each subgraph $G_{i,j}$. Task (a) is implemented by using an incremental algorithm for single-source reachability and a data structure for maintaining an ordered list [DietzS87].

For task (b), consider the scenario where an edge $(u, v)$ gets inserted and both $u$ and $v$ belong to the same subgraph $G_{i,j}$. Suppose that $u$ appears after $v$ in the current topological ordering in $G_{i,j}$. We now have to check if the insertion of the edge $(u, v)$ creates a cycle, or, equivalently, if there already exists a directed path from $v$ to $u$. In [BernsteinC18] this task is performed by doing a forward search from $v$. Intuitively, this means exploring the nodes that are reachable from $v$ and appear before $u$ in the current topological ordering. If we encounter the node $u$ during this forward search, then we have found the desired path from $v$ to $u$, and we can report that the insertion of the edge $(u, v)$ indeed creates a cycle. The time taken to implement this forward search is determined by the number of nodes that are explored during this search. Bernstein and Chechik [BernsteinC18] now introduce a crucial notion of $\tau$-related pairs of nodes (see Section 2.1 for details), and show that for every node $x$ explored during the forward search we get a newly created $\tau$-related pair $(x, u)$. Next, they prove an upper bound of $O(n\tau)$ on the total number of such pairs that can appear throughout the duration of the algorithm. This implies that the total number of nodes explored during forward search is also at most $O(n\tau)$, and this in turn helps us fix the value of $\tau$ (to balance the time taken for task (a)) and bound the total update time.

We now explain our main idea. Inspired by the balanced search technique from [HaeuplerKMST12], we modify the subroutine for implementing task (b) as follows. We simultaneously perform a forward search from $v$ and a backward search from $u$. The forward search proceeds as in [BernsteinC18]. The backward search, on the other hand, explores the nodes $x$ such that $u$ is reachable from $x$ and $v$ appears before $x$ in the current topological ordering. We alternate between a forward search step and a backward search step, so that at any point in time the numbers of nodes explored by these two searches are equal to each other. If these two searches meet at some node $z$, then we have found a path from $v$ to $u$ (the path goes via $z$), and we accordingly declare that the insertion of the edge $(u, v)$ creates a cycle. The time taken to implement task (b) is again determined by the number of nodes explored during the forward search, since this is the same as the number of nodes explored during the backward search. Now comes the following crucial observation. For every node $x$ explored during the forward search and every node $y$ explored during the backward search after the insertion of an edge $(u, v)$, we get a newly created $\tau$-related pair $(x, y)$. Thus, if $k$ nodes are explored by each of these searches, then we get $k^2$ newly created $\tau$-related pairs, although we still explore only $2k$ nodes overall. In contrast, the algorithm in [BernsteinC18] creates only $k$ many new $\tau$-related pairs whenever it explores $k$ nodes. This quadratic improvement in the creation of new $\tau$-related pairs leads to a much stronger bound on the total number of nodes explored by our algorithm, because as in [BernsteinC18] we still can have at most $O(n\tau)$ many newly created $\tau$-related pairs during the entire course of the algorithm. This improved bound on the number of explored nodes leads to an improved bound of $\tilde{O}(m^{4/3})$ on the total update time.
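The way the quadratic gain turns into the final bound can be sketched with a short calculation. This is a back-of-the-envelope reading of the overview above, not the paper's formal analysis. It assumes that the $t$-th insertion explores $k_t$ nodes in each direction (the symbol $k_t$ is introduced here only for illustration), that each exploration step costs $\tilde{O}(m/n)$ under the bounded-degree assumption, and that maintaining the partition costs $\tilde{O}(mn/\tau)$ in total.

```latex
% Each insertion creates k_t^2 new tau-related pairs, and at most O(n tau) pairs exist:
\sum_{t=1}^{m} k_t^2 = O(n\tau)
\quad\Longrightarrow\quad
\sum_{t=1}^{m} k_t \le \sqrt{m \sum_{t=1}^{m} k_t^2} = O\big(\sqrt{mn\tau}\big)
\quad \text{(Cauchy--Schwarz)}.

% Total search cost, at \tilde{O}(m/n) per explored node:
\tilde{O}\Big(\frac{m}{n} \cdot \sqrt{mn\tau}\Big)
= \tilde{O}\big(m^{3/2}\,\tau^{1/2}\,n^{-1/2}\big).

% Balance against the assumed partition-maintenance cost \tilde{O}(mn/\tau):
m^{3/2}\,\tau^{1/2}\,n^{-1/2} = \frac{mn}{\tau}
\quad\Longrightarrow\quad
\tau = \frac{n}{m^{1/3}},
\qquad
\frac{mn}{\tau} = m^{4/3}.
```

By comparison, if $k$ explored nodes created only $k$ new pairs, the same accounting would bound the search cost by $O(n\tau \cdot m/n) = O(m\tau)$, and balancing against $\tilde{O}(mn/\tau)$ would give $\tau = \sqrt{n}$ and the $\tilde{O}(m\sqrt{n})$ bound of result (3).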

2 Our Algorithm: Proof of Theorem 1.1

This section is organized as follows. In Section 2.1 we recall some useful concepts from [BernsteinC18]. In Section 2.2 we present our incremental algorithm, and in Section 2.3 we analyze its total update time. The full version of the algorithm (containing the proofs missing from the main body) appears in Appendix A.

2.1 Preliminaries

Throughout the paper, we assume that the maximum degree of a node in $G$ is at most $O(1)$ times the average degree. It was observed in [BernsteinC18] that this assumption is without any loss of generality.

Assumption 2.1.

[BernsteinC18] Every node in $G$ has an out-degree of $O(m/n)$ and an in-degree of $O(m/n)$.

We say that a node $x$ is an ancestor of another node $y$ iff there is a directed path from $x$ to $y$ in $G$. We let $A(y)$ denote the set of all ancestors of $y$. Similarly, we say that $y$ is a descendant of $x$ iff there is a directed path from $x$ to $y$ in $G$. We let $D(x)$ denote the set of all descendants of $x$. A node is both an ancestor and a descendant of itself, that is, we have $x \in A(x) \cap D(x)$. We also fix an integral parameter $\tau \in [1, n]$ whose exact value will be determined later on. Note that if there is a path from a node $x$ to another node $y$ in $G$, then $A(x) \subseteq A(y)$ and $D(y) \subseteq D(x)$. Such a pair of nodes is said to be $\tau$-related iff the number of nodes in each of the sets $A(y) \setminus A(x)$ and $D(x) \setminus D(y)$ does not exceed $\tau$.

Definition 2.2.

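[BernsteinC18] We say that an ordered pair of nodes $(x, y)$ is $\tau$-related in the graph $G$ iff there is a path from $x$ to $y$ in $G$, and $|A(y) \setminus A(x)| \leq \tau$ and $|D(x) \setminus D(y)| \leq \tau$. We emphasize that for the ordered pair $(x, y)$ to be $\tau$-related, it is not necessary that there be an edge $(x, y) \in E$.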
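The definition can be checked directly on a small example. The sketch below is a brute-force illustration only (the algorithm never computes these sets explicitly); the helper names `reachable` and `is_tau_related` are hypothetical, and the ancestor and descendant sets are obtained by plain graph traversals.

```python
def reachable(start, adjacency):
    """All nodes reachable from `start` (including `start`) via `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adjacency.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def is_tau_related(x, y, edges, tau):
    """True iff there is a path x -> y, |A(y) - A(x)| <= tau and |D(x) - D(y)| <= tau."""
    successors, predecessors = {}, {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
        predecessors.setdefault(b, []).append(a)
    descendants_x = reachable(x, successors)
    if y not in descendants_x:
        return False  # no path from x to y
    descendants_y = reachable(y, successors)
    ancestors_x = reachable(x, predecessors)
    ancestors_y = reachable(y, predecessors)
    return (len(ancestors_y - ancestors_x) <= tau
            and len(descendants_x - descendants_y) <= tau)
```

On the path 0 → 1 → 2 → 3, the pair (1, 2) is 1-related because each set difference contains a single node, while the pair (0, 3) needs $\tau \geq 3$.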

If two nodes $x, y$ are part of a cycle, then clearly $A(x) = A(y)$ and $D(x) = D(y)$, and both the ordered pairs $(x, y)$ and $(y, x)$ are $\tau$-related. In other words, if an ordered pair $(x, y)$ is not $\tau$-related, then there is no cycle containing both $x$ and $y$. Intuitively, therefore, the notion of $\tau$-relatedness serves as a relaxation of the notion of two nodes being part of a cycle. Next, note that the graph $G$ keeps changing as more and more edges are inserted into it. So it might be the case that an ordered pair of nodes $(x, y)$ is not $\tau$-related in $G$ at some point in time, but is $\tau$-related in $G$ at some other point in time. The following definition and the subsequent theorem become relevant in light of this observation.

Definition 2.3.

[BernsteinC18] We say that an ordered pair of nodes $(x, y)$ is sometime $\tau$-related in the graph $G$ iff it is $\tau$-related at some point in time during the entire sequence of edge insertions in $G$.

Theorem 2.4.

[BernsteinC18] The number of sometime $\tau$-related pairs of nodes in $G$ is at most $O(n\tau)$.

Following [BernsteinC18], we maintain a partition of the node-set $V$ into subsets and the subgraphs induced by these subsets of nodes. We sample each node $x \in V$ independently with probability $\Theta(\log n / \tau)$. Let $S \subseteq V$ denote the set of these sampled nodes. The outcome of this random sampling gives rise to a partition of the node-set $V$ into $(|S|+1)^2$ many subsets $\{V_{i,j}\}$, where $i, j \in [0, |S|]$. This is formally defined as follows. For every node $x \in V$, let $A_S(x) = A(x) \cap S$ and $D_S(x) = D(x) \cap S$ respectively denote the set of ancestors and descendants of $x$ that have been sampled. Each subset $V_{i,j}$ is indexed by an ordered pair $(i, j)$ where $i \in [0, |S|]$ and $j \in [0, |S|]$. A node $x \in V$ belongs to a subset $V_{i,j}$ iff $|A_S(x)| = i$ and $|D_S(x)| = j$. In words, the index of the subset $V_{i,j}$ specifies the number of sampled ancestors and sampled descendants each node $x \in V_{i,j}$ is allowed to have. It is easy to check that the subsets $\{V_{i,j}\}$ form a valid partition of the node-set $V$. Let $E_{i,j}$ denote the set of edges in $E$ whose both endpoints lie in $V_{i,j}$, and let $G_{i,j} = (V_{i,j}, E_{i,j})$ denote the subgraph of $G$ induced by the subset of nodes $V_{i,j}$. We also define a total order $\prec^*$ on the subsets $\{V_{i,j}\}$, where we have $V_{i,j} \prec^* V_{i',j'}$ iff either $\{i < i'\}$ or $\{i = i' \text{ and } j > j'\}$. We slightly abuse the notation by letting $V(x)$ denote the unique subset which contains the node $x \in V$. Consider any edge $(x, y) \in E$. If the two endpoints of the edge belong to two different subsets in the partition $\{V_{i,j}\}$, i.e., if $V(x) \neq V(y)$, then we refer to the edge $(x, y)$ as a cross edge. Otherwise, if $V(x) = V(y)$, then the edge $(x, y)$ is an internal edge.
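To make the indexing concrete, the following sketch computes the index $(i, j)$ of a node from scratch and turns the total order on subsets into a sort key. It is illustrative only: the names `partition_index` and `order_key` are hypothetical, and the algorithm maintains these indices incrementally rather than by fresh traversals.

```python
def reachable(start, adjacency):
    """All nodes reachable from `start` (including `start`) via `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adjacency.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def partition_index(x, edges, sampled):
    """Index (i, j): number of sampled ancestors and sampled descendants of x."""
    successors, predecessors = {}, {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
        predecessors.setdefault(b, []).append(a)
    i = len(reachable(x, predecessors) & sampled)  # sampled ancestors (x counts as its own ancestor)
    j = len(reachable(x, successors) & sampled)    # sampled descendants (x counts as its own descendant)
    return (i, j)


def order_key(index):
    """Sort key for the total order on subsets: smaller i first, ties broken by larger j."""
    i, j = index
    return (i, -j)
```

On the path 0 → 1 → 2 → 3 with only node 1 sampled, the indices are (0, 1), (1, 1), (1, 0) and (1, 0). Walking along the path, the keys never decrease, so every edge is either internal or points forward in the total order on subsets.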

Lemma 2.5.

[BernsteinC18] Consider the partition of the node-set $V$ into subsets $\{V_{i,j}\}$, and the subgraphs $\{G_{i,j}\}$ induced by these subsets of nodes. They satisfy the following three properties.

  • If there is a cycle in $G$, then every edge of that cycle is an internal edge.

  • For every cross edge $(x, y) \in E$, we have $V(x) \prec^* V(y)$.

  • Consider any two nodes $x, y \in V_{i,j}$ for some $i, j \in [0, |S|]$. If there is a path from $x$ to $y$ in the subgraph $G_{i,j}$, then with high probability the ordered pair $(x, y)$ is $\tau$-related in $G$.

The first property states that the graph $G$ contains a cycle iff some subgraph $G_{i,j}$ contains a cycle. Hence, in order to detect a cycle in $G$ it suffices to only consider the edges that belong to the induced subgraphs $\{G_{i,j}\}$. The second property, on the other hand, implies that if the graph $G$ is acyclic, then it admits a topological ordering $\prec$ that is consistent with the total order $\prec^*$, meaning that $x \prec y$ for all $x, y \in V$ with $V(x) \prec^* V(y)$. Finally, the last property states that whenever a subgraph $G_{i,j}$ contains a path from a node $x$ to some other node $y$, with high probability the ordered pair $(x, y)$ is $\tau$-related in the input graph $G$.

2.2 The algorithm

Since edges never get deleted from the graph $G$, our algorithm does not have to do anything once it detects a cycle (for the graph will continue to have a cycle after every edge-insertion in the future). Accordingly, we assume that the graph $G$ has remained acyclic throughout the sequence of edge insertions till the present moment, and our goal is to check if the next edge-insertion creates a cycle in $G$. Our algorithm maintains a topological ordering $\prec$ of the node-set $V$ in the graph $G$ that is consistent with the total order $\prec^*$ on the subsets of nodes $\{V_{i,j}\}$, as defined in Section 2.1. Specifically, we maintain a priority $k(x)$ for every node $x \in V$, where $x \prec y$ iff $k(x) < k(y)$, and for every two nodes $x, y \in V$ with $V(x) \prec^* V(y)$ we ensure that $k(x) < k(y)$. As long as $G$ remains acyclic, the existence of such a topological ordering is guaranteed by Lemma 2.5.

Data Structures. We maintain the partition $\{V_{i,j}\}$ of the node-set $V$ and the subgraphs $\{G_{i,j}\}$ induced by the subsets in this partition. We use an ordered list data structure [DietzS87] on the node-set $V$ to implicitly maintain the priorities associated with the topological ordering $\prec$. This data structure supports each of the following operations in $O(1)$ time.

  • INSERT-BEFORE($x, y$): This inserts the node $y$ just before the node $x$ in the topological ordering.

  • INSERT-AFTER($x, y$): This inserts the node $y$ just after the node $x$ in the topological ordering.

  • DELETE($x$): This deletes the node $x$ from the existing topological ordering.

  • COMPARE($x, y$): If $x \prec y$, then this returns YES, otherwise this returns NO.
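The interface of the four operations can be illustrated with a deliberately naive list-backed class. This is not the data structure of [DietzS87]: every operation below takes linear time instead of constant time. The class name and the argument order (anchor node first, inserted node second) are assumptions made for this sketch.

```python
class NaiveOrderedList:
    """Ordered list with the four operations above, backed by a plain Python list."""

    def __init__(self, items=()):
        self._items = list(items)

    def insert_before(self, x, y):
        """Insert y immediately before x."""
        self._items.insert(self._items.index(x), y)

    def insert_after(self, x, y):
        """Insert y immediately after x."""
        self._items.insert(self._items.index(x) + 1, y)

    def delete(self, x):
        """Remove x from the order."""
        self._items.remove(x)

    def compare(self, x, y):
        """True iff x appears before y in the order."""
        return self._items.index(x) < self._items.index(y)

    def as_list(self):
        """Current order, for inspection."""
        return list(self._items)
```

Moving a node to a new position is a `delete` followed by one of the insert operations, which is the only way the searches and the partition updates need to change the order.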

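The implementation of our algorithm requires the creation of two dummy nodes $x_{i,j}$ and $y_{i,j}$ in every subset $V_{i,j}$. We ensure that $x_{i,j} \prec x \prec y_{i,j}$ for all $x \in V_{i,j}$. In words, the dummy node $x_{i,j}$ (resp. $y_{i,j}$) comes first (resp. last) in the topological order among all the nodes in $V_{i,j}$. Further, for all nodes $x$ with $V(x) \prec^* V_{i,j}$ we have $x \prec x_{i,j}$, and for all nodes $x$ with $V_{i,j} \prec^* V(x)$ we have $y_{i,j} \prec x$.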

Handling the insertion of an edge $(u, v)$ in $G$. By induction hypothesis, suppose that the graph $G$ currently does not contain any cycle and we are maintaining the topological ordering $\prec$ in $G$. At this point, an edge $(u, v)$ gets inserted into $G$. Our task now is to first figure out if the insertion of this edge creates a cycle, and if not, then to update the topological ordering $\prec$. We perform this task in four phases, as described below.

  1. In phase I, we update the subgraphs $\{G_{i,j}\}$.

  2. In phase II, we update the total order $\prec$ to make it consistent with the total order $\prec^*$.

  3. In phase III, we check if the edge-insertion creates a cycle in $G$. See Section 2.2.1 for details.

  4. If phase III fails to detect a cycle, then in phase IV we further update (if necessary) the total order $\prec$ so as to ensure that it is a topological order in the current graph $G$. See Section 2.2.2 for details.

Remark. We follow the framework developed in [BernsteinC18] while implementing Phase I and Phase II. We differ from [BernsteinC18] in Phase III and Phase IV, where we use the balanced search approach from [HaeuplerKMST12].

Implementing Phase I. In the first phase, we update the subgraphs $\{G_{i,j}\}$ such that they satisfy the properties mentioned in Lemma 2.5. The next lemma follows from [BernsteinC18]. The key idea is to maintain incremental single-source reachability data structures from each of the sampled nodes. Since at most $\tilde{O}(n/\tau)$ many nodes are sampled in expectation, and since each incremental single-source reachability data structure requires $O(m)$ total update time to handle $m$ edge insertions, we get the desired bound of $\tilde{O}(mn/\tau)$.

Lemma 2.6.

[BernsteinC18] In phase I, the algorithm spends $\tilde{O}(mn/\tau)$ total update time in expectation.

Implementing Phase II. In this phase we update the total order $\prec$ on the node-set $V$ in a certain manner. Let $G^-$ and $G^+$ respectively denote the graph $G$ just before and just after the insertion of the edge $(u, v)$. Similarly, for every node $x \in V$, let $V^-(x)$ and $V^+(x)$ respectively denote the subset $V(x)$ just before and just after the insertion of the edge $(u, v)$. At the end of this phase, the following properties are satisfied.

Property 2.7.

[BernsteinC18] At the end of phase II the total order $\prec$ on $V$ is consistent with the total order $\prec^*$ on $\{V_{i,j}\}$. Specifically, for any two nodes $x$ and $y$, if $V^+(x) \prec^* V^+(y)$, then we also have $x \prec y$.

Property 2.8.

[BernsteinC18] At the end of phase II the total order $\prec$ on $V$ remains a valid topological ordering of $G^-$, where $G^-$ denotes the graph $G$ just before the insertion of the edge $(u, v)$.

The next lemma bounds the total time spent by the algorithm in phase II.

Lemma 2.9.

[BernsteinC18] The total time spent in phase II across all edge-insertions is at most $\tilde{O}(n^2/\tau)$ in expectation.

Proof.

(Sketch) Let $C$ be a counter that keeps track of the number of times some node moves from one subset in the partition $\{V_{i,j}\}$ to another. Recall that a node $x$ belongs to a subset $V_{i,j}$ iff $|A_S(x)| = i$ and $|D_S(x)| = j$. As more and more edges keep getting inserted in $G$, the node $x$ can never lose a sampled node in $S$ as its ancestor or descendant. Instead, both the sets $A_S(x)$ and $D_S(x)$ can only grow with the passage of time. Since $|A_S(x)|, |D_S(x)| \leq |S|$, each node can move from one subset in the partition to another at most $2|S|$ times. Thus, we have $C \leq n \cdot 2|S| = O(n|S|)$. Since $E[|S|] = \tilde{O}(n/\tau)$, we conclude that $E[C] = \tilde{O}(n^2/\tau)$. Now, phase II can be implemented in such a way that a call is made to the ordered list data structure [DietzS87] only when some node moves from one subset of the partition to another. So the total time spent in phase II is at most $O(C)$, which happens to be $\tilde{O}(n^2/\tau)$ in expectation. ∎

2.2.1 Phase III: Checking if the insertion of the edge creates a cycle.

Let $G^-$ and $G^+$ respectively denote the graph $G$ before and after the insertion of the edge $(u, v)$. Consider the total order $\prec$ on the set of nodes $V$ in the beginning of phase III (or, equivalently, at the end of phase II). Property 2.7 guarantees that $\prec$ is consistent with the total order $\prec^*$ on $\{V_{i,j}\}$, and Property 2.8 guarantees that $\prec$ is a valid topological ordering in $G^-$. We will use these two properties throughout the current phase. The pseudocodes of all the subroutines used in this phase appear in Section 2.4.

In phase III, our goal is to determine if the insertion of the edge $(u, v)$ creates a cycle in $G$. Note that if $u \prec v$, then $\prec$ is also a valid topological ordering in $G^+$ as per Property 2.8, and clearly the insertion of the edge $(u, v)$ does not create a cycle. The difficult case occurs when $v \prec u$. In this case, we first infer that $V(u) = V(v)$, meaning that both $u$ and $v$ belong to the same subset in the partition $\{V_{i,j}\}$ at the end of phase II. This is because of the following reason. The total order $\prec$ is consistent with the total order $\prec^*$ as per Property 2.7. Accordingly, since $v \prec u$, we conclude that if $V(u) \neq V(v)$ then $V(v) \prec^* V(u)$. But this would contradict Lemma 2.5 as there is a cross edge from $u$ to $v$.

To summarize, for the rest of this section we assume that $v \prec u$ and $u, v \in V_{i,j}$ for some $i, j \in [0, |S|]$. We have to check if there is a path $P$ from $v$ to $u$ in $G^-$. Along with the edge $(u, v)$, such a path $P$ will define a cycle in $G^+$. Hence, by Lemma 2.5, every edge in such a path $P$ will belong to the subgraph $G_{i,j}$. Thus, from now on our task is to determine if there is a path from $v$ to $u$ in $G_{i,j}$. We perform this task by calling the subroutine SEARCH($u, v$) described below.

SEARCH($u, v$). We conduct two searches in order to find the path $P$: a forward search from $v$, and a backward search from $u$. Specifically, let $F$ and $B$ respectively denote the set of nodes visited by the forward search and the backward search till now. We always ensure that $F \cap B = \emptyset$. A node in $F$ (resp. $B$) is referred to as a forward (resp. backward) node. Every forward node is reachable from the node $v$ in $G_{i,j}$, whereas the node $u$ is reachable from every backward node in $G_{i,j}$. We further classify each of the sets $F$ and $B$ into two subsets: $F_a \subseteq F$, $F_d = F \setminus F_a$ and $B_a \subseteq B$, $B_d = B \setminus B_a$. The nodes in $F_a$ and $B_a$ are called alive, whereas the nodes in $F_d$ and $B_d$ are called dead. Intuitively, the dead nodes have already been explored by the search, whereas the alive nodes have not yet been explored. When the subroutine begins execution, we have $F = F_a = \{v\}$ and $B = B_a = \{u\}$. The following property is always satisfied.

Property 2.10.

Every node $x \in F$ is reachable from the node $v$ in $G_{i,j}$, and the node $u$ is reachable from every node $y \in B$ in $G_{i,j}$. The sets $F_a$, $F_d$, $B_a$ and $B_d$ are pairwise mutually exclusive.

A simple strategy for exploring a forward and alive node $x \in F_a$ is as follows. For each of its outgoing edges $(x, y) \in E_{i,j}$, we check if $y \in B$. If yes, then we have detected a path from $v$ to $u$: This path goes from $v$ to $x$ (this is possible since $x$ is a forward node), follows the edge $(x, y)$, and then from $y$ it goes to $u$ (this is possible since $y$ is a backward node). Accordingly, we stop and report that the graph $G^+$ contains a cycle. In contrast, if $y \notin B$ and $y \notin F$, then we insert $y$ into the set $F_a$ (and $F$), so that $y$ becomes a forward and alive node which will be explored in future. In the end, we move the node $x$ from the set $F_a$ to the set $F_d$. We refer to the subroutine that explores a node $x \in F_a$ as EXPLORE-FORWARD($x$).

Analogously, we explore a backward and alive node $y \in B_a$ as follows. For each of its incoming edges $(x, y) \in E_{i,j}$, we check if $x \in F$. If yes, then there is a path from $v$ to $u$: This path goes from $v$ to $x$ (this is possible since $x$ is a forward node), follows the edge $(x, y)$, and then from $y$ it goes to $u$ (this is possible since $y$ is a backward node). Accordingly, we stop and report that the graph $G^+$ contains a cycle. In contrast, if $x \notin F$ and $x \notin B$, then we insert $x$ into the set $B_a$ (and $B$), so that $x$ becomes a backward and alive node which will be explored in future. In the end, we move the node $y$ from the set $B_a$ to the set $B_d$. We refer to the subroutine that explores a node $y \in B_a$ as EXPLORE-BACKWARD($y$).

Property 2.11.

Once a node $x \in F_a$ (resp. $y \in B_a$) has been explored, we delete it from the set $F_a$ (resp. $B_a$) and insert it into the set $F_d$ (resp. $B_d$).

While exploring a node $x \in F_a$ (resp. $y \in B_a$), we ensure that all its outgoing (resp. incoming) neighbors are included in $F$ (resp. $B$). This leads to the following important corollary.

Corollary 2.12.

Consider any edge $(x, y) \in E_{i,j}$. At any point in time, if $x \in F_d$, then at that time we also have $y \in F = F_a \cup F_d$. Similarly, at any point in time, if $y \in B_d$, then at that time we also have $x \in B = B_a \cup B_d$.

Two natural questions arise at this point. First, how frequently do we explore forward nodes compared to exploring backward nodes? Second, suppose that we are going to explore a forward (resp. backward) node at the present moment. Then how do we select the node from the set $F_a$ (resp. $B_a$) that has to be explored? Below, we state two crucial properties of our algorithm that address these two questions.

Property 2.13.

(Balanced Search) We alternate between calls to EXPLORE-FORWARD(.) and EXPLORE-BACKWARD(.). This ensures that $|F_d| = |B_d|$ at every point in time. In other words, every forward-exploration step is followed by a backward-exploration step and vice versa.

Property 2.14.

(Ordered Search) While deciding which node in $F_a$ to explore next, we always pick the node $x \in F_a$ that has minimum priority $k(x)$. Thus, we ensure that the subroutine EXPLORE-FORWARD($x$) is only called on the node $x$ that appears before every other node in $F_a$ in the total ordering $\prec$. In contrast, while deciding which node in $B_a$ to explore next, we always pick the node $y \in B_a$ that has maximum priority $k(y)$. Thus, we ensure that the subroutine EXPLORE-BACKWARD($y$) is only called on the node $y$ that appears after every other node in $B_a$ in the total ordering $\prec$.

An immediate consequence of Property 2.14 is that there is no gap in the set $F_d$ as far as reachability from the node $v$ is concerned. To be more specific, consider the sequence of nodes in $G_{i,j}$ that are reachable from $v$ in increasing order of their positions in the total order $\prec$. This sequence starts with $v$. The set of nodes belonging to $F_d$ always forms a prefix of this sequence. This observation is formally stated below.

Corollary 2.15.

Consider any two nodes $x, y \in V_{i,j}$ such that $x \prec y$ and there is a path in $G_{i,j}$ from $v$ to each of these two nodes. At any point in time, if $y \in F_d$, then we must also have $x \in F_d$.

Corollary 2.16 is a mirror image of Corollary 2.15, albeit from the perspective of the node $u$.

Corollary 2.16.

Consider any two nodes $x, y \in V_{i,j}$ such that $x \prec y$ and there is a path in $G_{i,j}$ from each of these two nodes to $u$. At any point in time, if $x \in B_d$, then we must also have $y \in B_d$.

To complete the description of the subroutine SEARCH($u, v$), we now specify six terminating conditions. Whenever one of these conditions is satisfied, the subroutine does not need to run any further because it already knows whether or not the insertion of the edge $(u, v)$ creates a cycle in the graph $G$.

(C1) $F_a = \emptyset$.

In this case, we conclude that the graph $G$ remains acyclic even after the insertion of the edge $(u, v)$. We now justify this conclusion. Recall that if the insertion of the edge $(u, v)$ creates a cycle, then that cycle must contain a path $P$ from $v$ to $u$ in $G_{i,j}$. When the subroutine SEARCH($u, v$) begins execution, we have $v \in F_a$ and $u \in B_a$. Hence, Property 2.11 implies that at the present moment $v \in F_a \cup F_d$ and $u \in B_a \cup B_d$. Since the sets $F_a, F_d, B_a, B_d$ are pairwise mutually exclusive (see Property 2.10) and $F_a = \emptyset$, we currently have $v \in F_d$ and $u \notin F_d$. Armed with this observation, we consider the path $P$ from $v$ to $u$, and let $y$ be the first node in this path that does not belong to $F_d$. Let $x$ denote the node that appears just before $y$ in this path. Then by definition, we have $x \in F_d$ and $y \notin F_d$. Now, applying Corollary 2.12, we get $y \in F_a \cup F_d = F_d$ (as $F_a = \emptyset$), which leads to a contradiction.

(C2) $B_a = \emptyset$.

This is analogous to the condition (C1) above, and we conclude that $G$ remains acyclic in this case.

(C3) While exploring a node $x \in F_a$, we discover that $x$ has an outgoing edge to a node $y \in B$.

Here, we conclude that the insertion of the edge $(u, v)$ creates a cycle. We now justify this conclusion. Since $x \in F_a$, Property 2.10 implies that there is a path from $v$ to $x$. Since $y \in B$, Property 2.10 also implies that there is a path from $y$ to $u$. We get a cycle by combining the path from $v$ to $x$, the edge $(x, y)$, the path from $y$ to $u$, and the edge $(u, v)$.

(C4) While exploring a node $y \in B_a$, we discover that $y$ has an incoming edge from a node $x \in F$.

Similar to condition (C3), in this case we conclude that the insertion of the edge $(u, v)$ creates a cycle.

(C5) $\min_{x \in F_a} k(x) > \min_{y \in B_d} k(y)$.

If this happens, then we conclude that the graph $G$ remains acyclic even after the insertion of the edge $(u, v)$. We now justify this conclusion. Suppose that the insertion of the edge $(u, v)$ creates a cycle. Such a cycle defines a path $P$ from $v$ to $u$. Below, we make a claim that will be proved later on.

Claim 2.1.

The path $P$ contains at least one node from the set $F_a$.

Armed with Claim 2.1, we consider any node $x$ in the path $P$ that belongs to the set $F_a$. Let $y = \arg\min_{y' \in B_d} k(y')$. Note that $k(x) \geq \min_{x' \in F_a} k(x') > k(y)$. In particular, we infer that $y \prec x$. As $y \in B_d$, the node $u$ is reachable from $y$ (see Property 2.10). Similarly, as the node $x$ lies on the path $P$, the node $u$ is also reachable from $x$. Since the node $u$ is reachable from both the nodes $x$ and $y$, and since $y \prec x$, Corollary 2.16 implies that $x \in B_d$. This leads to a contradiction, for $x \in F_a$ and $F_a \cap B_d = \emptyset$ (see Property 2.10). Hence, our initial assumption was wrong, and the insertion of the edge $(u, v)$ does not create a cycle in $G$. It now remains to prove Claim 2.1.

Proof of Claim 2.1. Applying the same argument used to justify condition (C1), we first observe that $v \in F_a \cup F_d$ and $u \in B_a \cup B_d$. As the subsets $F_a$, $F_d$, $B_a$ and $B_d$ are pairwise mutually exclusive (see Property 2.10), we have $u \notin F_d$. Note that if $v \in F_a$, then there is nothing further to prove. Accordingly, for the rest of the proof we consider the scenario where $v \in F_d$. Since $v \in F_d$ and $u \notin F_d$, there has to be at least one node in the path $P$ that does not belong to the set $F_d$. Let $y$ be the first such node, and let $x$ be the node that appears just before $y$ in the path $P$. Thus, we have $x \in F_d$, $y \notin F_d$ and $(x, y) \in E_{i,j}$. Hence, Corollary 2.12 implies that $y \in F_a$. So the path $P$ contains some node from the set $F_a$. ∎

(C6) $\max_{y \in B_a} k(y) < \max_{x \in F_d} k(x)$.

Similar to condition (C5), here we conclude that the graph $G$ remains acyclic.
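The search and its six terminating conditions can be put together in a short sketch. This is an illustrative reading of the description above, not the paper's pseudocode: the function name `search`, the dictionary-based inputs, and the use of binary heaps for the alive sets are assumptions. The dictionaries `succ` and `pred` are assumed to list the out-neighbours and in-neighbours of each node inside one subgraph, and `prio` is assumed to map each node to a distinct priority that is a valid topological order of the graph before the insertion, with `prio[v] < prio[u]`.

```python
import heapq


def search(u, v, succ, pred, prio):
    """Return True iff there is a path from v to u, i.e. inserting (u, v) closes a cycle."""
    forward, backward = {v}, {u}     # visited sets of the two searches (alive plus dead)
    alive_f = [(prio[v], v)]         # alive forward nodes, min-heap on priority
    alive_b = [(-prio[u], u)]        # alive backward nodes, max-heap via negated priority
    max_dead_f = float("-inf")       # largest priority among dead forward nodes
    min_dead_b = float("inf")        # smallest priority among dead backward nodes
    forward_turn = True
    while True:
        if not alive_f or not alive_b:       # (C1) / (C2): one search ran dry
            return False
        if alive_f[0][0] > min_dead_b:       # (C5)
            return False
        if -alive_b[0][0] < max_dead_f:      # (C6)
            return False
        if forward_turn:
            _, x = heapq.heappop(alive_f)    # alive forward node of minimum priority
            for y in succ.get(x, ()):
                if y in backward:            # (C3): the two searches meet
                    return True
                if y not in forward:
                    forward.add(y)
                    heapq.heappush(alive_f, (prio[y], y))
            max_dead_f = max(max_dead_f, prio[x])   # x is now dead
        else:
            _, y = heapq.heappop(alive_b)    # alive backward node of maximum priority
            for x in pred.get(y, ()):
                if x in forward:             # (C4): the two searches meet
                    return True
                if x not in backward:
                    backward.add(x)
                    heapq.heappush(alive_b, (-prio[x], x))
            min_dead_b = min(min_dead_b, prio[y])   # y is now dead
        forward_turn = not forward_turn      # balanced search: strict alternation
```

The sketch checks (C1), (C2), (C5) and (C6) before every exploration step, which is safe because each of them certifies that no path from $v$ to $u$ exists regardless of which search moves next. The infinite sentinels make (C5) and (C6) vacuous while the corresponding dead set is still empty.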

We now state an important corollary that follows from our stopping conditions (C5) and (C6). It states that every node $x \in F_d$ appears before every node $y \in B_d$ in the total order $\prec$ in phase III.

Corollary 2.17.

We always have .

Proof.

Suppose that the corollary is false. Note that initially when the subroutine SEARCH() begins execution, we have and hence the corollary is vacuously true at that time. Consider the first time-instant (say) when the corollary becomes false. Accordingly, we have:

(2.1)

One of the following two events must have occurred at time for the corollary to get violated.

(1) A node was explored during a call to the subroutine EXPLORE-FORWARD(). The subroutine EXPLORE-FORWARD() then moved the node from the set to the set , which violated the corollary. Note that a call to EXPLORE-FORWARD(.) can only be made if just before time (see stopping condition (C5) and Property 2.14). Thus, from (2.1) we conclude that the corollary remains satisfied even after adding the node to the set . This leads to a contradiction.

(2) A node was explored during a call to EXPLORE-BACKWARD(). The subroutine EXPLORE-BACKWARD() then moved the node from the set to the set , which violated the corollary. Applying an argument analogous to the one applied in case (1), we again reach a contradiction. ∎

The proof of Lemma 2.18 follows immediately from the preceding discussion. Next, Lemma 2.19 bounds the time spent in any single call to the subroutine SEARCH().

Lemma 2.18.

The subroutine SEARCH() in Figure 2 returns YES if the insertion of the edge creates a cycle in the graph , and NO otherwise.

Lemma 2.19.

Consider any call to the subroutine SEARCH(). The time spent on this call is at most times the size of the set at the end of the call.

Proof.

(Sketch) Each call to EXPLORE-FORWARD() or EXPLORE-BACKWARD() takes time proportional to the out-degree (resp. in-degree) of in the subgraph . Under Assumption 2.1, the maximum in-degree and maximum out-degree of a node in are both at most . Thus, a single call to EXPLORE-FORWARD() or EXPLORE-BACKWARD() takes time.

According to Property 2.14, whenever we want to explore a node during forward-search (resp. backward-search), we select a forward-alive (resp. backward-alive) node with minimum (resp. maximum) priority. This step can be implemented using a priority queue data structure in time.

So the time spent by procedure SEARCH() is at most times the number of calls to the subroutines EXPLORE-FORWARD(.) or EXPLORE-BACKWARD(.). Furthermore, after each call to the subroutine EXPLORE-FORWARD(.) or EXPLORE-BACKWARD(.), the size of the set or respectively increases by one. Accordingly, the time spent on one call to SEARCH() is at most times the size of the set at the end of the call. The lemma now follows from Property 2.13. ∎
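To make the selection step in the proof above concrete, the following is a minimal Python sketch of a balanced bidirectional search that uses two binary heaps, one returning the forward-alive node of minimum priority and one returning the backward-alive node of maximum priority. It is an illustration under simplifying assumptions and not the subroutine SEARCH() of Figure 2: it works on the whole graph rather than a subgraph, it does not reproduce the six stopping conditions (C1) to (C6) exactly, and the names creates_cycle, out_adj, in_adj and prio are hypothetical. The map prio is assumed to assign distinct priorities that form a topological order of the graph before the insertion.

```python
import heapq


def creates_cycle(out_adj, in_adj, prio, u, v):
    """Return True if inserting the edge (u, v) closes a cycle, i.e. v reaches u.

    Assumption: prio[x] < prio[y] for every existing edge (x, y).
    """
    if u == v:
        return True
    fwd_alive = [(prio[v], v)]    # min-heap on priority
    bwd_alive = [(-prio[u], u)]   # negated keys give a max-heap on priority
    fwd_seen, bwd_seen = {v}, {u}
    forward_turn = True           # alternate so both sides explore equally often
    while fwd_alive and bwd_alive:
        # Every remaining forward-alive node lies after every remaining
        # backward-alive node, so no path from v to u can exist.
        if fwd_alive[0][0] > -bwd_alive[0][0]:
            return False
        if forward_turn:
            _, x = heapq.heappop(fwd_alive)       # x becomes forward-dead
            for y in out_adj.get(x, ()):
                if y in bwd_seen:
                    return True                   # the two searches met
                if y not in fwd_seen:
                    fwd_seen.add(y)
                    heapq.heappush(fwd_alive, (prio[y], y))
        else:
            _, x = heapq.heappop(bwd_alive)       # x becomes backward-dead
            for y in in_adj.get(x, ()):
                if y in fwd_seen:
                    return True
                if y not in bwd_seen:
                    bwd_seen.add(y)
                    heapq.heappush(bwd_alive, (-prio[y], y))
        forward_turn = not forward_turn
    return False  # one side ran out of alive nodes without meeting the other
```

Each heap operation costs time logarithmic in the heap size, and each exploration scans the adjacency list of one node, which mirrors the two cost terms in the proof. The strict alternation keeps the numbers of forward-dead and backward-dead nodes within one of each other, in the spirit of Property 2.13.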

Total time spent in phase III. We now analyze the total time spent in phase III, over the entire sequence of edge insertions in . For , consider the edge-insertion in the graph , and let denote the size of the set at the end of phase III while handling this -edge insertion. Lemma 2.19 implies that the total time spent in phase III is at most . We now focus on upper bounding the sum .

Lemma 2.20.

We have .

Proof.

For any , let and respectively denote the sets and at the end of phase III while handling the edge-insertion in . Furthermore, let and respectively denote the input graph and the subgraph after the edge-insertion in .

Suppose that the edge is the edge to be inserted into . We focus on the procedure for handling this edge insertion. During this procedure, if we find in the beginning of phase III, then our algorithm immediately declares that the insertion of the edge does not create a cycle and moves on to phase IV. In such a scenario, we clearly have and hence . Accordingly, from now on we assume that in the beginning of phase III. Consider any two nodes and . The nodes and belong to the same subgraph . Property 2.10 guarantees that there is a path from to in – we can go from to , take the edge and then go from to . Hence, by Lemma 2.5, the ordered pair is -related in with high probability. We condition on this event for the rest of the proof. We now claim that there was no path from to in : this is the graph just before the edge-insertion, or equivalently, just after the edge-insertion. To see why this claim is true, we recall Property 2.8. This property states that in the beginning of phase III (after the edge-insertion) the total order on the node-set is a topological order in the graph . Since and , Corollary 2.17 implies that appears before in the total order in phase III (after the edge-insertion). From these last two observations, we conclude that there is no path from to in . As edges only get inserted into with the passage of time, this also implies that there is no path from to in the graph , for all . Accordingly, the ordered pair is not -related in the graph for any .

To summarize, for every node and every node the following conditions hold. (1) The ordered pair is -related in the graph . (2) For all , the ordered pair is not -related in the graph . Let denote a counter which keeps track of the number of sometime -related pairs of nodes (see Definition 2.3). Conditions (1) and (2) imply that every ordered pair of nodes , where and , contributes one towards the counter . A simple counting argument gives us:

(2.2)

In the above derivation, the last equality follows from Theorem 2.4. We now recall Property 2.13, which says that our algorithm in phase III explores (almost) the same number of forward and backward nodes. In particular, we have for all . This observation, along with (2.2), implies that . This concludes the proof of the lemma. ∎

Corollary 2.21.

We have .

Proof.

We partition the set of indices into two subsets:

It is easy to check that . Accordingly, for the rest of the proof we focus on bounding the sum . Towards this end, for each , we first express the quantity as , where . Now, Lemma 2.20 implies that:

(2.3)

We also note that:

(2.4)

From (2.3) and (2.4), we get , which in turn gives us: . This leads to the following upper bound on the sum .

This concludes the proof of the corollary. ∎

We are now ready to upper bound the total time spent by our algorithm in phase III.

Lemma 2.22.

We spend total time in phase III, over the entire sequence of edge-insertions.

Proof.

Lemma 2.19 implies that the total time spent in phase III is . The lemma now follows from Corollary 2.21. ∎

2.2.2 Phase IV: Ensuring that is a topological ordering for (only when is acyclic)

As in Section 2.2.1, we let and respectively denote the graph just before and after the insertion of the edge . If in phase III we detect a cycle, then we do not need to perform any nontrivial computation from this point onward, for the graph will contain a cycle after every future edge-insertion. Hence, throughout this section we assume that no cycle was detected in phase III, and as per Lemma 2.18 the graph is acyclic. Our goal in phase IV is to update the total order so that it becomes a topological ordering in . Towards this end, note that does not change during phase III. Furthermore, if in phase III, then the first three paragraphs of Section 2.2.1 imply that is already a topological ordering of , and nothing further needs to be done. Thus, from now on we assume that and for some in phase III.

Recall the six terminating conditions for the subroutine SEARCH() used in phase III (see the discussion after Corollary 2.16). We have already assumed that we do not detect any cycle in phase III. Hence, the subroutine SEARCH() terminates under one of the following four conditions: (C1), (C2), (C5) and (C6). How we update the total order in phase IV depends on the terminating condition under which the subroutine SEARCH() returned in phase III. In particular, there are two cases to consider.

Case 1. The subroutine SEARCH() returned under condition (C2) or (C6) in phase III.

In this scenario, we update the total order by calling the subroutine described in Figure 1 (see Section 2.4). In this subroutine, the symbols and respectively denote the set of forward-dead and backward-dead nodes at the end of phase III. Similarly, we will use the symbols and respectively to denote the set of forward-alive and backward-alive nodes at the end of phase III. The subroutine works as follows.

When the subroutine SEARCH() began execution in phase III, we had and . Since SEARCH() returned under conditions (C2) or (C6), Property 2.11 implies that and at the end of phase III. Thus, when phase IV begins, let be the nodes in in increasing order of priorities, so that