For a directed graph G = (V, E), with n = |V| and m = |E|, the Strongly-Connected Components (SCCs) of G are the sets of the unique partition of the vertex set V into sets V_1, V_2, …, V_k such that for any two distinct vertices u, v ∈ V, there exists a directed cycle in G containing u and v if and only if u, v ∈ V_i for some i. In the Single-Source Reachability (SSR) problem, we are given a distinguished source r ∈ V and are asked to find all vertices in G that can be reached from r. The SSR problem can be reduced to finding the SCCs by inserting an edge from each vertex in G to the distinguished source r; the SCC containing r then consists exactly of the vertices that r can reach.
Finding SCCs in static graphs in O(m + n) time has been well-known since 1972 [tarjan1972depth]; the algorithm is commonly taught in undergraduate courses and also appears in CLRS [cormen2009introduction].
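As a point of reference for the static problem, the following is a minimal sketch of a linear-time SCC computation. We use Kosaraju's two-pass variant rather than Tarjan's algorithm for brevity; all identifiers are our own illustration and are not taken from the paper.

```python
from collections import defaultdict

def sccs(n, edges):
    """Kosaraju's algorithm: two DFS passes, O(n + m) total."""
    g, rg = defaultdict(list), defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        rg[v].append(u)
    # First pass: record vertices in order of DFS completion (iterative DFS).
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(g[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(g[v])))
                    break
            else:
                order.append(u)
                stack.pop()
    # Second pass: DFS on the reversed graph in reverse finishing order;
    # each tree found is one SCC.
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rg[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1
    return comp
```

For example, on the graph with cycles 0→1→2→0 and 3⇄4 joined by the edge (2, 3), the function assigns vertices 0, 1, 2 one label and vertices 3, 4 another.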
In this paper we focus on maintaining SCCs in a dynamic graph. The most general setting is the fully dynamic one, where edges are inserted into and deleted from the graph. While many connectivity problems for undirected fully-dynamic graphs have been solved quite efficiently [holm2001poly, wulff2013faster, thorup2000near, thorup2007fully, nanongkai2017dynamic], the directed versions of these problems have proven to be much harder to approach.
In fact, Abboud and Vassilevska Williams [abboud2014popular] showed that any algorithm that can maintain whether there are more than 2 SCCs in a fully-dynamic graph with update time O(m^(1−ε)) and query time O(m^(1−ε)), for any constant ε > 0, would imply a major breakthrough on SETH. The same paper also shows that update time O(m^(1−ε)) and query time O(m^(1−ε)) for maintaining the number of vertices reachable from a fixed source would imply a breakthrough for combinatorial Matrix Multiplication.
For this reason, research on dynamic SCC and dynamic single-source reachability has focused on the partially dynamic settings (decremental or incremental). In this paper we study the decremental setting, where the original graph only undergoes edge deletions (no insertions). We note that both lower bounds above extend to decremental algorithms with worst-case update time O(m^(1−ε)), so all existing results focus on the amortized update time.
The first algorithm to maintain SSR faster than recomputation from scratch achieved total update time O(mn) [shiloach1981line]. The same update time for maintaining SCCs was achieved by a randomized algorithm of Roditty and Zwick [roditty2008improved]. Their algorithm also establishes that any algorithm for maintaining SSR can be turned into a randomized algorithm for maintaining SCCs, incurring only an additional constant multiplicative factor in the running time. Later, Łącki [lkacki2013improved] presented a simple deterministic algorithm that matches the O(mn) total update time and also maintains the transitive closure. For several decades, it was not known how to go beyond total update time O(mn), until a recent breakthrough by Henzinger, Krinninger and Nanongkai [henzinger2014sublinear, henzinger2015improved] reduced the total update time to expected time O(mn^(0.9+o(1))). Even more recently, Chechik et al. [chechik2016decremental] showed that a clever combination of the algorithms of Roditty and Zwick and of Łącki can be used to improve the expected total update time to Õ(m√n). We point out that all of these recent results rely on randomization; in fact, no deterministic algorithm for maintaining SCCs or SSR beyond the O(mn) bound is known for general graphs. For planar graphs, Italiano et al. [italiano2017decremental] presented a deterministic algorithm with total update time Õ(n).
Finally, in this paper, we present the first algorithm for general graphs that maintains SCCs in expected total update time O(m log⁴ n) with constant query time, thus giving the first near-optimal algorithm for the problem. We summarize our result in the following theorem.
Given a graph G = (V, E) with m edges and n vertices, we can maintain a data structure that supports the operations:
Delete(u, v): Deletes the edge (u, v) from the graph G,
Query(u, v): Returns whether u and v are in the same SCC in G,
in total expected update time O(m log⁴ n) and with worst-case constant query time. The same time bounds apply, for a fixed source vertex s, to answering queries on whether a vertex v can be reached from s. The bound holds against an oblivious adversary.
Our algorithm makes the standard assumption of an oblivious adversary, which does not have access to the coin flips made by the algorithm. But our algorithm does NOT require the assumption of a non-adaptive adversary, which would also be ignorant of the answers to queries: the reason is simply that SCC and SSR information is unique, so the answers to queries do not reveal any information about the algorithm's random choices. One key exception is that for SSR, if the algorithm is expected to return a witness path, then it does require the assumption of a non-adaptive adversary. A standard reduction, described in Appendix A, also implies a simple algorithm for maintaining reachability from a set S ⊆ V to V in a fully-dynamic graph with vertex set V, that is, a data structure that answers queries, for any s ∈ S and v ∈ V, on whether s can reach v; the amortized expected update time and the query time follow from the reduction and are stated in Appendix A. We allow vertex updates, i.e. insertions or deletions of vertices with their incident edges, which are more general than edge updates. This generalizes a well-known trade-off result for All-Pairs Reachability [roditty2016fully, lkacki2013improved].
Finally, we point out that maintaining SCCs and SSR is related to the (more difficult) shortest-path problems. In fact, the algorithms of [henzinger2014sublinear, henzinger2015improved, shiloach1981line] can also maintain shortest paths in decremental directed graphs. For undirected graphs, the decremental Single-Source Shortest-Path problem was recently solved to near-optimality [henzinger2014decremental], and deterministic algorithms [bernstein2016deterministic, bernstein2017deterministic] have been developed that go beyond the O(mn) barrier. We hope that our result inspires new algorithms to tackle the directed versions of these problems.
In this paper, we let a graph G = (V, E) refer to a directed multi-graph, where we allow multiple edges between the same two endpoints as well as self-loops, but we say that a cycle contains at least two distinct vertices. We refer to the vertex set of G by V(G) and to the edge set of G by E(G). We denote the input graph by G = (V, E), let V = V(G) and E = E(G), and define n = |V| and m = |E|. If the context is clear, we simply write V and E in calculations instead of |V| and |E| to avoid cluttering. A subgraph H of G is a graph with V(H) = V(G) and E(H) ⊆ E(G). Observe that this deviates from the standard definition of subgraphs, since we require the vertex set of a subgraph to be identical to the graph's vertex set. We write G \ E' as a shorthand for (V, E \ E') and G ∪ E' as a shorthand for (V, E ∪ E'). For any S ⊆ V, we define E_out^G(S) to be the set {(u, v) ∈ E | u ∈ S}, i.e. the set of all edges in G that leave a vertex in S; we analogously define E_in^G(S) and E^G(S) = E_out^G(S) ∪ E_in^G(S). If the context is clear, we drop the superscript G and simply write E_out(S), E_in(S) and E(S).
For any graph G and any two vertices u, v ∈ V, we denote by dist_G(u, v) the distance from u to v in G. We also define the notion of S-distances for any S ⊆ V: for any pair of vertices u, v ∈ V, the S-distance dist_{G,S}(u, v) denotes the minimum number of vertices in S encountered on any path from u to v. Alternatively, the S-distance corresponds to dist_{G_S}(u, v), where G_S is the graph G with edges in E_out(S) of weight 1 and all other edges of weight 0. It therefore follows that for any S ⊆ S', we have dist_{G,S}(u, v) ≤ dist_{G,S'}(u, v), and dist_{G,V}(u, v) = dist_G(u, v).
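To make the definition concrete, the following sketch computes an S-distance with a 0-1 BFS over the weighting just described (an edge leaving an S-vertex costs 1, every other edge costs 0). The function name and graph representation are our own.

```python
from collections import deque

def s_distance(adj, S, u, v):
    """S-distance from u to v: minimum number of S-vertices met on any
    u->v path, computed as a 0-1 BFS where an edge (x, y) costs 1 iff
    x is in S.  Returns float('inf') if v is unreachable from u."""
    INF = float("inf")
    dist = {u: 0}
    dq = deque([u])
    while dq:
        x = dq.popleft()
        for y in adj.get(x, []):
            w = 1 if x in S else 0  # edge leaves an S-vertex?
            d = dist[x] + w
            if d < dist.get(y, INF):
                dist[y] = d
                # 0-1 BFS: weight-0 edges go to the front of the deque.
                if w == 0:
                    dq.appendleft(y)
                else:
                    dq.append(y)
    return dist.get(v, INF)
```

On the path 0→1→2→3 with S = {1, 2}, the S-distance from 0 to 3 is 2, since both S-vertices must be traversed.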
We define the diameter of a graph G by diam(G) = max_{u,v ∈ V} dist_G(u, v) and the S-diameter by diam_S(G) = max_{u,v ∈ V} dist_{G,S}(u, v). Therefore, diam_S(G) ≤ diam(G) for every S ⊆ V. For convenience, we often omit the subscript G on these relations if the context is clear and simply write dist(u, v) and dist_S(u, v).
We denote the fact that a vertex u reaches a vertex v in G by u →_G v, and if u →_G v and v →_G u, we simply write u ↔_G v and say that u and v are strongly connected. We also use → and ↔ without the subscript if the underlying graph is clear from the context. We say that G is strongly connected if u ↔ v for any u, v ∈ V. We call the maximal subgraphs of G that are strongly connected the strongly-connected components (SCCs). We denote by Condensation(G) the condensation of G, that is, the graph where all vertices in the same SCC of G are contracted. To distinguish, we normally refer to the vertices in Condensation(G) as nodes. Each node in Condensation(G) corresponds to a vertex set in G, and the node set of a condensation forms a partition of V. For convenience, we define the function Flatten(X) = ∪_{Y ∈ X} Y for a family of sets X. This is useful when discussing condensations. Observe further that Condensation(G) can be a multi-graph and might also contain self-loops. If we have an edge set E' with all endpoints in V, we let Condensation(G) ∪ E' be the multi-graph obtained by mapping the endpoints of each edge in E' to their corresponding SCC nodes in Condensation(G) and adding the resulting edges to Condensation(G).
Finally, for two partitions P and Q of a set U, we say that partition Q is a melding for a partition P if for every set X ∈ P, there exists a set Y ∈ Q with X ⊆ Y. We also observe that melding is transitive: if Q is a melding for P and R is a melding for Q, then R is a melding for P.
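The melding relation is easy to state in code; this tiny predicate (our own naming) checks it for partitions given as lists of Python sets.

```python
def is_melding(P, Q):
    """True iff partition Q is a melding for partition P, i.e. every
    set of P is contained in some set of Q."""
    return all(any(X <= Y for Y in Q) for X in P)
```

For instance, {{1,2},{3}} is a melding for the partition of singletons {{1},{2},{3}}, and, by transitivity, so is the trivial partition {{1,2,3}}.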
We now introduce the graph hierarchy maintained by our algorithm, followed by a high-level overview of our algorithm.
High-level overview of the hierarchy.
Our hierarchy has levels 0 to ⌈log n⌉, and we associate with each level i a subset E_i of the edges E. The sets E_0, E_1, …, E_{⌈log n⌉} form a partition of E; we define the edges that go into each E_i later in the overview. We define a graph hierarchy Ĝ_0, Ĝ_1, …, Ĝ_{⌈log n⌉} such that each graph Ĝ_i is defined as

Ĝ_i = Condensation((V, E_0 ∪ … ∪ E_{i−1})) ∪ E_i.
That is, each Ĝ_i is the condensation of a subgraph of G with some additional edges. As mentioned in the preliminary section, we refer to the elements of the node set V_i = V(Ĝ_i) as nodes to distinguish them from vertices in G. We use capital letters to denote nodes and lower-case letters to denote vertices. We let [v]_i denote the node in Ĝ_i with v ∈ [v]_i. Observe that each node corresponds to a subset of vertices in V and that for any i, the set V_i can in fact be seen as a partition of V. For i = 0, the set V_0 is a partition of singletons, i.e. V_0 = {{v} | v ∈ V}, and [v]_0 = {v} for each v ∈ V.
Let us offer some intuition for the hierarchy. The graph Ĝ_0 contains all the vertices of G (as singleton nodes) and all the edges of E_0. By definition of condensation, the nodes of Ĝ_1 precisely correspond to the SCCs of Ĝ_0. Ĝ_1 also includes the edges E_0 (though some of them are contracted into self-loops in Ĝ_1), as well as the additional edges in E_1. These additional edges might lead to Ĝ_1 having larger SCCs than those of Ĝ_0; each SCC in Ĝ_1 then corresponds to a node in Ĝ_2. More generally, the nodes of Ĝ_i are the SCCs of Ĝ_{i−1}.
As we move up the hierarchy, we add more and more edges to the graph, so the SCCs get larger and larger. Thus, each set V_i is a melding for any V_j with j ≤ i; that is, for each node X ∈ V_j there exists a node Y ∈ V_i such that X ⊆ Y. We sometimes say we meld nodes X_1, X_2, … ∈ V_i to Y ∈ V_{i+1} if ∪_k X_k = Y. Additionally, we observe that for any SCC Y in Ĝ_i, we meld the nodes in the SCC Y to a node in Ĝ_{i+1} that consists exactly of the vertices contained in the nodes of Y. More formally, Flatten(Y) ∈ V_{i+1}.
Our algorithm will maintain the SCCs of every graph Ĝ_i. We observe that because the sets E_i form a partition of E, the top graph Ĝ_{⌈log n⌉} contains all the edges of G, and the SCCs of Ĝ_{⌈log n⌉} are thus the same as the SCCs of G. We can therefore answer a query on whether two vertices u and v are in the same SCC by checking whether [u]_{⌈log n⌉} is equal to [v]_{⌈log n⌉}.
To maintain the SCCs in each graph Ĝ_i, our algorithm employs a bottom-up approach. At level i we want to maintain the SCCs of the graph with all the edges in E_0 ∪ … ∪ E_i, but instead of doing so from scratch, we use the SCCs maintained at level i − 1 as a starting point. The SCCs in Ĝ_{i−1} are precisely the SCCs in the graph with edge set E_0 ∪ … ∪ E_{i−1}; so to maintain the SCCs at level i, we only need to consider how the sliver of edges in E_i causes the SCCs of Ĝ_{i−1} (that is, the nodes of Ĝ_i) to be melded into larger SCCs (which then become the nodes of Ĝ_{i+1}).
If the adversary deletes an edge in E_i, the graphs at level i − 1 and below remain unchanged, as do the nodes of Ĝ_i. But the deletion might split apart an SCC in Ĝ_i, which will in turn cause a node of Ĝ_{i+1} to split into multiple nodes. This split might then cause an SCC of Ĝ_{i+1} to split, which will further propagate up the hierarchy.
In addition to edge deletions caused by the adversary, our algorithm will sometimes move edges from E_i to E_{i+1}. Because the algorithm only moves edges up the hierarchy, each graph Ĝ_i only ever loses edges, so the update sequence remains decremental from the perspective of each Ĝ_i. We now give an overview of how our algorithm maintains the hierarchy efficiently.
A fundamental data structure that our algorithm employs is the ES-tree [shiloach1981line, henzinger1995fully], which, for a directed unweighted graph G undergoing edge deletions and a distinguished source r ∈ V, maintains the distance dist_G(r, v) for each v ∈ V. In fact, the ES-tree maintains a shortest-path tree rooted at r; we subsequently refer to this tree as the ES out-tree. We call the ES in-tree rooted at r the shortest-path tree maintained by running the ES-tree data structure on the graph with reversed edge set, i.e. the edge set where each edge (u, v) ∈ E appears in the form (v, u). We can maintain each in-tree and out-tree decrementally to depth δ in time O(mδ); that is, we can maintain the distances dist(r, v) and dist(v, r) exactly until one of the distances exceeds δ.
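The following is a simplified sketch of a decremental ES out-tree: it maintains exact distances from r up to depth delta and reports larger distances as None. It restores the level invariant by straightforward propagation rather than the carefully charged edge scanning of the real ES-tree, so it illustrates the interface and invariant but not the O(mδ) amortized analysis; all names are our own.

```python
from collections import defaultdict, deque

class ESTree:
    """Sketch of a decremental ES out-tree rooted at r.  dist[v] is the
    exact distance from r while it is at most delta, and None afterwards."""

    def __init__(self, n, edges, r, delta):
        self.n, self.r, self.delta = n, r, delta
        self.out, self.inn = defaultdict(set), defaultdict(set)
        for u, v in edges:
            self.out[u].add(v)
            self.inn[v].add(u)
        self.dist = [None] * n
        self.dist[r] = 0
        q = deque([r])                     # initial BFS up to depth delta
        while q:
            u = q.popleft()
            if self.dist[u] == delta:
                continue
            for v in self.out[u]:
                if self.dist[v] is None:
                    self.dist[v] = self.dist[u] + 1
                    q.append(v)

    def _best(self, v):
        """1 + minimum distance over v's in-neighbors, or None."""
        ds = [self.dist[u] for u in self.inn[v] if self.dist[u] is not None]
        return min(ds) + 1 if ds else None

    def delete(self, u, v):
        self.out[u].discard(v)
        self.inn[v].discard(u)
        q = deque([v])                     # re-settle affected vertices
        while q:
            x = q.popleft()
            if x == self.r:
                continue
            d = self._best(x)
            if d is not None and d > self.delta:
                d = None                   # distance exceeded the depth bound
            if d != self.dist[x]:
                self.dist[x] = d
                q.extend(self.out[x])
```

Distances only grow under deletions, so the propagation terminates once every label is consistent or exceeds delta.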
Maintaining SCCs with ES-trees.
Consider again a graph Ĝ_i and let X be some SCC in Ĝ_i that we want to maintain. Let some node Center(X) ∈ X be chosen to be the center node of the SCC (in the case of Ĝ_0, this node is just a single-vertex set {v}). We then maintain an ES in-tree and an ES out-tree rooted at Center(X) that span the nodes of X in the induced graph Ĝ_i[X]. We must maintain the trees up to distance diam(Ĝ_i[X]), so the total update time is O(|E(Ĝ_i[X])| · diam(Ĝ_i[X])).
Now, consider an edge deletion after which the ES in-tree or the ES out-tree at Center(X) is no longer a spanning tree. Then we have detected that the SCC X has to be split into at least two node-disjoint SCCs. In each new SCC that does not contain Center(X), we choose a new center and initialize a new ES in-tree and ES out-tree.
Exploiting small diameter.
The above scheme is clearly quite efficient if the diameter of every SCC is very small. Our goal is therefore to choose the edge set E_i in such a way that Ĝ_i contains only SCCs of small diameter. We therefore turn to some insights from [chechik2016decremental] and extract information from the ES in-trees and out-trees to maintain small diameter. Their scheme fixes some parameter δ, and if a set of nodes Y of some SCC X is at distance at least δ from/to Center(X) due to an edge deletion, they find a small node separator Sep; removing E(Sep) causes Y and X \ (Y ∪ Sep) to no longer be in the same SCC. We use this technique and remove the edges incident to the node separator from E_i and therefore from Ĝ_i. One subtle observation we want to stress at this point is that each node in the separator set also appears as a single-vertex node in the graph Ĝ_{i+1}; this is because a separator node is not melded with any other node, as it has no remaining edges in Ĝ_i to or from any other node.
For a carefully chosen δ, we can maintain E_i such that at most half the nodes in Ĝ_i become separator nodes at any point of the algorithm. This follows since each separator set is small in comparison to the smaller side of the cut it induces, and since each node can be on the smaller side of a cut only O(log n) times.
Let us now refine our approach to maintaining the ES in-trees and out-trees and introduce a crucial ingredient devised by Roditty and Zwick [roditty2008improved]. Instead of picking an arbitrary node of an SCC X as its center, we pick a vertex r ∈ Flatten(X) uniformly at random and run our ES in-tree and out-tree from the node [r]_i on the graph Ĝ_i[X]. For each SCC X we denote the randomly chosen root by Center(X). In order to improve the running time, we reuse ES-trees: when the SCC X is split into SCCs X_1, X_2, …, X_k, where we assume wlog that Center(X) ∈ X_1, we remove the nodes in X_2 ∪ … ∪ X_k from the ES-trees of X and set Center(X_1) = Center(X). Thus, we only need to initialize new ES-trees for the SCCs X_2, …, X_k. Using this technique, we can show that each node is expected to participate in O(log n) ES-trees over the entire course of the algorithm, since whenever an SCC breaks into two parts, the chosen random source is, with constant probability, in the larger part. Since the ES-trees work on induced graphs with disjoint node sets, we can therefore conclude that the total expected update time for all ES-trees at a level is O(mδ log n).
We point out that using the ES in-trees and out-trees to detect node separators as described above complicates the analysis of the technique by Roditty and Zwick [roditty2008improved], but a clever proof presented in [chechik2016decremental] shows that the technique can still be applied. In this paper, we present a proof that handles some additional complications and is nevertheless slightly simpler.
A contrast to the algorithm of Chechik et al. [chechik2016decremental]
Other than our hierarchy, the overview we have given so far largely comes from the algorithm of Chechik et al. [chechik2016decremental]. However, their algorithm does not use a hierarchy of graphs. Instead, they show that for any graph G and parameter δ, one can find (and maintain) a node separator Sep of size Õ(n/δ) such that all SCCs in G \ E(Sep) have diameter at most δ. They can then use ES-trees with random sources to maintain the SCCs in G \ E(Sep) in total update time Õ(mδ). This leaves them with the task of computing how the vertices in Sep might meld some of these SCCs. They are able to do this in total update time Õ(m·n/δ) by using an entirely different technique of [lkacki2013improved]. Setting δ = √n, they achieve the optimal trade-off between the two techniques: total update time Õ(m√n) in expectation.
We achieve our Õ(m) total update time by entirely avoiding the technique of [lkacki2013improved] for separately handling a small set of separator nodes, and instead using the graph hierarchy described above, where at each level we set δ to be polylogarithmic rather than √n.
We note that while our starting point is the same as [chechik2016decremental], using a hierarchy of separators forces us to take a different perspective on the function of a separator set. The reason is that it is simply not possible to ensure that at each level of the hierarchy, all SCCs have small diameter. To overcome this, we instead aim for separator sets that decompose the graph into SCCs that are small with respect to a different notion of distance. The rest of the overview briefly sketches this new perspective, while sweeping many additional technical challenges under the rug.
Refining the hierarchy.
So far, we have only discussed how to maintain Ĝ_0 efficiently: by deleting many edges from E_0, we ensure that the SCCs of Ĝ_0 have small diameter. To discuss our bottom-up approach, let us define our graphs more precisely.
We maintain a separator hierarchy S_0, S_1, …, S_{⌈log n⌉}, with S_0 = V_0 and S_i ⊆ V_0 for all levels i ≥ 1. Each set S_i with i ≥ 1 is a set of single-vertex nodes, i.e. nodes of the form {v}, that is monotonically increasing over time.
We can now define each edge set E_i more precisely. To avoid clutter, we abuse notation slightly, henceforth referring to Flatten(S_i) simply as S_i when S_i is a set of singleton sets and the context is clear. We therefore obtain

E_i = E(S_i) \ (E(S_{i+1}) ∪ … ∪ E(S_{⌈log n⌉})),

i.e. E_i contains exactly the edges incident to S_i that are not incident to a separator set of a higher level.
In particular, note that E_0 contains all the edges of E except those incident to S_1 ∪ … ∪ S_{⌈log n⌉}; as we move up to level i, we add the edges incident to S_i. Note that if v ∈ S_i and our algorithm then adds v to S_{i+1}, this will remove all edges incident to v from E_i and add them to E_{i+1}. Thus, the fact that the sets S_i used by the algorithm are monotonically increasing implies the desired property that edges only move up the hierarchy.
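Concretely, the partition of E induced by the separator hierarchy can be computed as follows. The helper is hypothetical and takes the separator sets as a list S with S[0] unused; level 0 simply catches every edge not incident to any separator vertex, and each other edge lands at the highest level whose separator set contains one of its endpoints.

```python
def edge_levels(edges, S):
    """Assign each edge to its level E_i: the highest i >= 1 such that
    S[i] contains an endpoint of the edge, or level 0 otherwise."""
    levels = {}
    for u, v in edges:
        lvl = 0
        for i in range(len(S) - 1, 0, -1):   # scan levels top-down
            if u in S[i] or v in S[i]:
                lvl = i
                break
        levels.setdefault(lvl, []).append((u, v))
    return levels
```

For example, with S_1 = {2} and S_2 = {3}, the edge (2, 3) lands in E_2 because 3 sits at the higher level, while an edge touching no separator vertex stays in E_0.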
At a high level, the idea of the hierarchy is as follows. Focusing on a level i, when the "distances" in some SCC of Ĝ_i get too large (for a notion of distance defined below), the algorithm will add a carefully chosen set of separator nodes of Ĝ_i to S_{i+1}. By definition of our hierarchy, this will remove the edges incident to these separator nodes from E_i, thus causing the SCCs of Ĝ_i to decompose into smaller SCCs with more manageable "distances". We note that our algorithm always maintains the invariant that nodes added to S_{i+1} were previously in S_i, which, from the definition of our hierarchy, ensures that at all times the separator nodes in S_{i+1} are single-vertex nodes in Ĝ_{i+1}; this is because the nodes of Ĝ_{i+1} are the SCCs of Ĝ_i, and Ĝ_i contains no edges incident to S_{i+1}.
For our algorithm, classic ES-trees are only useful for maintaining the SCCs of Ĝ_0; in order to handle the levels i ≥ 1, we develop a new generalization of ES-trees that uses a different notion of distance. This enables us to detect when SCCs split in the graphs Ĝ_i and to find separator nodes in S_i, as discussed above, more efficiently.
Our generalized ES-tree (GES-tree) can be seen as a combination of the classic ES-tree [shiloach1981line] and a data structure by Italiano [italiano1988finding] that maintains reachability from a distinguished source in a DAG, and which can be implemented in total update time O(m).
Let S be some feedback vertex set of a graph G; that is, every cycle in G contains a vertex in S. Then our GES-tree can maintain S-distances and a corresponding shortest-path tree up to S-distance δ from a distinguished source r ∈ V in the graph G. (See Section 2 for the definition of S-distances.) This data structure can be implemented with total update time O(mδ).
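Since correctness of the GES-tree hinges on S being a feedback vertex set, here is a small sketch (our own helper, not part of the paper's algorithm) that verifies the property by checking that the graph with S removed is acyclic, using Kahn's topological sort.

```python
from collections import defaultdict

def is_feedback_vertex_set(n, edges, S):
    """S hits every cycle iff the graph induced on the vertices outside S
    is acyclic; we check acyclicity with Kahn's topological sort."""
    adj = defaultdict(list)
    indeg = {v: 0 for v in range(n) if v not in S}
    for u, v in edges:
        if u in S or v in S:
            continue                  # edge incident to S is irrelevant
        adj[u].append(v)
        indeg[v] += 1
    stack = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # All remaining vertices were topologically sorted iff no cycle survives.
    return seen == len(indeg)
```

On the 3-cycle 0→1→2→0, the set {0} is a feedback vertex set while the empty set is not.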
Maintaining the SCCs in .
Let us focus on maintaining the SCCs of Ĝ_i. Since the node set of a condensation induces a directed acyclic graph, and E_i is a set of edges incident to S_i, the set S_i forms a feedback node set of Ĝ_i. Now consider the scheme described in the paragraphs above, but instead of running an ES in-tree and out-tree from each center Center(X) of an SCC X, we run a GES in-tree and out-tree on Ĝ_i[X] that maintain the S_i-distances up to depth δ. Using this GES, whenever a set of nodes Y has S_i-distance at least δ from or to the center, we show that we can find a small separator Sep that consists only of nodes in S_i; we then add the elements of Sep to the set S_{i+1}, and we also remove these nodes from the GES-tree, analogously to our discussion of regular ES-trees above. Note that adding Sep to S_{i+1} removes the edges E(Sep) from E_i; since we chose Sep to be a separator, this causes Y and X \ (Y ∪ Sep) to no longer be part of the same SCC in Ĝ_i. Thus, to maintain the hierarchy, we must then add new nodes to Ĝ_{i+1} corresponding to each new SCC in Ĝ_i and to every single-vertex set in Sep (Y might not form a single SCC, but we then decompose it further after we have handled the node insertion). This might cause some self-loops in Ĝ_{i+1} to become edges between the newly inserted nodes, and it needs to be handled carefully to embed the new nodes in the GES-trees maintained on the SCC in Ĝ_{i+1} that the original node was part of. This update might trigger further changes up the hierarchy, since it might increase the S_{i+1}-distance between some nodes.
Thus, overall, we ensure that all SCCs in Ĝ_i have S_i-diameter at most δ and can hence be maintained efficiently by GES-trees. In particular, we show that whenever an SCC exceeds S_i-diameter δ, we can, by moving a carefully chosen set of nodes in S_i to S_{i+1}, remove a corresponding set of edges from E_i, which breaks the large-S_i-diameter SCC into SCCs of smaller S_i-diameter.
Bounding the total update time.
Finally, let us sketch how to obtain the total expected running time O(m log⁴ n). We already discussed how, by using random sources in GES-trees (analogously to the same strategy for ES-trees), we ensure that each node is expected to be in O(log n) GES-trees maintained to depth δ = O(log² n). Each such GES-tree is maintained in total update time O(mδ), so we have total expected update time O(mδ log n) = O(m log³ n) for each level, and since we have O(log n) levels, we obtain total expected update time O(m log⁴ n). We point out that we have not included the time to compute the separators in this running time analysis; indeed, computing separators efficiently is one of the major challenges in building our hierarchy. Since implementing these subprocedures efficiently is rather technical, we omit their description from the overview and refer to Section 6 for a detailed discussion.
4 Generalized ES-trees
Even and Shiloach [shiloach1981line] devised a data structure, commonly referred to as ES-trees, that, given a vertex r in a graph G undergoing edge deletions, maintains the shortest-path tree from r to depth δ in total update time O(mδ), such that the distance of any vertex v can be obtained in constant time. Henzinger and King [henzinger1995fully] later observed that the ES-tree can be adapted to maintain the shortest-path tree in directed graphs.
For our algorithm, we devise a new version of the ES-tree that maintains the shortest-path tree with regard to S-distances. We show that if S is a feedback vertex set for G, that is, a set such that every cycle in G contains at least one vertex in S, then the data structure requires only O(mδ) total update time. Our fundamental idea is to combine the classic ES-tree with techniques for maintaining single-source reachability in DAGs, which can be implemented in time linear in the number of edges [italiano1988finding]. Since dist_{G,V}(u, v) = dist_G(u, v) and V is a trivial feedback vertex set, our data structure generalizes the classic ES-tree. Since the empty set is a feedback vertex set for DAGs, our data structure also matches the time complexity of Italiano's data structure. We define the interface formally below.
Let G = (V, E) be a graph, S a feedback vertex set for G, r ∈ V a distinguished source, and δ a positive integer. We define a generalized ES-tree (GES) to be a data structure that supports the following operations:
InitGES(r, G, S, δ): Sets the parameters for our data structure. We initialize the data structure and return the GES.
DistFromSource(v): If dist_S(r, v) ≤ δ, reports dist_S(r, v); otherwise reports ∞.
DistToSource(v): If dist_S(v, r) ≤ δ, reports dist_S(v, r); otherwise reports ∞.
Delete(u, v): Sets E ← E \ {(u, v)}.
DeleteVertices(B): For B ⊆ V, sets G ← G[V \ B], i.e. removes the vertices in B and all incident edges from the graph G.
GetUnreachableVertex(): Returns a vertex v with max(dist_S(r, v), dist_S(v, r)) > δ, or ⊥ if no such vertex exists.
The GES as described in Definition 4.1 can be implemented with total initialization and update time O(mδ) over all Delete operations, and requires worst-case constant time for each of the remaining operations.
We defer the full proof to Appendix B but sketch the proof idea.
(Sketch) Consider a classic ES-tree where each edge in E_out(S) is assigned weight 1 and all other edges weight 0. The classic ES-tree analysis maintains for each vertex v a distance label l(v) that expresses the current distance from r to v. We also have a shortest-path tree T, where the path in the tree from r to v is of weight l(v). Since T is a shortest-path tree, we also have that for every edge (u, v) ∈ E, l(v) ≤ l(u) + w(u, v). Now, consider the deletion of an edge (u, v) that was in T. To certify that the level l(v) does not have to be increased, we scan the incoming edges of v and try to find an edge (u', v) such that l(v) = l(u') + w(u', v). On finding such an edge, (u', v) is added to T. The problem is that if there is a 0-weight cycle, the edge (u', v) that we use to reconnect v might actually come from a vertex u' that is a descendant of v in T. This would break the algorithm, as it disconnects v from r in T. But we show that this bad case cannot occur, because S is assumed to be a feedback vertex set: at least one of the vertices on the cycle must be in S, and therefore the outgoing edge of this vertex on the cycle has weight 1, contradicting the existence of a 0-weight cycle. The rest of the analysis closely follows the classic ES-tree analysis. ∎
To ease the description of our SCC algorithm, we tweak our GES implementation to work on the multi-graphs Ĝ_i. We still root the GES at a vertex r, but maintain the tree in Ĝ_i at the node [r]_i. The additional operations and their running times are described in the following lemma, whose proof is straightforward and therefore deferred to Appendix C. Note that we now deal with nodes rather than vertices, but as discussed in the paragraph "Refining the hierarchy" in the overview, our hierarchy ensures that every separator node is always just a single-vertex node in Ĝ_i. For this reason, we require in the lemma below that every node of the feedback node set is a single-vertex node.
Say we are given a partition P of a universe U and the graph Ĝ = (P, E), a feedback node set S ⊆ P consisting of single-vertex nodes, a distinguished vertex r ∈ U, and a positive integer δ. Then, we can run a GES as in Definition 4.1 on Ĝ in time O(|E|δ), supporting the additional operations:
SplitNode(Y): The input is a set Y of vertices contained in a node X ∈ P, such that either the set of edges from Y to X \ Y or the set of edges from X \ Y to Y is empty, which implies that Y and X \ Y are no longer strongly connected. We remove the node X from Ĝ and add the nodes X \ Y and Y to Ĝ.
AddToFeedbackSet(B): This procedure adds the nodes in B to the feedback node set S. Formally, the input is a set B of single-vertex nodes {v} ∈ P. The procedure then adds every {v} ∈ B to S.
We point out that we enforce the above properties on the input sets of these operations in order to ensure that the set S remains a feedback node set at all times.
5 Initializing the graph hierarchy
We assume henceforth that the graph G is initially strongly connected. If the graph is not strongly connected, we can run Tarjan's algorithm [tarjan1972depth] in O(m + n) time to find the SCCs of G and run our algorithm on each SCC separately.
Our procedure to initialize the data structure is presented in pseudo-code in Algorithm 1. We first initialize level 0, where Ĝ_0 is simply G with the vertex set mapped to the set of singletons of elements in V. We then use information from level 0 to construct level 1 and to adapt the graph Ĝ_0. We start by invoking the procedure Split, which satisfies the following lemma, whose proof is deferred to the next section. (Intuitively, Split finds a set of separator nodes whose removal leaves all SCCs in Ĝ_0 with small S-distance.)
Split(Ĝ, S, δ) returns a tuple (P, Sep), where P is a partition of the node set, with the following properties: 1) for any node set X ∈ P and any nodes u, v ∈ X, we have dist_{Ĝ \ E(Sep), S}(u, v) ≤ δ; 2) for any two node sets X, Y ∈ P with X ≠ Y, and for any u ∈ X and v ∈ Y, the nodes u and v are not strongly connected in Ĝ \ E(Sep). The algorithm runs in time O(|E|δ).
Using P and Sep, we first set S_1 to be equal to Sep. Note that, as a result, the edges incident to S_1 are removed from E_0. The partition P then refers to the SCCs of Ĝ_0; the properties of the Split procedure guaranteed by Lemma 5.1 ensure that all SCCs of Ĝ_0 have S-diameter at most δ.
Then, we invoke the procedure InitTrees, presented in Algorithm 2. For each X ∈ P that corresponds to an SCC in Ĝ_0, the procedure initializes the GES-tree from a vertex in X chosen uniformly at random on the induced graph Ĝ_0[X]. Observe that we do not explicitly keep track of the edge set E_0 but remove edges implicitly by only maintaining the induced subgraphs of Ĝ_0 that form SCCs. A small detail we want to point out is that each separator node also forms its own single-node set in the partition P.
On returning to Algorithm 1, we are left with initializing the graph Ĝ_1. To this end, we simply set its node set to P and again use all edges incident to S_1. Finally, we initialize the top-level separator set to the empty set; it remains unchanged throughout the entire course of the algorithm.
6 Finding Separators
Before we describe how to update the data structure after an edge deletion, we want to explain how to find good separators, since this is crucial for our update procedure. We also discuss how to use these separators to obtain an efficient implementation of the procedure Split that we encountered in the initialization. For simplicity, we describe the separator procedures on simple graphs instead of our multi-graphs Ĝ_i; it is easy to translate these procedures to our multi-graphs, because the separator procedures are not dynamic: they are only ever invoked on a fixed graph, so we do not have to worry about node splitting and the like.
To gain some intuition for the technical statement of the separator properties in Lemma 6.1, suppose we are given a graph G, a subset of the vertices S ⊆ V, a vertex r, and a depth d. Our goal is to find a subset Sep of the vertices of S and a vertex set V_r such that there is no path from any vertex in V \ (V_r ∪ Sep) to a vertex in V_r in the graph G \ Sep. In our procedure, we always have the vertex r contained in V_r.
We also want our separator to be as small as possible. Here, we want the separator to be small in comparison to the number of S-vertices on each side of the induced cut. More formally, |Sep| should be small in comparison to min(|V_r ∩ S|, |(V \ (V_r ∪ Sep)) ∩ S|). We therefore say that our separator is balanced.
To find a good separator, we start by computing a BFS from r. Here, we again assign the edges in E_out(S) weight 1 and all other edges weight 0, and we say a layer consists of all vertices that are at the same distance from r. To find the first layer, we can use the graph G \ E_out(S) and run a normal BFS from r; all vertices reached form the first layer L_0. We can then add, for each edge (u, v) ∈ E_out(S) with u ∈ L_0, the vertex v to L_1 if it is not already in L_0. We can then contract all vertices visited so far into a single vertex and repeat the procedure described for the initial root r. It is straightforward to see that the vertices of a layer that are also in S form a separator of the graph. To obtain a separator that is small in comparison to |V_r ∩ S|, we add the layers one after another to our set V_r and output the index of the first layer that grows the number of S-vertices in V_r by only a small factor; we then set Sep to be the S-vertices of that layer. If the separator is not also small in comparison to the other side of the cut, we grow more layers and output the first index of a layer whose S-vertices are small in comparison to both sides; such a layer must exist. Because we find our separator vertices using a BFS from r, a useful property of our separator is that all the vertices in V_r and Sep are within bounded distance from r.
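A minimal sketch of the layering step (our own code, simple graphs only): it runs the 0-1 BFS described above and groups vertices by their weighted distance from r. For any index d, the S-vertices of layer d then separate the layers up to d from everything beyond, since any edge that increases the distance label past d must leave an S-vertex at distance exactly d.

```python
from collections import deque

def bfs_layers(adj, S, r):
    """Group the vertices reachable from r into layers by S-distance:
    an edge (x, y) costs 1 iff x is in S (0-1 BFS)."""
    dist = {r: 0}
    dq = deque([r])
    while dq:
        x = dq.popleft()
        for y in adj.get(x, []):
            w = 1 if x in S else 0
            d = dist[x] + w
            if d < dist.get(y, float("inf")):
                dist[y] = d
                if w == 0:
                    dq.appendleft(y)   # same layer: expand first
                else:
                    dq.append(y)       # next layer
    layers = {}
    for v, d in dist.items():
        layers.setdefault(d, set()).add(v)
    return layers
```

On the path 0→1→2→3→4 with S = {1, 3}, the layers are {0, 1}, {2, 3}, {4}, and the single S-vertex 3 of the middle layer separates the first layer from the last.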
Finally, we can ensure that the running time of the procedure is proportional to the number of edges incident to the vertices found by the BFS from the root r.
Lemma 6.1 (Balanced Separator).
There exists a procedure OutSep(r, G, S, d) (analogously InSep(r, G, S, d)), which takes as parameters a vertex r ∈ V, a graph G, a set S ⊆ V, and a positive integer d. The procedure computes a tuple (V_r, Sep) such that