A majority of real-world networks are very large, and a significant fraction of them are known to change rapidly [SMS17]. This has necessitated the study of efficient dynamic graph algorithms, i.e., algorithms which use the existing solution to quickly find an updated solution for the new graph. Due to the size of these graphs, it is imperative that each update can be processed in sub-linear time.
Data structures which efficiently maintain solutions to combinatorial optimization problems have shot into prominence over the last few decades [ST83, Fre85]. Many fundamental graph problems such as graph connectivity [HK99, HdLT01, KKM13], maximal and maximum matchings [GP13, BHI18a, BS16, BHN16, BHN17], maximum flows and minimum cuts [IS10, Tho07, GHS18, GHT18] have been shown to have efficient dynamic algorithms which only require sub-linear runtime per update. On the other hand, lower bounds exist for a number of these problems [AW14, HKNS15, AD16, AGG18, AKT19]. [Hen18] contains a comprehensive survey of many graph problems and their state-of-the-art dynamic algorithms.
In this paper, we consider the densest subgraph problem. The density of an undirected graph $G = (V, E)$ is defined as $\rho(G) = |E| / |V|$. The problem asks to find a set $S^* \subseteq V$ such that
\[ S^* = \arg\max_{S \subseteq V} \frac{|E(S)|}{|S|}, \]
where $E(S)$ is the set of all edges within $S$. The densest subgraph problem has great theoretical relevance due to its close connection to fundamental graph problems such as network flow and bipartite matching (we describe this connection explicitly in Sections 2 and 3.1). While near-linear time algorithms exist for finding matchings in graphs [MV80, GT91, DP14], the same cannot be said for flows on directed graphs [Mad11]. In this sense, the densest subgraph problem acts as an indicative middle ground, since it is both a specific instance of a flow problem [Gol84, BGM14], as well as a generalization of bipartite $b$-matchings. Interestingly, the densest subgraph problem does allow near-linear time algorithms [BGM14].
In terms of dynamic algorithms, the state-of-the-art data structure for maintaining -approximate bipartite matchings takes time per update [GP13]. [BHI18b] maintain a constant factor approximation to the $b$-matching problem in time. For flow-problems, algorithms which maintain a constant factor approximation in sublinear update time have proved to be elusive. As defined in [BHNT15], we say that an algorithm is a fully dynamic -approximation algorithm for the densest subgraph problem if it can process the following operations: (i) insert/delete an edge into/from the graph; (ii) query a -approximation to the maximum subgraph density of the graph. The current best result for this is a fully dynamic -approximation algorithm which runs in amortized time per update [BHNT15]. In this paper, we give the first fully dynamic -approximation algorithm which runs in worst-case time per update:
Given a graph with vertices and some , there exists a fully dynamic algorithm which maintains, with high probability, a -approximate solution to the densest subgraph problem using worst-case update time per edge insertion/deletion.
Moreover, at any point, the algorithm can output the corresponding approximate densest subgraph in time , where is the number of vertices in the output.
Remark 1 (Oblivious adversary).
Theorem 1.1 assumes that the dynamic updates made throughout the algorithm are by an oblivious adversary. In other words, the “adversary” making the updates to the graph does not have access to the random bits used in the algorithm.
Note, however, that the randomness in our algorithm appears entirely from a uniform sparsification routine used to ensure that the updates need polylogarithmic time even when the graph is dense. This means that for sparse graphs, the algorithm is completely deterministic. We expand on these details in Section 3.
We use a “dual” interpretation of the densest subgraph problem to gain insight on the optimality conditions, as in [Cha00, BGM14]. Specifically, we translate it into a problem of assigning edge loads to incident vertices so as to minimize the maximum load across vertices. Viewed another way, we want to orient edges in a directed graph so as to minimize the maximum in-degree of the graph. This view gives a local condition for near-optimality of the algorithm, which we then leverage to design a data structure to handle updates efficiently.
1.1 Background and related work
Goldberg [Gol84] gave the first polynomial-time algorithm to solve the densest subgraph problem by reducing it to $O(\log n)$ instances of maximum flow. This was subsequently improved to use only $O(1)$ instances, using parametric max-flow [GGT89]. Charikar [Cha00] gave an exact linear programming formulation of the problem, while at the same time giving a simple greedy algorithm which gives a $2$-approximate densest subgraph (also studied in [AITT96]). Despite the approximation factor, this algorithm is popular in practice [CHKZ03] due to its simplicity and due to the fact that it runs in linear time and space.
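To make the greedy algorithm concrete, the following is a minimal sketch of the standard peeling procedure: repeatedly remove a minimum-degree vertex and remember the densest intermediate subgraph. The function name `greedy_peeling` and the edge-list input format are illustrative assumptions, and the graph is assumed to be simple. For brevity the sketch finds the minimum-degree vertex by a linear scan, so it runs in quadratic time; a bucket queue over degrees would be needed to reach the linear running time mentioned above.

```python
from collections import defaultdict

def greedy_peeling(edges):
    """Peel minimum-degree vertices from a simple undirected graph.

    Returns (best_density, best_vertex_set) over all intermediate subgraphs.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    remaining = set(adj)
    m = len(edges)                      # edges still inside `remaining`
    removed = []                        # peeling order
    best_density, best_prefix = 0.0, 0  # best_prefix = vertices peeled before the best point

    while remaining:
        rho = m / len(remaining)
        if rho > best_density:
            best_density, best_prefix = rho, len(removed)
        # Linear scan for a minimum-degree vertex (not the linear-time variant).
        v = min(remaining, key=lambda x: len(adj[x]))
        m -= len(adj[v])
        for w in adj[v]:
            adj[w].discard(v)
        adj[v].clear()
        remaining.remove(v)
        removed.append(v)

    # Vertices peeled at or after the best point form the best subgraph.
    return best_density, set(removed[best_prefix:])

# K4 on {0, 1, 2, 3} with a pendant vertex 4 attached to vertex 3.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
peel_density, peel_set = greedy_peeling(example_edges)
```

On this example the pendant vertex is peeled first, leaving the clique, which is also the true densest subgraph; in general the returned density is only guaranteed to be within a factor of 2 of the optimum.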
Obtaining fast algorithms for approximation factors below 2, however, has proved to be a harder task. One approach towards this is to sparsify the graph in a way that maintains subgraph densities [MTVV15, MPP15] within a factor of , and run the exact algorithm on the sparsifier. However, this algorithm still incurs a term of in the running time, causing it to be super-linear for sparse graphs. A second approach is via numerical methods to solve LPs approximately. Bahmani et al. [BGM14] gave a algorithm by bounding the width of the dual LP for this problem, and using the multiplicative weights update framework to find an -approximate solution.
In terms of dynamic and streaming algorithms for the densest subgraph problem, the first result is by Bahmani et al. [BKV12], where they modified Charikar’s greedy algorithm to give a -approximation using passes over the input. Using the same techniques as in the static case, Bahmani et al. [BGM14] obtained a -approximation algorithm that requires passes over the input. Subsequently, Bhattacharya et al. [BHNT15] developed a more nuanced data structure to enable a 1-pass streaming algorithm which finds a approximation. They also gave the first fully dynamic algorithm for the densest subgraph problem, which maintains a approximation algorithm using amortized time per update. Around the same time, Epasto et al. [ELS15] gave a fully dynamic approximation algorithm in amortized time per update, with the caveat that edge deletions can only be random. [SLNT12] maintain a approximate densest subgraph efficiently in the distributed CONGEST model.
Variants of the densest subgraph problem have also been studied in the literature. One such variant is the densest at-least-$k$ subgraph problem, which requires the additional constraint that the solution have at least $k$ vertices. This problem can be shown to be NP-hard [KS09], but admits a polynomial-time 2-approximation algorithm [And07, AC09]. However, there is evidence that this factor cannot be improved further [Man18]: it is NP-hard to do so assuming the Small Set Expansion Hypothesis (a computational complexity assumption closely related to the Unique Games Conjecture). Other variants include directed densest subgraph ([KV99, Cha00, KS09]), and clique-densest subgraphs ([Tso15, MPP15]).
In addition to its theoretical importance, dense subgraph discovery is an important primitive for several real-world applications such as community detection [KRR99, New06, KNT06, DGP07, CS12], link spam detection [GKT05], story identification [AKS14], distance query indexing [CHKZ03, JXRF09, AIY13] and computational biology [HYH05, SHK10, RWLW13], to name a few. Due to its practical relevance, many related notions of subgraph density, such as $k$-cores [Sei83], quasi-cliques [BHB08], and $(\alpha, \beta)$-communities [MSST08], have been studied in the literature. [LRJA10, TL10, Tso14] contain several other applications of dense subgraphs and related problems.
An alternate approach towards a dynamic algorithm for the densest subgraph problem is to adapt the multiplicative weights update framework [AHK12] used to solve the densest subgraph problem in [BGM14] to allow for edge updates. This technique works in the incremental (only edge insertions) regime for bipartite matchings [Gup14], and can similarly be adapted to work in the purely decremental case for the densest subgraph problem to give an amortized runtime per update.
In Section 2, we formally define the problem, and describe the linear programming formulations for this problem. In Section 3, we show how to leverage the intuition from the dual program to develop a fully dynamic algorithm. We also show how to extract a subgraph from the data structure in Appendix A.
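We represent any undirected graph as $G = (V, E)$, where $V$ is the set of vertices in $G$ and $E$ is the set of edges in $G$. For any subset of vertices $S \subseteq V$, we denote by $E(S)$ the subset of all edges within $S$.
We define $\rho_G(S)$ as the density of the subgraph induced by $S$ in $G$, i.e.,
\[ \rho_G(S) = \frac{|E(S)|}{|S|}. \]
The maximum subgraph density of $G$, $\rho_G^*$, is simply the maximum among all subgraph densities, i.e.,
\[ \rho_G^* = \max_{S \subseteq V} \rho_G(S). \]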
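As a sanity check on these definitions, the sketch below evaluates the density of a vertex subset and finds the maximum subgraph density by brute force. The function names are illustrative, and the exhaustive search takes exponential time, so it is only meant for tiny examples.

```python
from itertools import combinations

def density(edges, subset):
    """Return |E(S)| / |S| for a non-empty vertex subset S."""
    s = set(subset)
    inside = sum(1 for u, v in edges if u in s and v in s)
    return inside / len(s)

def max_subgraph_density(vertices, edges):
    """Exhaustively search all non-empty subsets (exponential time)."""
    best, best_set = 0.0, None
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            rho = density(edges, subset)
            if rho > best:
                best, best_set = rho, set(subset)
    return best, best_set

# K4 on {0, 1, 2, 3} with a pendant vertex 4 attached to vertex 3.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
rho_star, s_star = max_subgraph_density(vertices, edges)
```

Here the whole graph has density 7/5, while the clique on four vertices has density 6/4, so the densest subgraph is a proper subset of the vertices.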
2.1 LP formulation and dual
The following is a well-known LP formulation of the densest subgraph problem, introduced in [Cha00].
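Associate each vertex $v \in V$ with a variable $x_v \in \{0, 1\}$, where $x_v = 1$ signifies $v$ being included in $S$. Similarly, for each edge $e \in E$, let $y_e \in \{0, 1\}$ denote whether or not it is in $E(S)$. Relaxing the variables to be real numbers, we get the following LP, which we denote by Primal($G$), whose optimal value is known to be $\rho_G^*$.
\[
\begin{aligned}
\text{maximize} \quad & \sum_{e \in E} y_e \\
\text{subject to} \quad & y_e \le x_u, \quad y_e \le x_v && \forall e = uv \in E \\
& \sum_{v \in V} x_v \le 1 \\
& x_v \ge 0, \quad y_e \ge 0 && \forall v \in V,\ e \in E
\end{aligned}
\]
As in [BGM14], we take greater interest in the dual of the above problem. Let $f_e(u)$ be the dual variable associated with the first $2|E|$ constraints of the form $y_e \le x_u$, and let $D$ be associated with the last constraint. We get the following LP, which we denote by Dual($G$).
\[
\begin{aligned}
\text{minimize} \quad & D \\
\text{subject to} \quad & f_e(u) + f_e(v) \ge 1 && \forall e = uv \in E \\
& \sum_{e \ni v} f_e(v) \le D && \forall v \in V \\
& f_e(u) \ge 0, \quad f_e(v) \ge 0 && \forall e = uv \in E
\end{aligned}
\]
This LP can be visualized as follows. Each edge $e = uv$ has a load of $1$, which it wants to send to its end points as $f_e(u)$ and $f_e(v)$, such that the total load of each vertex is at most $D$. The objective is to find the minimum $D$ for which such a load assignment is feasible.
For a fixed $D$, the above formulation resembles a bipartite graph between edges and vertices. Then, the problem is similar to a bipartite $b$-matching problem [BHI18b], where the demands on one side are at most $D$, and the other side are at least $1$.
From strong duality, we know that the optimal objective values of both linear programs are equal, i.e., exactly $\rho_G^*$. Let $\hat{\rho}$ be the objective of any feasible solution to Primal($G$). Similarly, let $\hat{D}$ be the objective of any feasible solution to Dual($G$).
Then, by optimality of $\rho_G^*$ and weak duality,
\[ \hat{\rho} \le \rho_G^* \le \hat{D}. \qquad (1) \]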
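The load-assignment view of the dual can be checked numerically on a small graph. The sketch below is illustrative only: the helper `max_load` and the per-edge pair `(f_u, f_v)`, standing for the shares of an edge's unit load sent to its two endpoints, are names assumed here. It computes the maximum vertex load of a feasible assignment, which by weak duality is an upper bound on every subgraph density.

```python
def max_load(vertices, edges, split):
    """Return the maximum vertex load of a fractional load assignment.

    split[i] = (f_u, f_v) is how edge i = (u, v) divides its unit load.
    """
    load = {v: 0.0 for v in vertices}
    for (u, v), (f_u, f_v) in zip(edges, split):
        if f_u < 0 or f_v < 0 or f_u + f_v < 1 - 1e-12:
            raise ValueError("infeasible split for edge %r" % ((u, v),))
        load[u] += f_u
        load[v] += f_v
    return max(load.values())

k4_vertices = [0, 1, 2, 3]
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_density = len(k4_edges) / len(k4_vertices)

# Splitting every edge evenly gives each vertex load 3 * 0.5.
even = max_load(k4_vertices, k4_edges, [(0.5, 0.5)] * 6)
# Sending every edge entirely to its second endpoint is feasible but wasteful.
lopsided = max_load(k4_vertices, k4_edges, [(0.0, 1.0)] * 6)
```

For the complete graph on four vertices the even split attains the density 6/4 exactly, so it is an optimal dual solution, while the lopsided assignment only certifies the weaker bound of 3.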
3 Fully dynamic algorithm
In this section, we give a fully dynamic algorithm which maintains the maximum subgraph density of $G$ as it undergoes edge additions and deletions. The formal claim is in Theorem 1.1, which we restate.
3.1 Intuition and Overview
At a high level, our approach is to view the densest subgraph problem via its dual problem, i.e., the problem of “assigning” an edge fractionally to its endpoints (as we discuss in Section 2). We view this as a load distribution problem, where each vertex is assigned some load from all its incident edges. Then, the objective of the problem is simply to find an assignment such that the maximum vertex load is minimized. It is easy to verify that an optimal load assignment in the dual problem is achieved when no edge is able to reassign its load such that the maximum load on its two endpoints gets reduced. In other words, local optimality implies global optimality.
In fact, this property holds even for approximately optimal solutions. Specifically, any solution which satisfies a -additive approximation to local optimality guarantees an approximate global optimal solution with a multiplicative error of at most , where is a threshold parameter lower bounding the optimal solution . Here, a -additive approximation implies that for any edge, the maximum among its endpoint loads can only be reduced by a value less than by reassigning the edge. Hence, in cases where the optimal density is , setting gives an -approximate solution.
An advantage of using $1$ as the local approximation factor is that we can do away with fractional load assignments. One can always achieve $1$-approximate local optimality by assigning each edge completely to one of its endpoints. Let us visualize such a load assignment via a directed graph, by orienting each edge towards the vertex to which it is assigned. Now, the load on every vertex $v$ is simply its in-degree ($d(v)$). Then, a $1$-approximate local optimal solution is achieved by orienting each edge such that there is no edge $(u, v)$, oriented from $u$ to $v$, with $d(v) > d(u) + 1$, because otherwise, we can flip the edge to achieve a better local solution. Let us call this a locally stable oriented graph.
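A static version of this idea can be sketched as follows: starting from any orientation, keep flipping edges whose head has in-degree more than one above its tail. The names `in_degrees` and `make_locally_stable` are illustrative assumptions, and this is not the dynamic algorithm of the paper. The loop terminates because every flip strictly decreases the sum of squared in-degrees.

```python
def in_degrees(vertices, oriented):
    """In-degree of every vertex for a list of directed edges (tail, head)."""
    d = {v: 0 for v in vertices}
    for _, head in oriented:
        d[head] += 1
    return d

def make_locally_stable(vertices, oriented):
    """Flip edges (u -> v) with d(v) > d(u) + 1 until none remain."""
    oriented = list(oriented)
    d = in_degrees(vertices, oriented)
    changed = True
    while changed:
        changed = False
        for i, (u, v) in enumerate(oriented):
            if d[v] > d[u] + 1:
                oriented[i] = (v, u)   # reassign the edge's load from v to u
                d[v] -= 1
                d[u] += 1
                changed = True
    return oriented, d

# K4 with every edge initially oriented towards its smaller endpoint.
k4 = [0, 1, 2, 3]
start = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2)]
stable, deg = make_locally_stable(k4, start)
```

The initial orientation has maximum in-degree 3; after two flips the orientation is locally stable with maximum in-degree 2, which is the best possible for an integral assignment on this graph of density 1.5.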
This leaves the following challenges in extending this idea to a fully dynamic algorithm:
How can we maintain a locally stable oriented graph under insertion/deletion operations efficiently?
How do we ensure that the optimal density, or equivalently the maximum in-degree , is at least ?
To solve the first issue, we need to understand the climbing-edge phenomenon of a locally stable oriented graph. We call an edge $(u, v)$ a climbing-edge if $d(u) < d(v)$. Now, consider inserting an edge $(u, v)$ into a locally stable oriented graph. Since $v$’s in-degree increases, it could potentially have an in-neighbor $w$ such that $d(v) > d(w) + 1$. To “fix” this, we flip the edge $(w, v)$; however, this causes $w$’s in-degree to increase, which we now need to fix. So, rather than fixing the “unstable” edge caused by the increase in $v$’s in-degree right away, we find a maximal chain of climbing edges into $v$ and flip all the edges in the chain. This way, the in-degrees of all vertices in the chain except the first and last one remain the same. Due to the maximality of the chain, the top of the chain has no climbing edges into it, and hence increasing its in-degree by $1$ will not break local stability. Then, we will show that it takes time to detect this climbing chain, which can be of length at most . Applying the same argument to the deletion operation, we can conclude that each update operation incurs time. This climbing chain closely relates to the concept of augmenting paths in network flows [FF10] and matchings [MV80, DP14], which seems fitting, considering our intuition that densest subgraph relates closely to these problems.
This means that we not only want a lower bound on , but also an upper bound. We solve this issue by using uniform sparsifiers. Suppose we are guaranteed that at all times for some ; then we just need a sparsifier that scales down and (approximately) preserves the density of the graph between and , so that we have both an accurate output and a small amortized runtime. However, we have no such guarantee on the optimal density. Even if we were able to have such an estimate at all times, it would keep changing, requiring our sparsifier to change accordingly.
To work around this issue, we maintain a hierarchy of sparsifiers with different sampling probabilities such that there always exists a relevant sparsifier that projects the maximum in-degree into the range . Now, each time, we check those sparsifiers in a smart way to quickly locate the most relevant sparsifier and use it to yield our result.
The rest of this section is organized as follows. In Section 3.2, we prove our main structural result: a locally stable oriented graph automatically gives an approximate globally optimal solution. In Section 3.3, we provide details of the data structure which facilitates dynamic changes to the graph, and also show the application of a uniform sparsifier to preserve subgraph densities approximately. Finally, in Section 3.4 we combine the above parts to maintain sparsifiers efficiently, which gives an algorithm that proves Theorem 1.1.
3.2 Locally stable orientations
The problem of orienting graphs in a way that minimizes the maximum indegree (or equivalently outdegree) has been of independent interest [Ven04, AMOZ07, LLP09, AJM11, AMO11, AJMO12, AJMO15, AJMO16, BIM17, AJM18, AJM19]. Since low-outdegree orientations prove to be an important tool in computing maximal matchings, many efforts were made towards maintaining such orientations dynamically [BF99, Kow07, HTZ14, KKPS14, BB17].
Here, we show in detail how a local 1-approximate solution to the graph orientation problem corresponds to a near-optimal solution to the densest subgraph problem. We outline our dynamic algorithm to maintain a locally stable graph orientation in Section 3.3, which follows the algorithm in [KKPS14] closely.
From Equation 1, we know that the optimal solution to Dual($G$) gives the exact maximum subgraph density of $G$, $\rho_G^*$. Let us interpret the variables of Dual($G$) as follows:
Every edge $e = uv$ assigns itself fractionally to its two endpoints. $f_e(u)$ and $f_e(v)$ denote these fractional loads.
$\sum_{e \ni v} f_e(v)$ is the total load assigned to $v$. We denote this using $\ell_v$.
The objective is simply $\max_{v \in V} \ell_v$.
If there is any edge $e = uv$ such that $f_e(u) > 0$ and $\ell_u > \ell_v$, then $e$ can transfer an infinitesimal amount of load from $u$ to $v$ while not increasing the objective.
Hence, there always exists an optimal solution where, for any edge $e = uv$, $f_e(u) > 0 \Rightarrow \ell_u \le \ell_v$.
Using this intuition, we write the approximate version of by providing a slack of to the above condition. We call this relaxed LP as .
Theorem 3.1 states that indeed provides an approximate value of .
Given an undirected graph with vertices, let denote any feasible solution to , and let . Then,
Any feasible solution of is also a feasible solution of , and so we have . Also, when , the first inequality holds by default. It remains to show the first inequality holds for .
Denote by the set of vertices with load at least , i.e.,
Let be some adjustable parameter we will fix later. We define to be the maximal integer such that for any ,
Notice that such a maximal integer always exists because there are a finite number of vertices in and the size of grows exponentially. By the maximality of ,
Let denote the density of this set . In order to bound , we compute the total load on all vertices in . For any , the load on is given by
However, we know that
and hence we only need to count for . Summing over all vertices in , we get
We also have
Since is at most the maximum subgraph density , and from the definition of ,
where the last inequality comes from the fact that , which implies that .
Now, we can set our parameter to maximize the term on the RHS. By symmetry, the maximum is achieved when both terms in the product are equal and hence we set
Given an undirected graph with vertices, let denote any feasible solution to , and let . Then, if ,
From Theorem 3.1, we have
Using the fact that , we get
Given an undirected graph with vertices, let denote any feasible solution to , and let . Then, if for constant and ,
Since cannot be negative, . Now, consider the following two cases. If , we directly get . Otherwise, . By Theorem 3.1, we have
Combining this with the condition that
3.3 Data structure
In this section, we describe a simple data structure which facilitates basic operations on a directed graph. This will serve as a basis for efficiently maintaining a locally stable oriented graph, as described in Section 3.1. Our techniques are similar to those used in [KKPS14], in which the authors maintain low out-degrees as the graph undergoes dynamic updates. Since our goal is more relaxed than in [KKPS14], we obtain slightly better time bounds.
There exists a data structure that maintains a directed graph $G$ with $n$ vertices and an integer vector $d$, and supports the following operations:
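Insert($u$, $v$): insert a directed edge $(u, v)$ into $G$;
Delete($u$, $v$): delete edge $(u, v)$ from $G$;
Inc($v$): increment $d(v)$ by $1$;
Dec($v$): decrement $d(v)$ by $1$;
Flip($u$, $v$): flip edge $(u, v)$;
FindUp($u$): return a vertex $w$ with $d(w) > d(u)$ among $u$’s out-neighbors, if one exists;
FindDown($u$): return a vertex $w$ with $d(w) < d(u)$ among $u$’s in-neighbors, if one exists.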
Moreover, it takes time for Insert, Delete, Flip and FindUp, and takes for FindDown, Inc and Dec, where is the maximum in-degree of .
Proof of Theorem 3.4.
We claim that DegDS (defined in Algorithm 1) satisfies the claims in Theorem 3.4. It is easy to verify that any Insert, Delete and Flip need time to update the associated information. Notice that in Algorithm 1, whenever the value of changes for some vertex , we visit in-neighbors of to inform the change of its in-degree and update their ONbrs information, each of which takes time. Hence, Inc and Dec take time. Similarly, FindUp in the algorithm takes time to query while FindDown visits in-neighbors so it takes time. ∎
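The following is a minimal sketch of a data structure with this interface; it is not a transcription of Algorithm 1. All field names are assumptions made for illustration: `d` is the integer vector, `in_nbrs` stores in-neighbors, and `onbrs[u][k]` buckets the out-neighbors of `u` whose value in `d` equals `k`. The sketch also assumes that `find_up` is only called on a locally stable orientation, where no out-neighbor of `u` exceeds `d[u] + 1`, so a single bucket lookup suffices.

```python
from collections import defaultdict

class DegDS:
    def __init__(self):
        self.d = defaultdict(int)         # maintained integer value per vertex
        self.in_nbrs = defaultdict(set)   # in_nbrs[v] = tails of edges into v
        # onbrs[u][k] = out-neighbours w of u with d[w] == k
        self.onbrs = defaultdict(lambda: defaultdict(set))

    def _drop(self, u, v):
        bucket = self.onbrs[u][self.d[v]]
        bucket.remove(v)
        if not bucket:
            del self.onbrs[u][self.d[v]]

    def insert(self, u, v):
        """Add the directed edge u -> v (does not touch d)."""
        self.in_nbrs[v].add(u)
        self.onbrs[u][self.d[v]].add(v)

    def delete(self, u, v):
        self.in_nbrs[v].remove(u)
        self._drop(u, v)

    def flip(self, u, v):
        self.delete(u, v)
        self.insert(v, u)

    def _relabel(self, v, delta):
        # Inform every in-neighbour of v, so the cost is proportional to v's in-degree.
        for u in self.in_nbrs[v]:
            self._drop(u, v)
        self.d[v] += delta
        for u in self.in_nbrs[v]:
            self.onbrs[u][self.d[v]].add(v)

    def inc(self, v):
        self._relabel(v, +1)

    def dec(self, v):
        self._relabel(v, -1)

    def find_up(self, u):
        """An out-neighbour w of u with d[w] > d[u], or None (assumes local stability)."""
        bucket = self.onbrs[u].get(self.d[u] + 1)
        return next(iter(bucket)) if bucket else None

    def find_down(self, u):
        """An in-neighbour w of u with d[w] < d[u], or None (scans in-neighbours)."""
        return next((w for w in self.in_nbrs[u] if self.d[w] < self.d[u]), None)

ds = DegDS()
ds.insert("a", "b")
ds.inc("b")
up_before, down_before = ds.find_up("a"), ds.find_down("b")
ds.flip("a", "b")
ds.dec("b")
ds.inc("a")
up_after = ds.find_up("b")
```

Keeping `d` separate from the edge set mirrors the interface above, where Inc and Dec are distinct operations: the caller decides when a change in in-degree is committed.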
3.4 Update algorithm
Since the runtime of our data structure depends on the maximum in-degree, we want to have a graph with bounded indegree, or equivalently, low maximum subgraph density. To achieve this, we use a uniform sparsifier of the graph, which approximately maintains the maximum subgraph density while considerably reducing the number of edges in the graph. This method is used to similar effect in [MPP15] and [MTVV15].
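The sampling step can be sketched in a few lines. The function names and the free parameter `p` are assumptions for illustration; the sketch does not reproduce the specific choice of sampling probability used in the analysis. Each edge is kept independently with probability `p`, and a density measured in the sample is divided by `p` to estimate the corresponding density in the original graph.

```python
import random

def uniform_sparsifier(edges, p, seed=None):
    """Keep each edge independently with probability p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]

def rescaled_density(sampled_edges, subset, p):
    """Estimate the original density of `subset` from a p-sample of the edges."""
    s = set(subset)
    inside = sum(1 for u, v in sampled_edges if u in s and v in s)
    return inside / (p * len(s))

# Complete graph on 6 vertices: 15 edges, density 2.5.
k6_edges = [(u, v) for u in range(6) for v in range(u + 1, 6)]
sample = uniform_sparsifier(k6_edges, 0.5, seed=1)
estimate = rescaled_density(sample, range(6), 0.5)
```

On a graph this small the estimate is noisy; the approximation guarantee only becomes meaningful when the densities involved are large enough for the sample to concentrate.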
Suppose we are given an undirected graph with vertices, a density threshold parameter , an approximation factor and a constant . Then, we construct a random graph , referred to as the uniform sparsifier of , by sampling each edge with probability where is a sampling parameter depending on . Then, with probability at least , we have:
3.4.2 Threshold Algorithm
In this section, we describe the update algorithm that maintains a feasible solution to for a particular graph , for which the value of is known to be within some threshold. We convert it to a directed graph, by assigning directions arbitrarily to each inserted edge. Recall that these directions signify an edge assigning its load to one of its endpoints. We abuse notation to denote this directed graph also using .
The algorithm makes use of the DegDS (defined in Algorithm 1) to maintain a locally stable orientation at all times. Whenever an edge is inserted or deleted, it fixes the in-degree gap by flipping a chain of edges so that all vertices maintain the same in-degree except one vertex at the end of the chain, whose in-degree change is guaranteed to not cause any unstable edges. Now, in any locally stable orientation, the maximum in-degree is an approximation to the maximum subgraph density of . Algorithm 4 contains the pseudocode describing this in detail.
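The insertion step described above can be sketched as follows. This is a simplified illustration rather than Algorithm 4: it stores only in-neighbor sets, takes the in-degree directly as the set size, and orients each new edge towards its endpoint of smaller in-degree, which is a choice made for this sketch. The class and method names are assumptions.

```python
from collections import defaultdict

class Orientation:
    def __init__(self):
        self.in_nbrs = defaultdict(set)   # in_nbrs[v] = {u : edge u -> v}

    def d(self, v):
        return len(self.in_nbrs[v])

    def insert(self, u, v):
        """Insert a new undirected edge {u, v}, preserving local stability."""
        if self.d(u) < self.d(v):
            u, v = v, u                   # orient towards the lower in-degree endpoint
        # Walk down a maximal chain of climbing edges ending at v.
        chain = [v]
        while True:
            top = chain[-1]
            lower = next((w for w in self.in_nbrs[top] if self.d(w) < self.d(top)), None)
            if lower is None:
                break
            chain.append(lower)
        # Flip every chain edge: (tail -> head) becomes (head -> tail).
        for head, tail in zip(chain, chain[1:]):
            self.in_nbrs[head].remove(tail)
            self.in_nbrs[tail].add(head)
        self.in_nbrs[v].add(u)            # finally add the new edge u -> v

    def is_locally_stable(self):
        return all(self.d(v) <= self.d(u) + 1
                   for v in list(self.in_nbrs) for u in self.in_nbrs[v])

# Insert the edges of the complete graph on 5 vertices one by one.
g = Orientation()
stable_throughout = True
for a in range(5):
    for b in range(a + 1, 5):
        g.insert(a, b)
        stable_throughout = stable_throughout and g.is_locally_stable()
max_in_degree = max(g.d(v) for v in range(5))
```

In-degrees strictly decrease along the chain, so its length is bounded by the maximum in-degree, and only the last vertex of the chain ends up with a changed in-degree.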
We prove the correctness and the runtime guarantee of Algorithm 4 in the following lemmas.
ThresholdDS (defined in Algorithm 4) maintains a feasible solution to , as the output to , under any edge insertion/deletion operation in time per operation where is the maximum in-degree of at the instant any operation is called.
We call $G$ locally stable when $d(v) \le d(u) + 1$ for any edge $(u, v)$ in $G$. It is easy to verify that the maximum in-degree of $G$ is associated with a feasible solution of the relaxed LP when $G$ has a locally stable orientation by the following conversion: assigning each edge $e = (u, v)$ to the vertex $v$, i.e., $f_e(v) = 1$ and $f_e(u) = 0$, and setting the load of each vertex as its in-degree. Now we only need to show that ThresholdDS maintains a locally stable orientation after any function call. Since ThresholdDS starts from an empty graph, we only need to show that if $G$ is locally stable, then it remains so after running a single update on it.
Consider the case when the update op is . Before the insertion, we find a chain of edges such that for each while no such edge into can be found in . Then we flip edge for each .
We can show that the only vertex that undergoes the in-degree change is after the flips and the insertion of edge . That’s why we update its in-degree value by calling at the end of the process where is . For any among ’s in-neighbors, . Hence, ’s new in-degree, , is going to be greater or equal to . In other words, the in-degree change of still keeps the orientation locally stable.
Assume that is the maximum in-degree of at the instant any operation is called. We know the length of the chain is at most . Hence, it only takes steps to traverse from down to while it takes time to call according to Theorem 3.4. Also, each edge in the chain will be flipped once and each flip costs time.
At the end of the process, we increase ’s in-degree via which again takes time. Overall, takes worst-case time per insertion. Similarly, we can also show that maintains a locally stable orientation after processing and takes time in the worst-case. ∎
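The deletion case, which the proof treats by symmetry, can be sketched in the same spirit. The function names and the plain dictionary representation are assumptions for illustration. Before removing the edge into `v`, the sketch walks up a maximal chain of out-edges towards strictly higher in-degree and flips them, so that only the last vertex of the chain loses one unit of in-degree.

```python
from collections import defaultdict

def add_arc(in_nbrs, out_nbrs, u, v):
    in_nbrs[v].add(u)
    out_nbrs[u].add(v)

def delete_edge(in_nbrs, out_nbrs, u, v):
    """Delete the directed edge u -> v from a locally stable orientation."""
    deg = lambda x: len(in_nbrs[x])
    # Walk up: follow out-edges to neighbours of strictly higher in-degree.
    chain = [v]
    while True:
        cur = chain[-1]
        higher = next((w for w in out_nbrs[cur] if deg(w) > deg(cur)), None)
        if higher is None:
            break
        chain.append(higher)
    # Flip every chain edge: (tail -> head) becomes (head -> tail).
    for tail, head in zip(chain, chain[1:]):
        out_nbrs[tail].remove(head)
        in_nbrs[head].remove(tail)
        add_arc(in_nbrs, out_nbrs, head, tail)
    in_nbrs[v].remove(u)
    out_nbrs[u].remove(v)

def is_locally_stable(in_nbrs):
    return all(len(in_nbrs[v]) <= len(in_nbrs[u]) + 1
               for v in list(in_nbrs) for u in in_nbrs[v])

# A small locally stable orientation: x -> a -> b <- c <- y.
in_nbrs, out_nbrs = defaultdict(set), defaultdict(set)
for tail, head in [("x", "a"), ("a", "b"), ("c", "b"), ("y", "c")]:
    add_arc(in_nbrs, out_nbrs, tail, head)
stable_before = is_locally_stable(in_nbrs)
delete_edge(in_nbrs, out_nbrs, "x", "a")
stable_after = is_locally_stable(in_nbrs)
```

Simply removing the edge from `x` to `a` would leave `a` with in-degree 0 and an out-edge into `b` of in-degree 2, which is unstable; flipping that edge first keeps the in-degree of `a` at 1 and lowers the in-degree of `b` to 1.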
For any graph with vertices, error factor and a fixed parameter , (defined in Algorithm 4) outputs with the guarantee that
if , ;
if , .
From Lemma 3.6, we know the output is always a feasible solution to .
3.5 Main Proof
We are now well equipped to prove Theorem 1.1. However, let us first define some useful notation. Since all our analysis will be on the graph instance at the time it undergoes an update/query operation, we use to denote the graph at that instance.
Notice that Algorithm 3 duplicates any update operation times (line 11), which means that the graph we are essentially running the algorithm on is scaled by a factor of . This is essential since we want to maintain a lower threshold on the input graph density. For ease of representation, denote this scaled graph as . Then we have
From the time bounds in Lemma 3.6, we need to sparsify in such a way that the scaled maximum density is within to make update operations affordable and is also above to make the result reliable. Algorithm 3 creates sparsifiers with a range of sampling probabilities. We use to denote the graph obtained by sampling with probability . So there is always some above which the desired property is true. We call such graphs affordable.
Secondly, from Corollary 3.2, we need the scaled maximum density to be at least for the local optimum to give approximations. Again, there will always be some below which this property holds. We call such graphs accurate.
Our main aim in the algorithm is to find a sparsifier which is both affordable and accurate. To achieve this, we start from the highest possible , and work downwards until the graph becomes accurate. Notice that until that iteration, all graphs are affordable, and so we remain within the runtime bounds.
For the values of in which is not affordable, the algorithm stores a list of updates which it will only execute at a point when it becomes affordable. This list is called . However, there is still a slight complication. While performing the updates in , ’s maximum density could spike, making it not affordable for a fraction of the updates. To sidestep this problem, the algorithm first processes the deletion operations and only then the insertions.
We first prove a few lemmas to show that the algorithm supports the intuition offered above. Let denote the maximum index for which . Notice that is always at least because we scaled the original graph up by a factor of by duplicating all the update operations times. Hence, we can always find such that . Then, we have the following lemma.
For any iteration index , with high probability.
With high probability, .
From the above lemma, we know that the variable ready will definitely be true at the end of iteration . Combining with Lemma 3.8, we know the index of the iteration when the query is answered (line 29) is either or , i.e., with high probability.
Since smaller values of give larger sparsifiers, the density estimates in these should be approximately optimal. Lemma 3.10 states this formally.
For any iteration index , with high probability,
On the other hand, larger values of give smaller maximum densities, making it affordable to run update algorithms on them.
For any , with high probability, .
Using the above lemmas, we can now prove our main result. The part about extracting a subgraph is shown in Appendix A.
Proof of Theorem 1.1.
Since , Lemma 3.10 implies that the estimator , obtained as the result from is an approximation to , which is equal to . Hence, the answer to query, , is an approximation to . This proves the accuracy of Theorem 1.1.
Notice that each edge operation op from OPs will be put into each with probability . In each