Some General Structure for Extremal Sparsification Problems

01/21/2020 ∙ by Greg Bodwin, et al.

This paper is about a branch of theoretical computer science that studies how much graphs can be sparsified while faithfully preserving their properties. Examples include spanners, distance preservers, reachability preservers, etc. We introduce an abstraction that captures all of the above, and then we prove a couple of simple structural lemmas about this abstraction. These imply unified proofs of some state-of-the-art results in the area, and they improve the size of Chechik's +4 additive spanner [SODA '13] from Õ(n^{7/5}) to O(n^{7/5}).


1 Introduction

Many problems in theoretical computer science ask for a sparsified version of an object, e.g. a graph or matrix, which approximately preserves certain properties of the original. Examples include spanners, spectral sparsifiers, distance preservers, reachability preservers, etc. These problems are often considered from an extremal standpoint, meaning the goal is to determine the tradeoff between the size of an input instance and the quality of sparsification that can always be achieved for inputs of the given size. (The alternative is the algorithmic standpoint, where the goal is to develop an algorithm that outputs a high-quality sparsified version of any particular input instance.) The goal of this paper is to initiate the study of these extremal sparsification problems in general. We will next set up the main abstractions considered in this paper.

Definition 1 (Monotone Circuits).

A satisfiable monotone circuit $C$ with $m$ input wires and a single output wire is one with the following properties. We say $S \subseteq [m]$ satisfies $C$ if the output is true when the inputs in $S$ are set to true and those in $[m] \setminus S$ to false.

  • (Monotone) If $S \subseteq S'$ and $S$ satisfies $C$, then $S'$ also satisfies $C$.

  • (Satisfiable) $[m]$ satisfies $C$.

Definition 2 (Monotone Constraint Satisfaction Problems (MCSPs)).

A monotone constraint satisfaction problem (MCSP) is a finite set $P$ of satisfiable monotone circuits (called constraints), all with the same number of input wires labeled with $[m]$. We say that $S \subseteq [m]$ satisfies $P$ if it satisfies every $C \in P$.

Every MCSP is satisfiable (by $S = [m]$), but we will be interested in the optimization problem of satisfying $P$ with as few inputs as possible.
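The optimization problem just described can be brute-forced on tiny instances. The sketch below is only an illustration: it models each constraint as a Python predicate on the set of inputs switched on, and the names `energy_cost`, `satisfies`, and `needs_any` are hypothetical helpers, not notation from the paper. The returned minimum is the quantity that Definition 3 calls the energy cost.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

# Assumption: a constraint is modelled as a monotone predicate on the "on" inputs.
Constraint = Callable[[FrozenSet[int]], bool]


def satisfies(chosen: FrozenSet[int], mcsp: Sequence[Constraint]) -> bool:
    """True if every constraint of the MCSP accepts the chosen inputs."""
    return all(constraint(chosen) for constraint in mcsp)


def energy_cost(mcsp: Sequence[Constraint], num_inputs: int) -> int:
    """Smallest number of inputs that satisfies every constraint (brute force)."""
    for size in range(num_inputs + 1):
        for subset in combinations(range(num_inputs), size):
            if satisfies(frozenset(subset), mcsp):
                return size
    raise ValueError("a constraint rejects the all-true input, so it is not satisfiable")


def needs_any(*wires: int) -> Constraint:
    """Hypothetical helper: a constraint satisfied once any listed wire is on."""
    return lambda chosen: any(w in chosen for w in wires)


# Three constraints on inputs {0, 1, 2}: every pair of inputs must contain an "on" wire.
triangle = [needs_any(0, 1), needs_any(1, 2), needs_any(0, 2)]
```

Brute force is exponential in the number of inputs, so this is only meant for checking small examples by hand.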

Definition 3 (Energy Cost).

The energy cost of an MCSP $P$ is the quantity

$$\mathrm{ec}(P) := \min \{ |S| : S \subseteq [m] \text{ satisfies } P \}.$$

To discuss extremal problems, we need to speak of a family of MCSPs. We define these as follows:

Definition 4 (MCSP Complex).

An MCSP complex $\mathcal{P}$ is a nonempty set of MCSPs closed under subsets; that is, if $P \in \mathcal{P}$ and $P' \subseteq P$, then $P' \in \mathcal{P}$.

Definition 5 (eec).

The extremal energy cost of an MCSP complex is the function

(The parameter is always a nonnegative integer.)

Before saying more about eec, let us make this all concrete by tying it to some previously studied problems in the literature. We start with (additive) spanners:

Definition 6 (Additive Pairwise Spanners [8, 7]).

Given a graph and a set of demand pairs , a subgraph is a pairwise spanner of if we have

The goal in spanners research is often to prove an extremal tradeoff between the number of nodes in the input graph , the number of demand pairs , the error budget , and the number of edges needed for a spanner. For example:

Theorem 1 ([12, 1]).

Any $p$ demand pairs in an $n$-node undirected unweighted graph have a $+2$ pairwise spanner on $O(np^{1/3})$ edges.

Let us see how this result can be cast in the above framework. Any given set of demand pairs in an -node graph can be viewed as an MCSP as follows. There are input wires corresponding to the possible edges of an -node graph. There are constraints, where each one encodes the condition for one demand pair . Here is the subgraph that has an edge iff the corresponding input wire is turned on and also that edge is present in . The energy cost of this MCSP is exactly the least possible number of edges in a pairwise spanner.
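The encoding above can be sketched in code. This is a minimal illustration under stated assumptions: nodes are numbered 0 to n-1, the graph is undirected and unweighted, the additive error budget is passed as a parameter `k` (a name chosen here), and `spanner_mcsp` and `bfs_dist` are hypothetical helper names. Each constraint receives the set of input wires that are on and checks the distance condition for one demand pair.

```python
from collections import deque
from itertools import combinations


def bfs_dist(n, edges, source):
    """Hop distances from source in an undirected graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [float("inf")] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spanner_mcsp(n, graph_edges, demand_pairs, k):
    """One constraint per demand pair, over one input wire per possible edge."""
    wires = list(combinations(range(n), 2))      # wire i <-> possible edge wires[i]
    present = {tuple(sorted(e)) for e in graph_edges}

    def make_constraint(s, t):
        target = bfs_dist(n, present, s)[t] + k  # distance in G plus error budget

        def constraint(on_wires):
            # H keeps an edge iff its wire is on AND the edge is present in G.
            h_edges = [wires[i] for i in on_wires if wires[i] in present]
            return bfs_dist(n, h_edges, s)[t] <= target

        return constraint

    return wires, [make_constraint(s, t) for s, t in demand_pairs]
```

Turning on more wires can only add edges to the subgraph, so each constraint is monotone, and turning on every wire recovers the input graph, so each constraint is satisfiable.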

The relevant MCSP complex is then the set of the above MCSPs ranging over all possible $n$-node graphs and all possible sets of demand pairs. Theorem 1 is now exactly the statement that

The extremal behavior of the following two objects can be captured by eec in basically the same way, although we will not explicitly walk through the connection as we have for pairwise spanners.

Definition 7 (Distance Preservers [7]).

Given a graph and a set of demand pairs , a subgraph is a distance preserver of if we have

(Equivalently, it is a pairwise spanner.)

Definition 8 (Reachability Preservers [2]).

Given a (typically directed) graph and a set of demand pairs , a subgraph is a reachability preserver of if, for all , there is a directed path in if and only if there is one in .

This paper is driven by two new structural lemmas that control the general behavior of eec, which we next outline.

1.1 First Structural Lemma: Partial Constraint Satisfaction

Many popular constructions for the above objects leverage the hitting set technique, where one uses a random sample of nodes to inform the construction somehow. For example, in [3], a step in the spanner construction involves choosing a random set of nodes and adding a collection of BFS trees rooted at each sampled node. The hitting set technique often leads to clean and elementary constructions, making it preferable to alternatives when possible. However, it implicitly costs an “extra” $\log n$ factor: for an $n$-node graph, given $\mathrm{poly}(n)$ “important” node subsets of size $\ell$ each, one must sample $\Theta((n/\ell) \log n)$ nodes to hit each subset with high probability, usually resulting in an extra $\log n$ factor in the final size of the sparsifying subgraph. For many problems, hitting set arguments lag behind state-of-the-art only by this $\log$ factor.
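The source of the extra factor can be checked numerically. The sketch below uses only elementary probability: a fixed subset is missed by independent sampling with probability (1 - rate)^size, which is at most exp(-rate * size). The function names and the concrete numbers are illustrative assumptions, not values from the paper.

```python
import math


def hit_probability(subset_size: int, rate: float) -> float:
    """Chance that independent sampling at `rate` picks a node of one fixed subset."""
    return 1.0 - (1.0 - rate) ** subset_size


def rate_for_all(subset_size: int, num_subsets: int, failure: float) -> float:
    """Rate at which a union bound leaves at most `failure` total miss probability."""
    # Each subset is missed with probability at most exp(-rate * subset_size).
    return min(1.0, math.log(num_subsets / failure) / subset_size)


ell = 100                      # illustrative subset size
constant_rate = 1.0 / ell      # enough to hit one subset with constant probability
safe_rate = rate_for_all(ell, num_subsets=10**6, failure=0.01)
overhead = safe_rate / constant_rate   # grows like the log of the number of subsets
```

Sampling at rate 1/ℓ already hits any single subset with probability above 1 - 1/e, which is the "constant fraction" regime; the logarithmic overhead only appears when every subset must be hit simultaneously.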

Our first structural lemma is essentially an observation that this extra factor can be generally avoided in eec. Formally:

Lemma 2 (First Structural Lemma).

Let be an MCSP complex, let be parameters, and let be an absolute constant. Suppose that for , there exist inputs satisfying either of the following two properties:

  1. satisfies at least a constant fraction of the constraints in and , or

  2. satisfies and .

Then

For context, this lemma would be trivial if its first property required that the inputs satisfy all of the constraints rather than just a constant fraction; the point is that we can relax to solve only a constant fraction of the constraints “for free.” In principle this lemma applies any time this relaxation is useful, but probably the most common reason why one might satisfy only a constant fraction of the constraints is due to application of a hitting set argument with the extra $\log$ removed. Using this, we can shave the $\log$ from the previously-mentioned hitting set arguments for spanners and friends, giving:

Theorem 3 (Informal).

The following (known) state-of-the-art theorems all have simple proofs based on the hitting set technique. Let be the number of nodes in the input graph and the number of demand pairs.

  • Every (possibly directed and weighted) graph has a distance preserver on edges. [7]

  • Every (possibly directed) graph has a reachability preserver on edges. [2]

  • Every undirected unweighted graph has a pairwise spanner on edges. [12, 1]

  • Every undirected unweighted graph has a pairwise spanner on edges. [11]

There are a few other related results in the above papers that can be similarly achieved by random sampling, but having made our point we will stop with these. However, we give one more application to $+4$ spanners. Here the current state-of-the-art proofs use hitting sets [11, 6], so we do not really simplify them, but we shave their $\log$s.

Theorem 4.

Any $p$ demand pairs in any $n$-node undirected unweighted graph have a $+4$ pairwise spanner on $O(np^{2/7})$ edges.

This case may be more interesting than the others because it is not currently known how to shave this $\log$ without passing through Lemma 2.

1.2 Second Structural Lemma: Constraint Saturation

Spanners are more commonly studied in the all-pairs setting, i.e. the special case of pairwise spanners where the demand pairs are all pairs of nodes. For example, in [3] it is proved that all undirected unweighted $n$-node graphs have a $+2$ all-pairs spanner on $O(n^{3/2})$ edges. At first glance, this seems better than what one would expect from the state-of-the-art pairwise spanner: plugging $p = O(n^2)$ into the bound of $O(np^{1/3})$ mentioned above, we get a much worse bound of $O(n^{5/3})$ edges for the all-pairs setting. [Footnote 1: Not mentioned in this discussion: in [7] it is proved that one can have $O(n + n^{1/2}p)$ edges and $+0$ error (i.e. a distance preserver), which beats these bounds in some regime.]

Our second structural lemma explains this apparent discrepancy by showing that eec

bounds generally undergo a phase transition once

. We prove:

Lemma 5 (Second Structural Lemma).

For an MCSP complex , let

Then for all we have .

Put another way: as one considers larger and larger , the value of is generally increasing. But as soon as we hit a value of where , the value of eec freezes. We will refer to this parameter range where as the saturated regime.

The conceptual point of this lemma is that the hardness of all-pairs constraint satisfaction is always concentrated on a relatively small number of demand pairs. A few recent papers [1, 10] have done technical work to design lower bounds with this property, but Lemma 5 shows that this property can actually be obtained for free, in a black-box way. However, we remark that some of these papers (including [10]) go a bit further to obtain hardness concentrated on demand pairs between a small number of terminal nodes, and we have been unable to prove an extension of Lemma 5 that implies this stronger property in general. Such an extension would be very interesting and would likely imply significant progress for spanners, even if it requires additional natural axioms about the MCSP complex in question.

More concretely, Lemma 5 elucidates the relationship between pairwise and all-pairs spanners. Returning to the example of $+2$ error discussed above, the pairwise spanner bound of $O(np^{1/3})$ enters the saturated regime when $p = \Theta(n^{3/2})$, and thus (with Lemma 5) it implies the bound of $O(n^{3/2})$ for the all-pairs setting. The $+6$ pairwise spanner bound of $O(np^{1/4})$ implies the all-pairs bound of $O(n^{4/3})$ from [4] in the same way. Finally, by the same argument, Theorem 4 implies that the $\log$ factors can be removed from the all-pairs $+4$ spanner construction in [6]:
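The arithmetic behind these conversions is short enough to check mechanically. The sketch below makes two assumptions that should be read as such: that a pairwise bound has the form n * p^a, and that saturation begins roughly where this bound equals the number of demand pairs p. The exponents 1/3, 2/7, and 1/4 for the +2, +4, and +6 pairwise spanners are taken from the cited literature, and `saturation_exponent` is a hypothetical helper name.

```python
from fractions import Fraction


def saturation_exponent(a: Fraction) -> Fraction:
    """Exponent x such that n * p**a equals p at p = n**x, for 0 <= a < 1."""
    # n * p^a = p  <=>  p^(1 - a) = n  <=>  p = n^(1 / (1 - a))
    return 1 / (1 - a)


# Assumed pairwise exponents a in bounds of the form n * p^a.
pairwise = {"+2": Fraction(1, 3), "+4": Fraction(2, 7), "+6": Fraction(1, 4)}
all_pairs = {err: saturation_exponent(a) for err, a in pairwise.items()}
```

Under these assumptions the resulting exponents are 3/2, 7/5, and 4/3, matching the known all-pairs bounds for +2, +4, and +6 additive spanners up to constant factors.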

Theorem 6.

Every $n$-node undirected unweighted graph has a $+4$ all-pairs spanner on $O(n^{7/5})$ edges.

See Figure 1 for a depiction of the interaction between pairwise spanner bounds and Lemma 5.

[Figure 1: Bounds for the pairwise spanners at the three error values discussed above. Axes: number of demand pairs; edges in spanner. The saturated regime is marked.]

2 Structural Lemmas

In this section we will prove our two structural lemmas about eec.

First Structural Lemma.

Proof of Lemma 2.

Starting with , we iterate the following process. Find a set as in the lemma statement on . If the first case of the lemma occurs, then we have that satisfies at least a constant fraction of the constraints in . Remove the satisfied constraints from and call the remainder . Repeat until is empty or the second case occurs ( satisfies all remaining constraints). Since all constraints are monotone, the union of all sets then satisfies all the original constraints in . Since the second case contributes only a subset of size , it suffices to bound the sizes of the previous ’s arising from the first case. We compute:

fraction satisfied each round

which completes the proof. ∎
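The iteration in this proof can be sketched as a loop around an oracle. The sketch is illustrative only: `Oracle` stands in for the existence assumption of Lemma 2 (a set of inputs satisfying a constant fraction of the remaining constraints), constraints are modelled as monotone predicates, and `satisfy_by_rounds` and `half_oracle` are hypothetical names. The size accounting from the proof is not reproduced here, only the mechanics.

```python
from typing import Callable, FrozenSet, List, Set

Constraint = Callable[[FrozenSet[int]], bool]
# Hypothetical oracle: given the remaining constraints, return a set of inputs
# that satisfies at least a constant fraction of them (or all of them).
Oracle = Callable[[List[Constraint]], Set[int]]


def satisfy_by_rounds(constraints: List[Constraint], oracle: Oracle) -> Set[int]:
    """Union the oracle's partial solutions until no constraint is left."""
    remaining = list(constraints)
    chosen: Set[int] = set()
    while remaining:
        piece = frozenset(oracle(remaining))
        still_open = [c for c in remaining if not c(piece)]
        if len(still_open) == len(remaining):
            raise ValueError("oracle made no progress")
        chosen |= piece          # monotonicity: earlier constraints stay satisfied
        remaining = still_open
    return chosen


# Toy instance: constraint i is satisfied exactly when wire i is on.
constraints = [lambda chosen, i=i: i in chosen for i in range(8)]


def half_oracle(remaining: List[Constraint]) -> Set[int]:
    """Turn wires on in order until at least half of the remaining constraints hold."""
    picked: Set[int] = set()
    need = (len(remaining) + 1) // 2
    for wire in range(8):
        if sum(c(frozenset(picked)) for c in remaining) >= need:
            break
        picked.add(wire)
    return picked


result = satisfy_by_rounds(constraints, half_oracle)
```

Because each round removes a constant fraction of the remaining constraints, the number of rounds is logarithmic in the number of constraints; the lemma's point is that the sizes of the pieces still sum to the claimed bound.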

Second Structural Lemma.

Proof of Lemma 5.

Let with , and for any , define its marginal cost as the quantity

To prove the lemma, we run the following algorithm. Initialize and . While there is such that

that is, is unchanged when is deleted from , we delete from . Otherwise, if there is currently no with this property, then arbitrarily choose any constraint and move it into . Repeat this process until is empty. We make two observations about the final state of :

  • First, we have . This holds because initially we have

    each time we delete an by construction the quantity is unchanged, and each time we move to the quantity decreases by at least (note that moving a constraint from to cannot possibly increase ). Since always, we can only move total constraints into .

  • Second, we have . We prove this by writing the identity

    and then arguing that this quantity is invariant through the algorithm; initially it is clearly , and at the end it is clearly . To show invariance, there are two cases. In the case where we move a constraint from to , we look at the left-hand side of the identity, and it is clear that is unchanged. When we delete a constraint , we look at the right-hand side of the identity; clearly is unchanged, and by construction we only delete when is unchanged as well.

Putting these two parts together, we conclude that . By definition of this means , and so we have

which proves the lemma. ∎

3 Applications to Graph Sparsification

Coppersmith and Elkin proved the following foundational result about distance preservers:

Theorem 7 ([7]).

Every (possibly directed and weighted) -node graph and set of demand pairs has a distance preserver on edges.

This theorem is currently state-of-the-art in the parameter range [5]. The authors gave a somewhat more involved proof, but they also pointed out that a simple proof based on hitting sets gives a nearly-equal upper bound of . With application of Lemma 2, we show that the factors in the hitting set argument can be avoided. Here and throughout, we assume that the reader has basic familiarity with the hitting set technique, and thus we will assert some basic probabilistic computations without proof. We will also sometimes refer to the shortest path between two nodes, meaning that ties have been broken somehow. We now (re-)prove Theorem 7. The following proof should be credited to [7]; we have simply inserted Lemma 2 in the appropriate place and slightly rebalanced parameters.

Proof of Theorem 7.

Applying Lemma 2 (with and since the second case will not be used) it suffices to show that, for any parameter , there is a subgraph on edges that satisfies a constant fraction of the demand pairs. We do so as follows. Let be a parameter to be chosen shortly. Say that a demand pair is “short” if it has a shortest path on edges, or “long” otherwise.

  • To handle the short pairs , simply add the edges of a shortest path to the preserver (cost in total).

  • To handle the long pairs , randomly sample a set of nodes by including each node independently with probability . Then, add to the preserver in- and out- shortest path trees rooted at each . This step costs edges (with high probability). We sample a node on the shortest path with at least constant probability, and in this event an shortest path is contained in the edges of the two shortest path trees rooted at . Thus the distance for each long pair is preserved with constant probability.

Letting denote the final preserver, setting we have

Since a constant fraction (at least) of the demand pairs are satisfied in expectation, the proof is complete. ∎
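One round of this construction can be sketched for the special case of a directed unweighted graph. Several choices below are assumptions made for illustration: the length threshold is passed as `ell`, each node is sampled with probability 1/ell, shortest paths are taken from BFS trees, and `partial_preserver` and `bfs_tree` are hypothetical names. The function returns the edges added in one round; it does not check which long pairs were actually hit.

```python
import random
from collections import deque


def bfs_tree(n, adj, root):
    """Parent pointers of a BFS (shortest path) tree from root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def partial_preserver(n, edges, demand_pairs, ell, rng=random):
    """Edges of one round: short-pair paths plus trees at sampled nodes."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)

    kept = set()
    # Short pairs: add a whole shortest path when it uses at most ell edges.
    for s, t in demand_pairs:
        parent = bfs_tree(n, out_adj, s)
        path, node = [], t
        while node in parent and parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        if t in parent and len(path) <= ell:
            kept.update(path)

    # Long pairs: in- and out- shortest path trees rooted at each sampled node.
    sample = [v for v in range(n) if rng.random() < 1.0 / ell]
    for r in sample:
        kept.update((p, v) for v, p in bfs_tree(n, out_adj, r).items() if p is not None)
        # The in-tree is a BFS on reversed edges, so (v, p) is the original edge.
        kept.update((v, p) for v, p in bfs_tree(n, in_adj, r).items() if p is not None)
    return kept
```

A long pair whose shortest path contains a sampled node is preserved through that node's two trees, which is the constant-probability event the proof relies on.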

For reachability preservers, the following two results are shown in [2]:

Theorem 8 ([2]).
  1. Every set of demand pairs in an -node directed graph has a reachability preserver on edges.

  2. When for some node subset of size (or also in the symmetric case ), this bound improves to .

The two parts of this theorem are proved separately in [2], using related but somewhat complicated arguments. We show that the former actually follows simply from the latter, cutting the work in half:

Proof of Theorem 8.1, given Theorem 8.2.

We apply Lemma 2 (with ), and our goal is to show that the premises of Lemma 2 are satisfied. This time, we will need both cases of Lemma 2. First, in the range , we trivially have for some node subset of size . Hence, by Theorem 8.2, there is a subgraph on edges that satisfies all constraints. Next, suppose and our goal is to satisfy a constant fraction of the constraints. Like before, let be a parameter, and say that a demand pair is “short” if its shortest path (or any canonical choice of path will work here) has length , or “long” otherwise.

  • To handle the short pairs , add the edges of a path to the preserver (cost ).

  • To handle the long pairs , randomly sample a set of nodes by including each node independently with probability . Let denote the demand pairs whose shortest path intersects a node , and note that each long pair is in with at least constant probability. We then split each such pair into two pairs and add two reachability preservers via Theorem 8.2, to handle all pairs of the form and then all pairs of the form , for total cost

Letting denote the final preserver, which satisfies at least a constant fraction of the demand pairs in expectation, setting , by Lemma 2 we have

which completes the proof. ∎

We next turn to pairwise spanners. The following auxiliary lemma will be useful. Let us say that a -initialization of a graph is a subgraph obtained by arbitrarily choosing edges incident to each node in and including them in , or including all edges incident to a node of degree (this simplifying technique, which replaces the standard clustering step, was first used in [13]).
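The initialization step just defined can be sketched as follows. The parameter name `d` and the function name `d_initialization` are assumptions made here for illustration, and the "arbitrary" choice is resolved by taking the first `d` incident edges in input order.

```python
def d_initialization(n, edges, d):
    """Keep up to d incident edges per node (all of them when the degree is at most d)."""
    incident = [[] for _ in range(n)]
    for u, v in edges:
        e = (min(u, v), max(u, v))   # canonical form of an undirected edge
        incident[u].append(e)
        incident[v].append(e)
    kept = set()
    for v in range(n):
        # "Arbitrary" choice: the first d incident edges in input order.
        kept.update(incident[v][:d])
    return kept
```

Each node contributes at most `d` edges, so the result has at most `n * d` edges, and an edge can only be missing when both of its endpoints already have `d` kept incident edges.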

Lemma 9 (e.g. [13, 6] and others).

If is a -initialization of an undirected unweighted graph , and there is a shortest path in that is missing edges in , then there are total nodes adjacent in to any node in .

Proof.

Note that any node is adjacent to at most three nodes in , since otherwise there is a path of length (passing through ) between the first and last such node, which is shorter than the corresponding subpath in . Additionally, for each edge , there must be edges in incident to since we did not choose to add itself in the initialization. Thus, we have:

We now give some hitting-set-based pairwise spanner constructions. We will first prove:

Theorem 10 ([12, 1]).

Every set of $p$ demand pairs in an $n$-node graph has a $+2$ pairwise spanner on $O(np^{1/3})$ edges.

Kavitha and Varma [12] gave a simple near-proof of this theorem, based on random sampling, with an upper bound of $\widetilde{O}(np^{1/3})$. The logs were subsequently shaved in [1] with a more complicated argument. We show via Lemma 2 that the $\log$s can be removed from the original simpler approach. (The following proof framework is from [12].)

Proof of Theorem 10.

We will apply Lemma 2 with (the second case is not used). Let be parameters, and let the spanner be a -initialization of (cost ). Say that a demand pair is “short” if its shortest path is currently missing edges in the spanner, or “long” otherwise.

  • To handle the short pairs , add the missing edges of a shortest path to the spanner (cost ).

  • To handle the long pairs , randomly sample a set of nodes by including each node independently with probability . Add to the spanner a shortest path tree rooted at each (cost ). By Lemma 9 there are nodes adjacent to the shortest path, so with constant probability or higher, we sample a node adjacent to a node on this shortest path. In this event, we compute:

    triangle inequality
    shortest path tree at
    triangle inequality
    on shortest path.

Letting denote the final spanner, which satisfies at least a constant fraction of all demand pairs in expectation, setting and we have

which completes the proof. ∎

A similar story holds for the $+6$ pairwise spanner. Kavitha [11] proved:

Theorem 11 ([11]).

Every set of $p$ demand pairs in an $n$-node graph has a $+6$ pairwise spanner on $O(np^{1/4})$ edges.

Kavitha also mentions that a bound of $\widetilde{O}(np^{1/4})$ can be obtained by a simple hitting set argument by reduction to the following key lemma in the area, which has been repeatedly rediscovered:

Theorem 12 ([4, 14, 9, 8]).

For every $n$-node undirected unweighted graph $G = (V, E)$ and set of demand pairs $P$ with the structure $P = S \times S$ for some $S \subseteq V$, there is a $+2$ pairwise spanner of $G, P$ on $O(n|S|^{1/2})$ edges.

(The proof of Theorem 12 is technically quite distinct from anything else mentioned in this paper, so we will not recap it here.) We observe that the log can be avoided directly in the hitting set argument, as usual.

Proof of Theorem 11.

We apply Lemma 2 with (second case not used). Let be parameters, and let the spanner be a -initialization of (cost ). A demand pair is “short” if the shortest path is missing edges in the spanner, or “long” otherwise.

  • To handle the short pairs , add the missing edges in its shortest path to the spanner (cost ).

  • To handle the long demand pairs , there are two steps. First, add the first and last missing edges of the shortest path to the spanner (cost ). Then, randomly sample a set by including each node with probability and using Theorem 12 add a pairwise spanner on demand pairs ; this costs

    edges. By Lemma 9 the added prefix and suffix of the shortest path each have adjacent nodes. Thus, with constant probability or higher, we sample such that is adjacent to in the added prefix and is adjacent to in the added suffix. In this event we can compute:

    triangle inequality
    pairwise spanner
    triangle inequality
    added prefix/suffix
    triangle inequality
    on shortest path.

Letting denote the final spanner, which satisfies at least a constant fraction of all demand pairs in expectation, setting and we have

which completes the proof. ∎

Next we turn to the $+4$ pairwise spanner.

Theorem 13.

Every set of $p$ demand pairs in an $n$-node graph has a $+4$ pairwise spanner on $O(np^{2/7})$ edges.

A bound of $\widetilde{O}(np^{2/7})$ was proved by Kavitha [11], which uses hitting sets, so we are simply shaving the logs from this construction. (The following construction framework is from [11], although we have lightly changed a few details.)

Proof of Theorem 13.

We apply Lemma 2 with . Let be parameters, and let the spanner be a -initialization of (cost ). This time there are three cases: a demand pair is “short” if its shortest path is missing edges, it is “medium” if it is not short but its shortest path is missing edges, or it is “long” otherwise.

  • To handle the short pairs , add the missing edges of the shortest path to the spanner (cost ).

  • To handle the long pairs , randomly sample a set of nodes by including each node independently with probability , and add the edges of a BFS tree rooted at each to the spanner (cost ). By Lemma 9 there are nodes adjacent to the shortest path, so with constant probability or higher we sample a node adjacent to a node on this path. In this event, we compute:

    triangle inequality
    shortest path tree
    triangle inequality
    on a shortest path.
  • There are two steps to handle the medium pairs . First, add the first and last missing edges in the shortest path to the spanner (cost ). Then, randomly sample a set of nodes by including each node independently with probability . For each pair of nodes , check to see if there exist nodes adjacent to (respectively) in the current spanner such that the shortest path is missing edges. If so, then choose nodes with this property minimizing , and add all missing edges in the shortest path to the spanner. If no such nodes exist, then do nothing for this pair . This step costs

    edges in total. For a medium demand pair , by Lemma 9 there are nodes adjacent to the added prefix and suffix, so with constant probability or higher we sample nodes adjacent to nodes on the added prefix, suffix (respectively). In this event, note that there are missing edges on the shortest path, since are on the shortest path and is a medium pair. [Footnote 2: A technical detail here is that this step requires that shortest path tiebreaking is performed consistently, i.e. the canonical shortest path is a subpath of the canonical shortest path. This is the only construction in this paper that requires consistency.] Thus, when are considered in the construction, we will indeed add a new shortest path to the spanner (as opposed to the case where we do nothing). Letting be the endpoints of this added shortest path, we compute:

    on shortest path
    on added prefix, suffix
    triangle inequality
    shortest path added
    minimal
    on shortest path.

Letting denote the final spanner, which satisfies a constant fraction of all demand pairs, setting and we have

which completes the proof. ∎

We next port our results for pairwise spanners over to the all-pairs setting.

Theorem 14.

Any $n$-node graph has a $+4$ all-pairs additive spanner on $O(n^{7/5})$ edges.

Proof.

By Theorem 13, the tradeoff for $+4$ pairwise spanners enters the saturated regime at some $p = \Theta(n^{7/5})$. It thus follows from Lemma 5 that $O(n^{7/5})$ edges suffice for a $+4$ spanner of any number of demand pairs. ∎

One can argue identically to convert Theorems 10 and 11 to state-of-the-art all-pairs $+2$ and $+6$ additive spanners, as discussed in the introduction.

References

  • [1] Abboud, A., and Bodwin, G. Error amplification for pairwise spanner lower bounds. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2016), Society for Industrial and Applied Mathematics, pp. 841–854.
  • [2] Abboud, A., and Bodwin, G. Reachability preservers: New extremal bounds and approximation algorithms. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2018), Society for Industrial and Applied Mathematics, pp. 1865–1883.
  • [3] Aingworth, D., Chekuri, C., Indyk, P., and Motwani, R. Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM Journal on Computing 28, 4 (1999), 1167–1181.
  • [4] Baswana, S., Kavitha, T., Mehlhorn, K., and Pettie, S. Additive spanners and (α, β)-spanners. ACM Transactions on Algorithms (TALG) 7, 1 (2010), 5.
  • [5] Bodwin, G. Linear size distance preservers. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2017), Society for Industrial and Applied Mathematics, pp. 600–615.
  • [6] Chechik, S. New additive spanners. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2013), SIAM, pp. 498–512.
  • [7] Coppersmith, D., and Elkin, M. Sparse sourcewise and pairwise distance preservers. SIAM Journal on Discrete Mathematics 20, 2 (2006), 463–501.
  • [8] Cygan, M., Grandoni, F., and Kavitha, T. On Pairwise Spanners. In 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013) (Dagstuhl, Germany, 2013), N. Portier and T. Wilke, Eds., vol. 20 of Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, pp. 209–220.
  • [9] Elkin, M. Unpublished.
  • [10] Goranci, G., Henzinger, M., and Peng, P. Improved guarantees for vertex sparsification in planar graphs. In 25th Annual European Symposium on Algorithms (ESA 2017) (Dagstuhl, Germany, 2017), K. Pruhs and C. Sohler, Eds., vol. 87 of Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, pp. 44:1–44:14.
  • [11] Kavitha, T. New pairwise spanners. Theory of Computing Systems 61, 4 (2017), 1011–1036.
  • [12] Kavitha, T., and Varma, N. M. Small stretch pairwise spanners and approximate d-preservers. SIAM Journal on Discrete Mathematics 29, 4 (2015), 2239–2254.
  • [13] Knudsen, M. B. T. Additive spanners: A simple construction. In Scandinavian Workshop on Algorithm Theory (2014), Springer, pp. 277–281.
  • [14] Pettie, S. Low distortion spanners. ACM Transactions on Algorithms (TALG) 6, 1 (2009), 7.