# Average Sensitivity of Graph Algorithms

In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. Hence, it is desirable to use algorithms that, even when only a (large) subgraph is available, output solutions that are close to the solutions output when the whole graph is available. We formalize this idea by introducing the notion of average sensitivity of graph algorithms, which is the average earth mover's distance between the output distributions of an algorithm on a graph and its subgraph obtained by removing an edge, where the average is over the edges removed and the distance between two outputs is the Hamming distance. In this work, we initiate a systematic study of average sensitivity. After deriving basic properties of average sensitivity such as composability, we provide efficient approximation algorithms with low average sensitivities for concrete graph problems, including the minimum spanning forest problem, the global minimum cut problem, the maximum matching problem, and the minimum vertex cover problem. We also show that every algorithm for the 2-coloring problem has average sensitivity linear in the number of vertices. To show our algorithmic results, we establish and utilize the following fact: if the presence of a vertex or an edge in the solution output by an algorithm can be decided locally, then the algorithm has low average sensitivity, allowing us to reuse the analyses of known sublinear-time algorithms.


## 1 Introduction

In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. For example, consider a social network, where a vertex corresponds to a user of the social network service and an edge corresponds to a friendship relation. It is reasonable to assume that users do not always update new friendship relations on the social network service, and that sometimes they do not fully disclose their friendship relations because of security or privacy reasons. Hence, we can only obtain an approximation $G'$ to the true social network $G$. This brings out the need for algorithms that can extract information on $G$ by solving a problem on $G'$. Moreover, as the solutions output by a graph algorithm are often used in applications such as detecting communities [New04, New06], ranking nodes [PBMW99], and spreading influence [KKT03], the solutions output by an algorithm on $G'$ should be close to those output on $G$.

We assume that the input graph $G'$ at hand is a randomly chosen (large) subgraph of an unknown true graph $G$. We regard a deterministic algorithm $A$ as stable when the Hamming distance $d_{\mathrm{Ham}}(A(G), A(G'))$ is small, where $A(G)$ and $A(G')$ are the outputs of $A$ on $G$ and $G'$, respectively. Here, outputs are typically vertex sets or edge sets. More specifically, for an integer $k \geq 1$ and a function $\beta$ on graphs, we say that the $k$-average sensitivity of a deterministic algorithm $A$ is at most $\beta$ if

$$\mathbb{E}_{e_1,\ldots,e_k \sim E}\bigl[d_{\mathrm{Ham}}\bigl(A(G), A(G - \{e_1,\ldots,e_k\})\bigr)\bigr] \leq \beta(G) \tag{1}$$

for every graph $G = (V,E)$, where $G - F$ for an edge set $F \subseteq E$ is the subgraph obtained from $G$ by removing $F$, and $e_1,\ldots,e_k$ are sampled from $E$ uniformly at random. When $k = 1$, we say that the average sensitivity is at most $\beta$. Informally, we say that algorithms with low ($k$-)average sensitivity are averagely stable. Although we focus on graphs here, we note that our definition can also be extended to the study of combinatorial objects other than graphs, such as strings and constraint satisfaction problems. Since average sensitivity does not take the solution quality into account, an algorithm that outputs the same solution regardless of the input has the least possible average sensitivity, though it is useless. Hence, the key question in a study of average sensitivity is to reveal the trade-off between solution quality and average sensitivity for various problems.

###### Example 1.1.

Consider the algorithm that, given a graph $G$, outputs the set of vertices of degree at least a fixed threshold. As removing an edge changes the degree of exactly two vertices, the sensitivity of this algorithm is at most $2$.
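The degree-threshold example can be checked by brute force on a small graph. The following is a minimal sketch, not the authors' code: the helper names, the example graph, and the threshold value 3 are all hypothetical choices made for illustration.

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two sets: size of the symmetric difference."""
    return len(set(a) ^ set(b))

def average_sensitivity(algorithm, vertices, edges):
    """Average, over a uniformly random removed edge e, of the Hamming
    distance between the outputs on G and on G - e (deterministic case)."""
    base = algorithm(vertices, edges)
    total = 0
    for e in edges:
        remaining = [f for f in edges if f != e]
        total += hamming(base, algorithm(vertices, remaining))
    return total / len(edges)

def high_degree_vertices(vertices, edges, threshold):
    """Return the set of vertices whose degree is at least `threshold`."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {v for v in vertices if deg[v] >= threshold}

# Hypothetical small graph: the first 10 pairs on 6 vertices.
vertices = list(range(6))
edges = list(combinations(vertices, 2))[:10]

# Threshold 3 is an arbitrary illustrative value.
algo = lambda V, E: high_degree_vertices(V, E, threshold=3)
sens = average_sensitivity(algo, vertices, edges)
print(sens)
```

Because removing one edge lowers the degree of only its two endpoints, at most two vertices can leave the output set, so the printed value cannot exceed 2 for any graph or threshold.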

###### Example 1.2.

Consider the $s$-$t$ shortest path problem, where given a graph $G = (V,E)$ and two vertices $s, t \in V$, we are to output the set of edges in a shortest path from $s$ to $t$. Since the length of a shortest path is always bounded by $n$, where $n$ is the number of vertices, every deterministic algorithm has average sensitivity $O(n)$. Indeed, there exists a graph for which this trivial upper bound is tight. Think of a cycle of even length $n$ and two vertices $s, t$ in diametrically opposite positions. Consider an arbitrary deterministic algorithm $A$, and assume that it outputs a path $P$ (of length $n/2$) among the two shortest paths from $s$ to $t$. With probability half, an edge in $P$ is removed, and $A$ must output the other path $Q$ (of length $n/2$) from $s$ to $t$. Hence, the average sensitivity must be $\Omega(n)$. In this sense, there is no deterministic algorithm with nontrivial average sensitivity for the $s$-$t$ shortest path problem.
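The cycle argument can be reproduced numerically. The sketch below assumes a particular deterministic algorithm (breadth-first search with sorted adjacency lists as the tie-breaking rule) and a cycle on 8 vertices; both are illustrative choices, and any other deterministic rule would give the same average.

```python
from collections import deque

def shortest_path_edges(n, edges, s, t):
    """Deterministic BFS; returns the edge set of one shortest s-t path."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for v in adj:
        adj[v].sort()  # fixed tie-breaking rule
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    # Walk back from t to s, collecting undirected edges.
    path = set()
    v = t
    while parent[v] is not None:
        path.add(frozenset((v, parent[v])))
        v = parent[v]
    return path

n = 8  # even cycle length (illustrative)
cycle = [(i, (i + 1) % n) for i in range(n)]
s, t = 0, n // 2  # diametrically opposite vertices

base = shortest_path_edges(n, cycle, s, t)
total = 0
for e in cycle:
    rest = [f for f in cycle if f != e]
    total += len(base ^ shortest_path_edges(n, rest, s, t))
avg = total / len(cycle)
print(avg)
```

Half of the edges lie on the chosen path, and removing any of them forces a switch to the disjoint path, which changes all $n$ edges of the output; the other half leave the output untouched. The average is therefore $n/2$.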

We also define average sensitivity of randomized algorithms. Abusing the notation, we regard $A(G)$ as the distribution of the output of $A$ on $G$, and let $d_{\mathrm{EM}}(A(G), A(G'))$ denote the earth mover’s distance between $A(G)$ and $A(G')$, where the distance between two outputs is measured by the Hamming distance. Then, for an integer $k \geq 1$ and a function $\beta$ on graphs, we say that the $k$-average sensitivity of a randomized algorithm $A$ is at most $\beta$ if

$$\mathbb{E}_{e_1,\ldots,e_k \sim E}\bigl[d_{\mathrm{EM}}\bigl(A(G), A(G - \{e_1,\ldots,e_k\})\bigr)\bigr] \leq \beta(G), \tag{2}$$

where $e_1,\ldots,e_k$ are sampled from $E$ uniformly at random. When $k = 1$, again, we say that the average sensitivity is at most $\beta$. Note that when the algorithm is deterministic, (2) matches the definition of average sensitivity for deterministic algorithms.
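For reference, the earth mover's distance used here can be written in its standard coupling form. This formulation is assumed from the general literature rather than quoted from the text, and the symbol $\Pi(\mu,\nu)$ for the set of couplings of two output distributions $\mu$ and $\nu$ is introduced only for this sketch.

```latex
% Pi(mu, nu): all joint distributions over pairs of solutions (S, S')
% whose marginals are mu and nu (notation assumed for this sketch).
\[
  d_{\mathrm{EM}}(\mu, \nu)
  \;=\;
  \min_{\pi \in \Pi(\mu, \nu)}
  \mathbb{E}_{(S, S') \sim \pi}\bigl[ d_{\mathrm{Ham}}(S, S') \bigr]
\]
```

When both distributions are point masses, the only coupling pairs the two fixed outputs, so the expression reduces to their Hamming distance, in line with the remark that the randomized definition matches the deterministic one.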

###### Remark 1.3.

The $k$-average sensitivity of an algorithm $A$ with respect to the total variation distance can be defined as $\mathbb{E}_{e_1,\ldots,e_k \sim E}\bigl[d_{\mathrm{TV}}\bigl(A(G), A(G - \{e_1,\ldots,e_k\})\bigr)\bigr]$, where $d_{\mathrm{TV}}$ denotes the total variation distance. It is easy to observe that, if the $k$-average sensitivity of an algorithm with respect to the total variation distance is at most $\gamma(G)$, then its $k$-average sensitivity is bounded by $H \cdot \gamma(G)$, where $H$ is the maximum Hamming weight of a solution.

###### Example 1.4.

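Randomness does not add any power to algorithms for the $s$-$t$ shortest path problem. Think of the cycle graph of even length $n$ given in Example 1.2, and suppose that a randomized algorithm outputs the two shortest $s$-$t$ paths $P$ and $Q$ with probability $p$ and $q = 1 - p$, respectively. Removing an edge of $P$ forces the output to be $Q$, which moves probability mass $p$ over Hamming distance $n$, and symmetrically for an edge of $Q$. Then, the average sensitivity is $\frac{1}{2} \cdot pn + \frac{1}{2} \cdot qn = \frac{n}{2} = \Omega(n)$.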

### 1.1 Basic properties of average sensitivity

The definition of average sensitivity lends itself to many nice properties. In this section, we discuss some useful properties of average sensitivity that we use as building blocks in the design of our averagely stable algorithms. We denote by $\mathcal{G}$ the (infinite) set consisting of all graphs. Given a graph $G = (V,E)$ and $e \in E$, we use $G - e$ as a shorthand for $G - \{e\}$. We use $n$ and $m$ to denote the number of vertices and edges in the input graph, respectively.

##### k-average sensitivity from average sensitivity.

This is one of the most important properties of our definition of average sensitivity. It essentially says that bounding the average sensitivity of an algorithm with respect to removal of a single edge automatically gives a bound on the average sensitivity of that algorithm with respect to removal of multiple edges. In other words, it is enough to analyze the average sensitivity of an algorithm with respect to the removal of a single edge.

###### Theorem 1.5.

Let $A$ be an algorithm for a graph problem with average sensitivity given by $f(n,m)$. Then, for any integer $k \geq 1$, the algorithm $A$ has $k$-average sensitivity at most $\sum_{i=1}^{k} f(n, m-i+1)$.

In particular, if the average sensitivity is a nondecreasing function of the number of edges, the above theorem immediately implies that the $k$-average sensitivity is at most $k$ times the average sensitivity.
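One way to see why the removal of $k$ edges reduces to the single-edge case is a telescoping argument. The sketch below is an illustration under stated assumptions, not the authors' proof: it assumes the single-edge bound is a function $f(n,m)$ of the vertex and edge counts only, that the edges are removed one at a time without replacement, and it writes $G_i$ (a name introduced here) for the graph after the first $i$ removals.

```latex
% G_i := G - {e_1, ..., e_i}, so G_0 = G and G_{i-1} has m - i + 1 edges.
\begin{align*}
\mathbb{E}\bigl[ d_{\mathrm{EM}}(A(G_0), A(G_k)) \bigr]
  &\le \sum_{i=1}^{k} \mathbb{E}\bigl[ d_{\mathrm{EM}}(A(G_{i-1}), A(G_i)) \bigr]
     && \text{(triangle inequality)} \\
  &\le \sum_{i=1}^{k} f(n,\, m - i + 1)
     && \text{($e_i$ is uniform among the edges of $G_{i-1}$)} \\
  &\le k \cdot f(n, m)
     && \text{(if $f$ is nondecreasing in $m$)}
\end{align*}
```

The last line is the "at most $k$ times" consequence stated above.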

##### Sequential composability.

It will be useful if we can sequentially apply averagely stable algorithms on the input to get a solution and the whole algorithm is again averagely stable. We show two different sequential composition theorems for average sensitivity.

###### Theorem 1.6 (Sequential composability).

Consider two randomized algorithms . Suppose that the average sensitivity of with respect to the total variation distance is and the average sensitivity of is for any . Let be a randomized algorithm obtained by composing and , that is, . Then, the average sensitivity of is , where denotes the maximum Hamming weight among those of solutions obtained by running on and .

Our second composition theorem is for the average sensitivity with respect to the total variation distance. This is also useful to analyze the average sensitivity with respect to the earth mover’s distance, as it can be bounded by the average sensitivity with respect to the total variation distance times the maximum Hamming weight of a solution, as in Remark 1.3.

###### Theorem 1.7 (Sequential composability w.r.t. the TV distance).

Consider randomized algorithms for . Suppose that, for each , the average sensitivity of is with respect to the total variation distance for every . Consider a sequence of computations . Let be a randomized algorithm that performs this sequence of computations on input and outputs . Then, the average sensitivity of is at most with respect to the total variation distance.

##### Parallel composability.

It is often the case that there are multiple algorithms that solve the same problem albeit with different average sensitivity guarantees. Such averagely stable algorithms can be composed by running them according to a distribution determined by the input graph. The advantage of such a composition, which we call a parallel composition, is that the average sensitivity of the resulting algorithm might be better than the component algorithms for all graphs.

###### Theorem 1.8 (Parallel composability).

Let be algorithms for a graph problem with average sensitivities , respectively. Let be an algorithm that, given a graph , runs with probability for , where . Let denote the maximum Hamming weight among those of solutions obtained by running on and . Then the average sensitivity of is at most .

In this paper, we use the above theorem extensively to combine algorithms with different average sensitivities.

### 1.2 Connection to sublinear-time algorithms

We show a relationship between the average sensitivity of a global algorithm and the query complexity of a local algorithm that simulates oracle access to the output of the global algorithm. Roughly speaking, we show, in Theorem 1.9, that the existence of a local algorithm that can answer queries about the solution produced by a global algorithm implies that the average sensitivity of the global algorithm is bounded by the query complexity of the local algorithm. We use Theorem 1.9 to prove the existence of averagely stable matching algorithms based on the sublinear-time matching algorithms due to Yoshida et al. [YYI12].

###### Theorem 1.9 (Locality implies low average sensitivity).

Consider a randomized algorithm for a graph problem, where the solutions are subsets of the set of edges of the input graph. Assume that there exists an oracle satisfying the following:

• when given access to a graph and query , the oracle generates a random string and outputs whether is contained in the solution obtained by running on with as its random string,

• the oracle makes at most queries to in expectation, where this expectation is taken over the random coins of and a uniformly random query .

Then, has average sensitivity at most . Moreover, this is also true for algorithms for graph problems, where the solutions are subsets of the set of vertices of the input graph, whenever .

Theorem 1.9 cements the intuition that strong locality guarantees for solutions output by an algorithm imply that the removal of edges from a graph affects only the presence of a few edges in the solution, which in turn implies low average sensitivity. As an indirect method to bound the average sensitivity of algorithms, we think that Theorem 1.9 could lead to further research in the design of local algorithms for various graph problems.

### 1.3 Averagely stable algorithms for concrete problems

We summarize, in Table 1, the average sensitivity bounds that we obtain for various concrete problems. All our algorithms run in polynomial time, and the bounds on the $k$-average sensitivity of these algorithms can be easily derived using Theorem 1.5. Henceforth, let $n$, $m$, and $\mathrm{OPT}$ denote the number of vertices, the number of edges, and the optimal value, respectively. To help interpret our bounds on average sensitivity, we mention that for maximization problems whose optimal values are sufficiently Lipschitz with respect to edge removals, $O(\mathrm{OPT})$ is a trivial upper bound for the average sensitivity. However, this is not the case in general for minimization problems.

For the minimum spanning forest problem, we show that Kruskal’s algorithm [Kru56] achieves average sensitivity $O(n/m)$, which is quite small considering that the spanning forest can have up to $n - 1$ edges. In contrast, it is not hard to show that the average sensitivities of the known polynomial-time (approximation) algorithms for the other problems listed in Table 1 are all .

For the global minimum cut problem, our algorithm outputs a cut as a vertex set. As the approximation ratio of our algorithm is constant, it is likely to output a cut of size close to , and hence we want to make its average sensitivity smaller than . We observe that the average sensitivity becomes smaller than when for , and it quickly decreases as increases.

For the maximum matching problem, we propose two algorithms. The first one has approximation ratio and average sensitivity , which is much smaller than the trivial . The second one has approximation ratio and average sensitivity for every constant , which shows that we do not have to sacrifice the approximation ratio a lot to obtain a non-trivial average sensitivity.

For the minimum vertex cover problem, we propose two algorithms. The first algorithm has approximation ratio , which is close to the best we can hope for as obtaining -approximation is NP-Hard assuming the Unique Games conjecture [KR03]. Moreover, the average sensitivity of is much smaller than the trivial . The second algorithm has a worse approximation ratio but can achieve a better average sensitivity in some regimes. For example, when , and , the average sensitivity of the first algorithm is whereas that of the second algorithm is .

In the 2-coloring problem, given a bipartite graph, we are to output one part in the bipartition. For this problem, we show a lower bound of $\Omega(n)$ on the average sensitivity, that is, there is no algorithm with non-trivial average sensitivity.

### 1.4 Discussions

##### Output representation.

The notion of average sensitivity is dependent on the output representation. For example, we can double the average sensitivity by duplicating the output. A natural idea for alleviating this issue is to normalize the average sensitivity by the maximum Hamming weight of a solution. However, for minimization problems where the optimal value could be much smaller than , such a normalization can diminish subtle differences in average sensitivity, e.g., vs . It is an interesting open question whether there is a canonical way to normalize average sensitivity so that the resulting quantity is independent of the output representation.

##### Sensitivity against adversarial edge removals.

It is also natural to take the maximum, instead of the average, over edges in definitions (1) and (2), which can be seen as sensitivity against adversarial edge removals. Indeed a similar notion has been proposed to study algorithms for geometric problems [MSVW18]. However, in our context, it seems hard to guarantee that the output of an algorithm does not change much after removing an arbitrary edge. Moreover, by a standard averaging argument, one can say that for 99% of arbitrary edge removals, the sensitivity of an algorithm is asymptotically equal to the average sensitivity, which is sufficient for most applications.

##### Average sensitivity against edge additions.

As another variant of average sensitivity, it is natural to consider incorporating edge additions in definitions (1) and (2). If an algorithm is stable against edge additions, then in addition to the case of not knowing the true graph as we have discussed earlier, it will be useful for the case that the graph dynamically changes but we want to prevent the output of the algorithm from fluctuating too much. However, in contrast to removing edges, it is not always clear how we should add edges to the graph in definitions (1) and (2). A naive idea is sampling pairs of vertices uniformly at random and adding edges between them. This procedure makes the graph close to a graph sampled from the Erdős-Rényi model [ER59], which does not well-represent real networks such as social networks and road networks. To avoid this subtle issue, in this work, we focus on removing edges.

##### Alternative notion of average sensitivity for randomized algorithms.

Consider a randomized algorithm $A$ that, given a graph $G$ on $n$ vertices, generates a random string $\pi \in \{0,1\}^{r(n)}$ for some function $r$, and then runs a deterministic algorithm $A_\pi$ on $G$, where the algorithm $A_\pi$ has $\pi$ hardwired into it. Let us assume that $A_\pi$ can be applied to any graph. It is also natural to define the average sensitivity of $A$ as

$$\mathbb{E}_{e \sim E}\,\mathbb{E}_{\pi}\bigl[d_{\mathrm{Ham}}\bigl(A_\pi(G), A_\pi(G - e)\bigr)\bigr]. \tag{3}$$

In other words, we measure the expected distance between the outputs of $A_\pi$ on $G$ and $G - e$ when we feed the same string $\pi$ to $A_\pi$, over the choice of $\pi$ and edge $e$. Note that (3) upper bounds (2) because, in the definition of the earth mover’s distance, we optimally transport probability mass from $A(G)$ to $A(G - e)$ whereas, in (3), how the probability mass is transported is not necessarily optimal.

We can actually bound (3) for some of our algorithms. In this work, however, we focus on the definition (2) because the assumption that $A_\pi$ can be applied to any graph does not hold in general, and bounding (3) is unnecessarily tedious and not very enlightening.

### 1.5 Related work

##### Average sensitivity of network centralities.

(Network) centrality is a collective name for indicators that measure importance of vertices or edges in a network. Notable examples are closeness centrality [Bav50, Bea65, Sab66], harmonic centrality [ML00], betweenness centrality [Fre77], and PageRank [PBMW99]. To compare these centralities qualitatively, Murai and Yoshida [MY19] recently introduced the notion of average-case sensitivity for centralities. Fix a vertex centrality measure $c$; let $c_G(v)$ denote the centrality of a vertex $v$ in a graph $G = (V,E)$. Then, the average-case sensitivity of $c$ on $G$ is defined as

$$S_c(G) = \mathbb{E}_{e \sim E}\,\mathbb{E}_{v \sim V}\,\frac{|c_{G-e}(v) - c_G(v)|}{c_G(v)},$$

where $e$ and $v$ are sampled uniformly at random. They showed various upper and lower bounds for centralities. See [MY19] for details.

Since a centrality measure assigns real values to vertices, they studied the relative change of the centrality values upon removal of random edges. As our focus in this work is on graph algorithms, our notion (2) measures the Hamming distance between solutions when one removes random edges.

##### Differential privacy.

Differential privacy [DMNS06] is a notion closely related to average sensitivity. It considers a neighbor relation over inputs and asks that the distributions of outputs on neighboring inputs are similar. The variant of differential privacy closest to our definition of average sensitivity is edge differential privacy introduced by Nissim et al. [NRS07] and further studied by [HLMJ09, GLM10, KS12, KNRS13, KRSY14, RS16]. Here, the neighbors of a graph $G = (V,E)$ are defined to be $\{G - e : e \in E\}$. Then for $\varepsilon > 0$, we say that an algorithm $A$ is $\varepsilon$-differentially private if for all $e \in E$,

$$\exp(-\varepsilon) \cdot \Pr[A(G - e) \in S] \leq \Pr[A(G) \in S] \leq \exp(\varepsilon) \cdot \Pr[A(G - e) \in S] \tag{4}$$

for any set of solutions $S$.

As differential privacy imposes the constraint (4) for every , the requirement is sometimes too strong for graph problems. For example, for the minimum vertex cover problem, (4) implies that we must output a vertex cover for even for , and it follows that we can only output a vertex cover of size at least . To avoid this issue, Gupta et al. [GLM10] considered an implicit representation of a vertex cover.

Moreover, since differential privacy guarantees that the probabilities of outputting a specific solution on $G$ and $G - e$ are close to each other, the total variation distance between the two distributions $A(G)$ and $A(G - e)$ must be small. Since the earth mover’s distance between two output distributions can be small even if the total variation distance between them is large, an algorithm that does not satisfy the conditions of differential privacy could still have small average sensitivity. We would like to add that, despite these differences, our algorithms for the global minimum cut problem and the vertex cover problem are inspired by differentially private algorithms for the same problems [GLM10].

##### Generalization and stability of learning algorithms.

Generalization [SSBD09] is a fundamental concept in statistical learning theory. Given samples from an unknown true distribution over a dataset, the goal of a learning algorithm is to output a parameter that minimizes the expected loss, that is, the expectation of the loss incurred by a sample with respect to the parameter. As the true distribution is unknown, a frequently used approach in learning is to compute a parameter that minimizes the empirical loss, which is an unbiased estimator of the expected loss and is purely a function of the available samples. The generalization error of a learner is a measure of how close the empirical loss is to the expected loss as a function of the sample size.

One technique to reduce the generalization error is to add a regularization term to the loss function being minimized [BE02]. This also ensures that the learned parameter does not change much with respect to minor changes in the samples being used for learning. Therefore, in a sense, learning algorithms that use regularization can be considered as being stable according to our definition of sensitivity.

Bousquet and Elisseeff [BE02] defined a notion of stability for learning algorithms and explored its connection to the generalization error. Their stability notion requires that the empirical loss of the learning algorithm does not change much by removing or replacing any sample in the input data. In contrast, average sensitivity considers removing random edges from a graph. Also, average sensitivity considers the change of the output solution rather than that of the objective value.

### 1.6 Overview of our techniques

##### Minimum spanning forest.

For the minimum spanning forest problem, we show that the classical Kruskal’s algorithm has low average sensitivity; it is always at most $O(n/m)$. Interestingly, Kruskal’s algorithm is deterministic and yet has low average sensitivity. In contrast, we show that Prim’s algorithm can have average sensitivity for a natural rule of breaking ties among edges.

##### Global minimum cut.

For the global minimum cut problem, our algorithm is inspired by a differentially private algorithm due to Gupta et al. [GLM10]. Our algorithm, given a parameter and a graph as input, first enumerates a list of cuts whose sizes are at most ; this enumeration can be done in polynomial time as shown by Karger’s theorem [Kar93]. It then outputs a cut from the list with probability inversely proportional to the exponential of the product of the size of the cut and . The main argument in analyzing the average sensitivity of the algorithm is that the aforementioned distribution is very close (in earth mover’s distance) to a related Gibbs distribution on the set of all cuts in the graph. Therefore the average sensitivity of the algorithm is of the same order as that of the average sensitivity of sampling a cut from such a Gibbs distribution doing which requires exponential time. We finally show that the average sensitivity of sampling a cut from this Gibbs distribution is at most .
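The Gibbs distribution over cuts mentioned above can be made concrete on a tiny graph by brute force. This is a minimal sketch of the exponential-time reference distribution only, not of the polynomial-time algorithm that enumerates near-minimum cuts; the parameter name `alpha`, its value, and the example graph are assumptions made for illustration.

```python
import math
from itertools import combinations

def cut_size(S, edges):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u, v in edges if (u in S) != (v in S))

def gibbs_cut_distribution(vertices, edges, alpha):
    """Probability of each nontrivial cut S, proportional to
    exp(-alpha * cut_size(S)). Enumerates all cuts: exponential time."""
    first, rest = vertices[0], vertices[1:]
    cuts = []
    # Keep `first` inside S so each cut {S, V \ S} appears once; skip S = V.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            cuts.append(frozenset((first,) + extra))
    weights = [math.exp(-alpha * cut_size(S, edges)) for S in cuts]
    Z = sum(weights)
    return {S: w / Z for S, w in zip(cuts, weights)}

# Hypothetical graph: two triangles joined by the single edge (2, 3).
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

dist = gibbs_cut_distribution(vertices, edges, alpha=2.0)
best = max(dist, key=dist.get)
print(sorted(best), round(dist[best], 3))
```

Larger values of `alpha` concentrate the distribution on minimum cuts, while smaller values spread it out; the trade-off between these two regimes is what the analysis described above has to balance.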

##### Maximum matching.

There are several components to the design and analysis of our averagely stable -approximation algorithm for the maximum matching problem. Our starting point is the observation (Theorem 1.9) that the ability to locally simulate access to the solution of an algorithm implies that is averagely stable. We use this to bound the average sensitivity of a randomized greedy -approximation algorithm for the maximum matching problem. Specifically, constructs a maximal matching by iterating over edges in the input graph according to a uniformly random ordering and adding an edge to the current matching if the addition does not violate the matching property. Yoshida et al. [YYI12] constructed a local algorithm that, given a uniformly random edge as input, makes queries to in expectation and answers whether is in the matching output by on , where the expectation is over the choice of input and the randomness in , and is the maximum degree of . Combined with Theorem 1.9, this implies that the average sensitivity of is .
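The randomized greedy procedure described above (iterate over the edges in a uniformly random order and keep an edge whenever both endpoints are still free) can be sketched as follows. The function name, the example graph, and the seed are illustrative assumptions; the local simulation of Yoshida et al. is not shown.

```python
import random

def randomized_greedy_matching(edges, rng):
    """Build a maximal matching by scanning edges in uniformly random order."""
    order = list(edges)
    rng.shuffle(order)            # uniformly random ordering of the edges
    matched = set()               # vertices already covered by the matching
    matching = []
    for u, v in order:
        # Adding (u, v) keeps the matching property only if both are free.
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Hypothetical example graph: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
M = randomized_greedy_matching(edges, random.Random(0))
print(M)
```

The output is always a maximal matching: every edge of the graph has at least one endpoint covered, since otherwise the scan would have added it.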

Next, we transform to also work for graphs of unbounded degree as follows. The idea is to remove vertices of degree at least from the graph and run on the resulting graph. This transformation affects the approximation guarantee only by an additive term as the number of such high degree vertices is small. However, this thresholding procedure could in itself have high average sensitivity, since the thresholds of and are different for any .

We circumvent this issue by using a Laplace random variable as the threshold, where the distribution of is tightly concentrated around . We use our sequential composition theorem (Theorem 1.6) in order to analyze the average sensitivity of the resulting procedure, where we consider the instantiation of the Laplace random threshold as the first algorithm and the remaining steps in the procedure as the second algorithm. The first term in the expression given by Theorem 1.6 turns out to be a negligible quantity and is easy to bound. The main task in bounding the second term is to bound, for all , the average sensitivity of a procedure that, on input a graph , removes all vertices of degree at least from and runs the randomized greedy maximal matching algorithm. The heart of the argument in bounding this average sensitivity is that given a local algorithm with query complexity that simulates oracle access to the solutions output by an algorithm , we can, for all , construct a local algorithm for the algorithm . Moreover, the query complexity of , which also bounds the average sensitivity of by Theorem 1.9, is at most . This implies that the second term in the expression given by Theorem 1.6, which is given by , is .

An issue with the aforementioned matching algorithm is that its average sensitivity is poor for graphs with small values of . However, we observe that the algorithm that simply outputs the lexicographically smallest maximum matching does not have this issue. Its average sensitivity is , since the output matching stays the same unless an edge in the matching is removed. We obtain our final averagely stable -approximation algorithm for the maximum matching problem by running these two algorithms according to a probability distribution determined by the input graph. Using our parallel composition theorem, we bound the sensitivity of the resultant algorithm as .

The design and analysis of our averagely stable -approximation algorithm for the maximum matching problem uses similar ideas as above. The only difference is that we replace the randomized greedy maximal matching algorithm above with a -approximation algorithm that repeatedly improves a matching using greedily chosen augmenting paths.

##### Minimum vertex cover.

We describe two averagely stable algorithms for the minimum vertex cover problem. Our -approximation algorithm is based on a reduction from the averagely stable -approximation algorithm for the maximum matching problem. In particular, it runs the averagely stable matching algorithm and outputs a union of the set of vertices removed (by thresholding) and the set of endpoints of the matching computed. For the approximation guarantee, we argue that, with high probability, the cardinality of the set of removed vertices is . The main task in showing that the algorithm is averagely stable is to bound the average sensitivity of outputting the set of removed vertices. In case the same value of threshold is used for and , the cardinality of symmetric difference between the sets of removed vertices is at most . Using this observation and the ideas used in bounding the average sensitivity of our matching algorithms, we can bound the average sensitivity of outputting the set of removed vertices.

Our second algorithm for vertex cover is based on a differentially private vertex cover approximation algorithm due to Gupta et al. [GLM10]. Specifically, we output a permutation of the vertices, and for each edge, its first endpoint in the permutation is in the vertex cover. If we generate our permutation by repeatedly sampling vertices according to their yet uncovered degree, we get a $2$-approximation algorithm for vertex cover [Pit85]. If we instead output a uniformly random permutation of vertices, we get an algorithm with good average sensitivity but poor approximation guarantee. Our algorithm finds a middle ground between these approaches, by selecting vertices with probability proportional to their uncovered degrees in the beginning and progressively skewing towards the uniform distribution.
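The permutation-based output representation can be illustrated with a short sketch. It shows only how a permutation is decoded into a vertex cover, using a uniformly random permutation (the variant the text describes as having good average sensitivity but a poor approximation guarantee). The progressively skewed sampling of the actual algorithm is not reproduced, and all names and the example graph are hypothetical.

```python
import random

def cover_from_permutation(perm, edges):
    """For each edge, put its endpoint that appears first in `perm`
    into the cover."""
    pos = {v: i for i, v in enumerate(perm)}
    return {u if pos[u] < pos[v] else v for u, v in edges}

# Hypothetical example: a path on 5 vertices.
vertices = list(range(5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

perm = vertices[:]
random.Random(1).shuffle(perm)    # uniformly random permutation
cover = cover_from_permutation(perm, edges)
print(perm, sorted(cover))
```

Every edge contributes one of its own endpoints, so the decoded set is a vertex cover for any permutation; the quality of the cover depends entirely on how the permutation is drawn.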

##### 2-coloring.

To show our lower bound on the average sensitivity for $2$-coloring, consider the set of all paths on $n$ vertices and the set of all graphs obtained by removing exactly one edge from these paths (called $2$-paths). A path has exactly two ways of being $2$-colored and a $2$-path has four ways of being $2$-colored. A path and a $2$-path are neighbors if the latter is obtained from the former by removing an edge. A $2$-path has at most four neighbors. The output distribution of any $2$-coloring algorithm $A$ on a $2$-path can be close (in earth mover’s distance) only to those of at most of its neighboring paths. If, however, $A$ has low average sensitivity, the output distributions of $A$ have to be close on a large fraction of pairs of neighboring graphs, which gives a contradiction.

### 1.7 Notation

For a positive integer $n$, let $[n] = \{1, \ldots, n\}$. Let $G = (V,E)$ be a graph. For an edge $e \in E$, we denote by $G - e$ the graph obtained by removing $e$ from $G$. Similarly, for an edge set $F \subseteq E$, we denote by $G - F$ the graph obtained by removing every edge in $F$ from $G$. For an edge set $F \subseteq E$, let $V(F)$ denote the set of vertices incident to an edge in $F$. For a vertex set $S \subseteq V$, let $G[S]$ be the subgraph of $G$ induced by $S$. We often use the symbols $n$, $m$, and $\Delta$ to denote the number of vertices, the number of edges, and the maximum degree of a vertex, respectively, in the input graph. We use $\mathrm{OPT}(G)$ to denote the optimal value of a graph $G$ in the graph problem we are concerned with. We simply write $\mathrm{OPT}$ when $G$ is clear from the context. We denote by $\mathcal{G}$ the (infinite) set consisting of all graphs.

### 1.8 Organization

We show our averagely stable algorithms for the minimum spanning forest problem, the global minimum cut problem, the maximum matching problem, and the minimum vertex cover problem in Sections 2, 3, 4, and 5, respectively. Then, we show a linear lower bound for the 2-coloring problem in Section 6. We discuss general properties of average sensitivity in Section 7.

## 2 Warm Up: Minimum Spanning Forest

To get intuition about the average sensitivity of algorithms, we start with the minimum spanning forest problem. In this problem, we are given a weighted graph $G = (V, E, w)$, where $w$ is a weight function on edges, and we want to find a forest of the minimum total weight that includes all the vertices.

Recall that Kruskal’s algorithm [Kru56] works as follows: Iterate over edges in the order of increasing weights, where we break ties arbitrarily. At each iteration, add the current edge to the solution if it does not form a cycle with the edges already added. The following theorem states that this simple and deterministic algorithm is averagely stable.

###### Theorem 2.1.

The average sensitivity of Kruskal’s algorithm is $O(n/m)$.

###### Proof.

Let $G = (V, E)$ be the input graph and $T$ be the spanning forest obtained by running Kruskal’s algorithm on $G$. We consider how the output changes when we remove an edge $e \in E$ from $G$.

If the edge $e$ does not belong to $T$, clearly the output of Kruskal’s algorithm on $G - e$ is also $T$.

Suppose that the edge $e$ belongs to $T$. Let $T_1$ and $T_2$ be the two trees rooted at the endpoints of $e$ obtained by removing $e$ from $T$, and let $V_i$ be the vertex set of $T_i$ for $i \in \{1, 2\}$. If no edge of $G - e$ connects $V_1$ and $V_2$, that is, $e$ is a bridge in $G$, then Kruskal’s algorithm outputs $T - e$ on $G - e$. Otherwise, let $e'$ be the first edge considered by Kruskal’s algorithm among all the edges of $G - e$ connecting $V_1$ and $V_2$. Then, Kruskal’s algorithm outputs $T - e + e'$ on $G - e$. It follows that the Hamming distance between $T$ and the output of the algorithm on $G - e$ is at most $2$.

Therefore, the average sensitivity of Kruskal’s algorithm is at most

$$\frac{m - |T|}{m} \cdot 0 + \frac{|T|}{m} \cdot 2 = O\!\left(\frac{n}{m}\right).$$ ∎
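The bound can be checked empirically on a small graph. The following is a minimal sketch, assuming distinct edge tuples `(weight, u, v)`, ties broken by tuple order, and Hamming distance measured as the size of the symmetric difference of the two edge sets; the helper names are hypothetical.

```python
def kruskal(n, edges):
    """Minimum spanning forest; edges are (weight, u, v) tuples, ties broken by tuple order."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = set()
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different trees, so it closes no cycle
            parent[ru] = rv
            forest.add((w, u, v))
    return forest

def average_sensitivity(n, edges):
    """Average, over removed edges, of the symmetric difference between the two outputs."""
    base = kruskal(n, edges)
    total = 0
    for e in edges:
        rest = [f for f in edges if f != e]
        total += len(base ^ kruskal(n, rest))
    return total / len(edges)

# A 4-cycle with one chord.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
forest = kruskal(4, edges)
print(average_sensitivity(4, edges), 2 * len(forest) / len(edges))
```

The second printed value is the quantity $2|T|/m$ from the proof, which upper-bounds the first.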

In Appendix A, we show that Prim’s algorithm, another classical algorithm for the minimum spanning forest problem, has average sensitivity for a certain natural tie-breaking rule.

## 3 Global Minimum Cut

For a graph $G = (V, E)$ and a vertex set $S \subseteq V$, we define $\mathrm{cost}(G, S)$ to be the number of edges in $E$ that cross the cut $(S, V \setminus S)$. Then in the global minimum cut problem, given a graph $G = (V, E)$, we want to compute a vertex set $\emptyset \subsetneq S \subsetneq V$ that minimizes $\mathrm{cost}(G, S)$. In this section, we show an algorithm with low average sensitivity for the global minimum cut problem in undirected graphs. Specifically, we show the following.

###### Theorem 3.1.

For $\varepsilon > 0$, there exists a polynomial-time algorithm for the global minimum cut problem with approximation ratio $2 + \varepsilon$ and average sensitivity $n^{O(1/(\varepsilon \mathrm{OPT}))}$.

Let $\mathrm{OPT}$ be the minimum size of a cut in $G$. Our algorithm enumerates cuts of small size and then outputs a vertex set $S$ with probability proportional to $\exp(-\alpha \cdot \mathrm{cost}(G, S))$ for a suitable $\alpha$. See Algorithm 1 for details.
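The sampling step can be sketched by brute force on a tiny graph. This is an illustration under stated assumptions, not Algorithm 1 itself: it enumerates all $2^n - 2$ nonempty proper vertex subsets (exponential time) instead of using Karger's enumeration, and the values of `alpha` and `b` are placeholders rather than the choices made in the analysis.

```python
import itertools
import math
import random

def cut_cost(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in edges)

def all_cuts(n):
    """All nonempty proper subsets of {0, ..., n-1}."""
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            yield frozenset(S)

def cut_distribution(n, edges, alpha, b):
    """Probability of each cut of cost <= OPT + b, proportional to exp(-alpha * cost)."""
    costs = {S: cut_cost(edges, S) for S in all_cuts(n)}
    opt = min(costs.values())
    weights = {S: math.exp(-alpha * c) for S, c in costs.items() if c <= opt + b}
    Z = sum(weights.values())
    return {S: w / Z for S, w in weights.items()}

def sample_cut(dist, rng):
    """Draw one cut according to the distribution."""
    cuts = list(dist)
    return rng.choices(cuts, weights=[dist[S] for S in cuts], k=1)[0]

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
dist = cut_distribution(4, cycle, alpha=1.0, b=0)
print(len(dist), sample_cut(dist, random.Random(0)))
```

Larger values of `alpha` concentrate the distribution on near-minimum cuts, while smaller values flatten it, which is the trade-off between approximation quality and stability that the analysis quantifies.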

The approximation ratio of Algorithm 1 is : It clearly holds when $\mathrm{OPT} \geq 1$, and it also holds when $\mathrm{OPT} = 0$ because we only output a cut of size zero (for ). The following theorem due to Karger [Kar93] directly implies that it runs in time polynomial in the input size for any constant $\varepsilon > 0$.

###### Theorem 3.2 ([Kar93]).

Given a graph $G$ on $n$ vertices with the minimum cut size $c$ and a parameter $k \geq 1$, the number of cuts of size at most $kc$ is at most $n^{2k}$, and these cuts can be enumerated in time polynomial (in $n$) per cut.

We now show that Algorithm 1 is averagely stable.

###### Lemma 3.3.

The average sensitivity of Algorithm 1 is at most

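$$\beta(G) = \frac{n}{m} \cdot n^{(2 + 1/\varepsilon)/\mathrm{OPT}} \cdot \bigl((2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon\bigr) + o(1).$$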

As we have $\mathrm{OPT} \leq 2m/n$, the average sensitivity can be bounded by $n^{O(1/(\varepsilon \mathrm{OPT}))}$, and Theorem 3.1 follows by rescaling $\varepsilon$ by a constant factor.
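The step above rests on the standard observation that a minimum cut is never larger than the minimum degree, which in turn is at most the average degree. As a short worked sketch, assuming $\varepsilon \leq 1$ and $m \geq n - 1$ (the latter holds whenever $\mathrm{OPT} \geq 1$, since the graph is then connected):

$$\mathrm{OPT} \leq \min_{v \in V} \deg(v) \leq \frac{1}{n} \sum_{v \in V} \deg(v) = \frac{2m}{n}, \qquad \text{hence} \qquad \frac{n}{m} \cdot \bigl((2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon\bigr) \leq 2(2 + 7\varepsilon) + \frac{2\varepsilon n}{m} = O(1).$$

The remaining factor $n^{(2 + 1/\varepsilon)/\mathrm{OPT}}$ in the bound of Lemma 3.3 is therefore the dominant one.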

###### Proof.

If $\mathrm{OPT} = 0$, then the claim trivially holds because the right-hand side is infinite. Hence in what follows, we assume $\mathrm{OPT} \geq 1$.

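Let $\mathcal{A}$ denote Algorithm 1. Consider an (inefficient) algorithm $\mathcal{A}'$ that, on input $G = (V, E)$, outputs a cut $S \subseteq V$ (from among all the cuts in $G$) with probability proportional to $\exp(-\alpha \cdot \mathrm{cost}(G, S))$. For a graph $G$, let $\mathcal{A}(G)$ and $\mathcal{A}'(G)$ denote the output distributions of algorithms $\mathcal{A}$ and $\mathcal{A}'$ on input $G$, respectively. For $G = (V, E)$ and $S \subseteq V$, let $p_G(S)$ and $p'_G(S)$ be shorthands for the probabilities that $S$ is output on input $G$ by algorithms $\mathcal{A}$ and $\mathcal{A}'$, respectively.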

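We first bound the earth mover’s distance between $\mathcal{A}(G)$ and $\mathcal{A}'(G)$ for a graph $G = (V, E)$. To this end, we define

$$Z = \sum_{S \subseteq V : \mathrm{cost}(G, S) \leq \mathrm{OPT} + b} \exp(-\alpha \cdot \mathrm{cost}(G, S)) \quad \text{and} \quad Z' = \sum_{S \subseteq V} \exp(-\alpha \cdot \mathrm{cost}(G, S)),$$

where $\alpha$ and $b$ are the parameters set in Algorithm 1. Note that $Z \leq Z'$ and the quantity $Z/Z'$ is the total probability mass assigned by algorithm $\mathcal{A}'$ to cuts $S \subseteq V$ such that $\mathrm{cost}(G, S) \leq \mathrm{OPT} + b$.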

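Now, we start with $\mathcal{A}'(G)$. For each $S \subseteq V$ such that $\mathrm{cost}(G, S) \leq \mathrm{OPT} + b$, keep at least $p'_G(S) \cdot Z/Z'$ mass with a cost of $0$ and move a mass of at most $p'_G(S) \cdot (1 - Z/Z')$ at a cost of $n$. For each $S \subseteq V$ such that $\mathrm{cost}(G, S) > \mathrm{OPT} + b$, we move a mass of $p'_G(S)$ at a cost of $n$. The total cost of moving masses is then at most:

$$
\begin{aligned}
d_{\mathrm{EM}}(\mathcal{A}(G), \mathcal{A}'(G)) &\leq n \cdot \sum_{S \subseteq V : \mathrm{cost}(G, S) \leq \mathrm{OPT} + b} p'_G(S) \left(1 - \frac{Z}{Z'}\right) + n \cdot \sum_{S \subseteq V : \mathrm{cost}(G, S) > \mathrm{OPT} + b} p'_G(S) \\
&= \frac{n (Z' - Z)}{Z'} \left( \sum_{S \subseteq V : \mathrm{cost}(G, S) \leq \mathrm{OPT} + b} p'_G(S) + 1 \right) \\
&\leq \frac{2n (Z' - Z)}{Z'}.
\end{aligned}
$$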

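Let $n_t$ stand for the number of cuts of cost at most $\mathrm{OPT} + t$ in $G$. By Karger’s theorem (Theorem 3.2), we have that $n_t \leq n^{2 + 2t/\mathrm{OPT}}$. Then, we have

$$
\begin{aligned}
\frac{Z' - Z}{Z'} &\leq \sum_{t > b} \exp(-\alpha t) \cdot (n_t - n_{t-1}) \leq (\exp(\alpha) - 1) \cdot \sum_{t > b} \exp(-\alpha t)\, n_t \\
&\leq (\exp(\alpha) - 1)\, n^2 \cdot \sum_{t > b} n^{2t/\mathrm{OPT}} \cdot \exp(-\alpha t) \\
&\leq (\exp(\alpha) - 1)\, n^2 \cdot \sum_{t > b} n^{-t/(\varepsilon \mathrm{OPT})} \leq (\exp(\alpha) - 1)\, n^2 \cdot \frac{n^{-(b+1)/(\varepsilon \mathrm{OPT})}}{1 - n^{-1/(\varepsilon \mathrm{OPT})}} \\
&= \left(n^{(2 + 1/\varepsilon)/\mathrm{OPT}} - 1\right) \cdot \left(1 + \frac{1}{n^{1/(\varepsilon \mathrm{OPT})} - 1}\right) \cdot \frac{n^2}{n^{(b+1)/(\varepsilon \mathrm{OPT})}} \\
&\leq n^{(2 + 1/\varepsilon)/\mathrm{OPT}} \cdot \left(1 + \frac{\varepsilon n}{\log n}\right) \cdot \frac{n^2}{n^{(b+1)/(\varepsilon \mathrm{OPT})}} \\
&= O\!\left(\frac{\varepsilon\, n^{3 + (2 + 1/\varepsilon)/\mathrm{OPT}}}{n^{(b+1)/(\varepsilon \mathrm{OPT})}}\right) = O\!\left(\frac{\varepsilon}{n^{4 + 1/\varepsilon}}\right).
\end{aligned}
$$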

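The last step above follows from our choice of $b$. Therefore, the earth mover’s distance between $\mathcal{A}(G)$ and $\mathcal{A}'(G)$ is $O(\varepsilon / n^{3 + 1/\varepsilon})$. In addition, we can bound the expected size of the cut output by $\mathcal{A}'$ on $G$ as at most $(2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon$ plus a lower-order term.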

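We now bound the earth mover’s distance between $\mathcal{A}'(G)$ and $\mathcal{A}'(G - e)$ for an arbitrary edge $e \in E$. Let $Z'_e$ denote the quantity $\sum_{S \subseteq V} \exp(-\alpha \cdot \mathrm{cost}(G - e, S))$. Since the cost of every cut in $G - e$ is at most the cost of the same cut in $G$, we have that $Z'_e \geq Z'$ and therefore,

$$p'_G(S) = \frac{\exp(-\alpha \cdot \mathrm{cost}(G, S))}{Z'} \leq \frac{\exp(-\alpha \cdot \mathrm{cost}(G - e, S))}{Z'_e} \cdot \frac{Z'_e}{Z'} = p'_{G - e}(S) \cdot \frac{Z'_e}{Z'}.$$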

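We transform $\mathcal{A}'(G)$ into $\mathcal{A}'(G - e)$ as follows. For each $S \subseteq V$, we leave a probability mass of at most $p'_G(S)$ at $S$ with zero cost and move a mass of at most $p'_G(S) \cdot (Z'_e/Z' - 1)$ to any other point at a cost of at most $n$. Hence,

$$d_{\mathrm{EM}}(\mathcal{A}'(G), \mathcal{A}'(G - e)) \leq n \cdot \left(\frac{Z'_e}{Z'} - 1\right) \cdot \sum_{S \subseteq V} p'_G(S) = n \cdot \left(\frac{Z'_e}{Z'} - 1\right).$$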

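By the triangle inequality, the earth mover’s distance between $\mathcal{A}(G)$ and $\mathcal{A}(G - e)$ can be bounded as

$$
\begin{aligned}
d_{\mathrm{EM}}(\mathcal{A}(G), \mathcal{A}(G - e)) &\leq d_{\mathrm{EM}}(\mathcal{A}(G), \mathcal{A}'(G)) + d_{\mathrm{EM}}(\mathcal{A}'(G), \mathcal{A}'(G - e)) + d_{\mathrm{EM}}(\mathcal{A}'(G - e), \mathcal{A}(G - e)) \\
&\leq n \cdot \left(\frac{Z'_e}{Z'} - 1\right) + O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right).
\end{aligned}
$$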

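Hence, the average sensitivity of $\mathcal{A}$ is bounded as:

$$
\begin{aligned}
\beta(G) &= \mathbb{E}_{e \in E}\, d_{\mathrm{EM}}(\mathcal{A}(G), \mathcal{A}(G - e)) \leq O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right) + n \cdot \mathbb{E}_{e \in E}\left(\frac{Z'_e}{Z'} - 1\right) \\
&= O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right) + \frac{n}{m Z'} \sum_{e \in E} (Z'_e - Z') \\
&= O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right) + \frac{n}{m Z'} \sum_{e \in E} \sum_{S \subseteq V : e \text{ crosses } S} \Bigl(\exp(-\alpha \cdot \mathrm{cost}(G - e, S)) - \exp(-\alpha \cdot \mathrm{cost}(G, S))\Bigr) \\
&= O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right) + \frac{n (\exp(\alpha) - 1)}{m Z'} \sum_{e \in E} \sum_{S \subseteq V : e \text{ crosses } S} \exp(-\alpha \cdot \mathrm{cost}(G, S)) \\
&= O\!\left(\frac{2\varepsilon}{n^{3 + 1/\varepsilon}}\right) + \frac{n (\exp(\alpha) - 1)}{m} \sum_{S \subseteq V} \frac{\mathrm{cost}(G, S) \cdot \exp(-\alpha \cdot \mathrm{cost}(G, S))}{Z'}.
\end{aligned}
$$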

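The summation in the second term above is equal to the expected size of the cut output by algorithm $\mathcal{A}'$ on input $G$. We argued that it is at most $(2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon$ plus a lower-order term. Hence, the average sensitivity of $\mathcal{A}$ is at most

$$
\begin{aligned}
&\frac{n}{m} \cdot n^{(2 + 1/\varepsilon)/\mathrm{OPT}} \cdot \bigl((2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon\bigr) + O\!\left(\frac{\varepsilon\, n^{(2 + 1/\varepsilon)/\mathrm{OPT}} + 2}{n^{3 + 1/\varepsilon}}\right) \\
&\quad = \frac{n}{m} \cdot n^{(2 + 1/\varepsilon)/\mathrm{OPT}} \cdot \bigl((2 + 7\varepsilon)\mathrm{OPT} + 2\varepsilon\bigr) + o(1)
\end{aligned}
$$

as $\mathrm{OPT} \geq 1$. ∎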

## 4 Maximum Matching

A vertex-disjoint set of edges is called a matching. In the maximum matching problem, given a graph, we want to find a matching of the maximum size. In this section, we describe different algorithms with low average sensitivity that approximate the maximum matching in a graph.

### 4.1 Lexicographically smallest matching

In this section, we describe an algorithm that computes a maximum matching in a graph with average sensitivity at most $\mathrm{OPT}^2/m$ and prove Theorem 4.1, where $\mathrm{OPT}$ is the maximum size of a matching.

First, we define some ordering among vertex pairs. Then, we can naturally define the lexicographical order among matchings by regarding a matching as a sorted sequence of vertex pairs. Our algorithm simply outputs the lexicographically smallest maximum matching. Note that this can be done in polynomial time using Edmonds’ algorithm [Edm65].

###### Theorem 4.1.

Let $\mathcal{A}$ be the algorithm that outputs the lexicographically smallest maximum matching. Then, the average sensitivity of $\mathcal{A}$ is at most $\mathrm{OPT}^2/m$, where $\mathrm{OPT}$ is the maximum size of a matching.

###### Proof.

For a graph $G = (V, E)$, let $M(G)$ be its lexicographically smallest maximum matching. As long as $e \notin M(G)$, we have $M(G - e) = M(G)$. Hence, the average sensitivity of the algorithm is at most

$$\frac{\mathrm{OPT}}{m} \cdot \mathrm{OPT} + \left(1 - \frac{\mathrm{OPT}}{m}\right) \cdot 0 = \frac{\mathrm{OPT}^2}{m}.$$ ∎

###### Remark 4.2.

Consider the path graph , where . The average sensitivity of the above algorithm on is . Hence the above analysis of the average sensitivity is tight.
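The algorithm of Theorem 4.1 can be illustrated by brute force on a small graph. The sketch below is not the polynomial-time procedure based on Edmonds' algorithm; it assumes edges are ordered as sorted vertex pairs, enumerates edge subsets (exponential time), and measures Hamming distance as the size of the symmetric difference of edge sets. All function names are hypothetical.

```python
from itertools import combinations

def is_matching(edge_subset):
    """True if no vertex appears in more than one edge."""
    seen = set()
    for u, v in edge_subset:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def lex_smallest_maximum_matching(edges):
    edges = sorted(tuple(sorted(e)) for e in edges)
    for size in range(len(edges), -1, -1):
        # combinations() of a sorted list are produced in lexicographic order,
        # so the first matching found at the largest feasible size is the answer.
        for cand in combinations(edges, size):
            if is_matching(cand):
                return cand
    return ()

def average_sensitivity(edges):
    """Average symmetric difference between outputs on G and on G - e."""
    edges = [tuple(sorted(e)) for e in edges]
    base = set(lex_smallest_maximum_matching(edges))
    total = 0
    for e in edges:
        rest = [f for f in edges if f != e]
        total += len(base ^ set(lex_smallest_maximum_matching(rest)))
    return total / len(edges)

path = [(i, i + 1) for i in range(5)]  # path on 6 vertices
opt = len(lex_smallest_maximum_matching(path))
print(average_sensitivity(path), opt ** 2 / len(path))
```

The second printed value is the bound $\mathrm{OPT}^2/m$ of Theorem 4.1 for this instance.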

### 4.2 Greedy matching algorithm

In this section, we describe an algorithm (based on a randomized greedy maximal matching algorithm) with average sensitivity and approximation ratio for the maximum matching problem and prove Theorem 4.9.

In Theorem 4.3, we prove that the basic randomized greedy maximal matching algorithm has sensitivity , where is the maximum degree of the input graph.

Theorem 4.4 shows how to transform the randomized greedy algorithm to another algorithm whose average sensitivity does not depend on the maximum degree, albeit at the cost of slightly worsening the approximation guarantee. In particular, Theorem 4.8 shows that a -approximation algorithm for maximum matching with average sensitivity is obtained by applying Theorem 4.4 to the randomized greedy maximal matching.

Finally, we combine the matching algorithm guaranteed by Theorem 4.8 with the matching algorithm guaranteed by Theorem 4.1 using the parallel composition property (Theorem 7.2) of averagely stable algorithms and obtain Theorem 4.9.

#### 4.2.1 Average sensitivity of the greedy algorithm in terms of the maximum degree

In this section, we describe the average sensitivity guarantee of the randomized greedy algorithm described in Algorithm 2.

It is evident that Algorithm 2 runs in polynomial time and that the matching it outputs is maximal, so its size is at least half the size of a maximum matching in the input graph.
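A randomized greedy maximal matching can be sketched as follows. This is an assumption about the general shape of such an algorithm (process the edges in a uniformly random order and keep an edge whenever both endpoints are still free), not a transcription of Algorithm 2; the function name is hypothetical.

```python
import random

def randomized_greedy_matching(edges, rng):
    """Scan edges in a uniformly random order and greedily build a maximal matching."""
    order = list(edges)
    rng.shuffle(order)
    matched = set()   # vertices already covered by the matching
    matching = []
    for u, v in order:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

path = [(i, i + 1) for i in range(5)]  # path on 6 vertices, maximum matching size 3
print(randomized_greedy_matching(path, random.Random(0)))
```

Because every edge that is skipped shares an endpoint with an edge already chosen, the output is maximal, which is what gives the factor of one half relative to a maximum matching.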

###### Theorem 4.3.

For every undirected unweighted graph , the average sensitivity of Algorithm 2 is , where is the maximum degree of .

###### Proof.

Consider a graph . Let be the maximum degree of . For an edge , let denote the graph obtained by removing from . Let denote the matching output by Algorithm 2 on input . Yoshida et al. [YYI12, Theorem 2.1] show that the presence of a uniformly random edge in depends on at most edges in expectation, where the expectation is taken over both the randomness of the algorithm and the randomness in selecting the edge . By applying Theorem 1.9 to this statement, we can see that the average sensitivity of Algorithm 2 is , where is the maximum degree of . ∎

#### 4.2.2 Averagely stable thresholding transformation

In this section, we show a transformation from matching algorithms whose average sensitivity is a function of the maximum degree to matching algorithms whose average sensitivity does not depend on the maximum degree. This is done by adding a preprocessing step to the algorithm that removes from the input graph every vertex whose degree is at least an appropriate random threshold. Such a transformation helps us design averagely stable algorithms for graphs with unbounded degree. Let $\mathrm{Lap}(\mu, \phi)$ denote the Laplace distribution with a location parameter $\mu$ and a scale parameter $\phi$.
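The preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions: the location and scale passed to the Laplace sampler are placeholders rather than the parameters used in the analysis, the wrapped `algorithm` is any function taking an edge list and a random generator, and all names are hypothetical. The Laplace sample is drawn as a scaled difference of two independent unit-rate exponentials.

```python
import random
from collections import Counter

def sample_laplace(loc, scale, rng):
    """Laplace(loc, scale) sample: the difference of two Exp(1) variables is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def threshold_graph(edges, threshold):
    """Remove every vertex whose degree is at least `threshold`, along with its edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [(u, v) for u, v in edges if deg[u] < threshold and deg[v] < threshold]

def thresholded(algorithm, edges, loc, scale, rng):
    """Run `algorithm` on the graph obtained after removing high-degree vertices."""
    L = sample_laplace(loc, scale, rng)  # random degree threshold
    return algorithm(threshold_graph(edges, L), rng)

# A star with center 0 plus one isolated edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)]
print(threshold_graph(edges, 3))
```

Randomizing the threshold is what keeps the set of removed vertices from changing abruptly when a single edge is deleted; a fixed threshold would flip a vertex in or out deterministically.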

###### Theorem 4.4.

Let $\mathcal{A}$ be a randomized algorithm for the maximum matching problem such that the size of the matching output by $\mathcal{A}$ on a graph $G = (V, E)$ is always at least $a \cdot \mathrm{OPT}$ for some $a \in (0, 1]$. In addition, assume that there exists an oracle $\mathcal{O}$ satisfying the following:

• when given access to a graph $G = (V, E)$ and query $e \in E$, the oracle generates a random string $\pi$ and outputs whether $e$ is contained in the matching output by $\mathcal{A}$ on $G$ with $\pi$ as its random string, and

• the oracle makes at most $q(\Delta)$ queries to $G$ in expectation, where $\Delta$ is the maximum degree of $G$ and the expectation is taken over the random coins of $\mathcal{A}$ and a uniformly random query $e \in E$.

Let $\delta > 0$ and $\tau$ be a non-negative function on graphs. Then, there exists an algorithm $\mathcal{A}'$ for the maximum matching problem with average sensitivity

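$$\beta(G) \leq O\!\left(\frac{K_G}{\delta (\tau(G) - K_G)} + \exp\!\left(-\frac{1}{\delta}\right)\right) \cdot \mathrm{OPT} + \mathbb{E}_L\!\left[(2L - 2)^2\, q(L)\right],$$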

where is a random variable distributed as and . Moreover, the expected size of the matching output by is at least

$$a \cdot \mathrm{OPT} - \frac{a m}{(1 - \delta \ln(\mathrm{OPT}/2)) \cdot \tau(G)} - a.$$

The following fact will be useful in the proof of Theorem 4.4.

###### Proposition 4.5.

Let be a random variable distributed as . Then,