# Vertex Fault-Tolerant Emulators

A k-spanner of a graph G is a sparse subgraph that preserves its shortest path distances up to a multiplicative stretch factor of k, and a k-emulator is similar but not required to be a subgraph of G. A classic theorem by Thorup and Zwick [JACM '05] shows that, despite the extra flexibility available to emulators, the size/stretch tradeoffs for spanners and emulators are equivalent. Our main result is that this equivalence in tradeoffs no longer holds in the commonly-studied setting of graphs with vertex failures. That is: we introduce a natural definition of vertex fault-tolerant emulators, and then we show a three-way tradeoff between size, stretch, and fault-tolerance for these emulators that polynomially surpasses the tradeoff known to be optimal for spanners. We complement our emulator upper bound with a lower bound construction that is essentially tight (within log n factors of the upper bound) when the stretch is 2k-1 and k is either a fixed odd integer or 2. We also show constructions of fault-tolerant emulators with additive error, demonstrating that these also enjoy significantly improved tradeoffs over those available for fault-tolerant additive spanners.

## 1 Introduction

Two well-studied objects in graph sparsification are spanners and emulators. Given a weighted input graph $G = (V, E, w)$, a $t$-spanner of $G$ is a subgraph $H$ of $G$ in which

$$\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v) \le t \cdot \mathrm{dist}_G(u,v) \tag{1}$$

for all $u, v \in V$. Note that the first inequality, $\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v)$, is implied automatically by the fact that $H$ is a subgraph of $G$. The value $t$ is called the stretch of the spanner.

A $t$-emulator [19] is defined in the same way, except that $H$ is not required to be a subgraph of $G$. For emulators, the first inequality is not automatic, and it implies that any edge $(u,v)$ in the emulator but not in the input graph must have weight at least $\mathrm{dist}_G(u,v)$. In fact, it is easy to see that, without loss of generality, we may assign it weight exactly $\mathrm{dist}_G(u,v)$.
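The two-sided distance inequality above can be checked mechanically on small examples. The following is a minimal sketch, not taken from the paper: the helper names `shortest_dists`, `build`, and `satisfies_stretch` are hypothetical, and graphs are assumed to be given as nested dictionaries of edge weights.

```python
import heapq

def shortest_dists(adj, src):
    """Dijkstra from src; adj[u] maps each neighbor v to the edge weight."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build(nodes, weighted_edges):
    """Undirected weighted graph on `nodes` (hypothetical helper)."""
    adj = {u: {} for u in nodes}
    for u, v, w in weighted_edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj

def satisfies_stretch(G, H, t):
    """True iff dist_G <= dist_H <= t * dist_G for every pair of nodes."""
    inf = float("inf")
    for u in G:
        dG, dH = shortest_dists(G, u), shortest_dists(H, u)
        for v in G:
            g, h = dG.get(v, inf), dH.get(v, inf)
            if not (g <= h <= t * g):
                return False
    return True

nodes = ["a", "b", "c", "d"]
# Input graph: an unweighted 4-cycle.
G = build(nodes, [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1)])
# A spanner: the cycle with one edge removed.
H_spanner = build(nodes, [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)])
# An emulator: a star at "a"; the non-graph edge (a, c) gets weight dist_G(a, c) = 2.
H_emulator = build(nodes, [("a", "b", 1), ("a", "c", 2), ("a", "d", 1)])
```

On this example, the path is a 3-spanner but not a 2-spanner of the cycle, and the star is a 3-emulator even though the edge between "a" and "c" does not appear in the input graph.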

### 1.1 Equivalence of Spanner/Emulator Tradeoffs

Both spanners and emulators have been studied extensively, and we have long had a complete understanding of the tradeoffs between spanner/emulator size (number of edges) and stretch. Specifically:

• Althöfer et al. [3] proved that for every positive integer $k$, every weighted graph $G$ has a $(2k-1)$-spanner with at most $O(n^{1+1/k})$ edges.

• On the lower bounds side, one can quickly verify that any unweighted input graph $G$ of girth $> 2k$ has no $(2k-1)$-spanner, except for $G$ itself. Under the Erdős girth conjecture [20], there are graphs of girth $> 2k$ with $\Omega(n^{1+1/k})$ edges. Thus, the upper bound of Althöfer et al. cannot be improved at all on these graphs.

• Thorup and Zwick [27] observed that essentially the same lower bound applies for emulators. Any two distinct subgraphs $H_1 \ne H_2$ of a graph of girth $> 2k$ disagree on some pairwise distance by more than a factor of $2k-1$. This implies that $H_1, H_2$ need different representations as $(2k-1)$-emulators. There are $2^{\Omega(n^{1+1/k})}$ subgraphs of a girth conjecture graph, and so by a pigeonhole argument, one of these subgraphs requires an emulator on $\Omega(n^{1+1/k})$ edges. (In fact, the same method gives a lower bound on the size of any data structure that approximately encodes graph distances, and hence this is often called an incompressibility argument.)

Thus, even though emulators are substantially more general objects than spanners, they do not enjoy a meaningfully better tradeoff between size and stretch.
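For concreteness, the greedy construction usually associated with the upper bound of Althöfer et al. can be sketched as follows. This is an illustrative sketch under stated assumptions rather than code from the paper: the function names are hypothetical, and the input is assumed to be a list of `(u, v, weight)` triples.

```python
import heapq
from itertools import combinations

def dist_between(adj, src, dst):
    """Dijkstra distance from src to dst in adj (inf if unreachable)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def greedy_spanner(nodes, weighted_edges, k):
    """Scan edges by nondecreasing weight; keep an edge only if the spanner
    built so far does not already connect its endpoints within stretch 2k-1."""
    H = {u: {} for u in nodes}
    kept = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        if dist_between(H, u, v) > (2 * k - 1) * w:
            H[u][v] = w
            H[v][u] = w
            kept.append((u, v, w))
    return kept

# Unweighted complete graph on 4 nodes, stretch 3 (k = 2).
K4_edges = [(u, v, 1) for u, v in combinations(range(4), 2)]
star = greedy_spanner(range(4), K4_edges, k=2)
```

On the complete graph with four nodes, the first three edges scanned form a star at node 0, and every later edge is already covered by a 2-edge path, so it is skipped.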

### 1.2 Fault-Tolerant Spanners

Spanners are commonly applied as a primitive in distributed computing, in which network nodes or edges are prone to sporadic failures. This has motivated significant interest in fault-tolerant spanners. Intuitively, a vertex fault-tolerant spanner is a subgraph that remains a spanner even after a small set of nodes fails in both the spanner and the original graph. More formally, the following definition was given by Chechik, Langberg, Peleg, and Roditty [16].

###### Definition 1 (VFT Spanner).

Let $G = (V, E, w)$ be a weighted graph. A subgraph $H$ of $G$ is an $f$-vertex fault-tolerant ($f$-VFT) $t$-spanner of $G$ if, for all $F \subseteq V$ with $|F| \le f$, $H \setminus F$ is a $t$-spanner of $G \setminus F$.

After significant work following [16], we now completely understand the achievable bounds on fault-tolerant spanners: Bodwin and Patel [14] proved that every graph has an $f$-VFT $(2k-1)$-spanner with at most $O(f^{1-1/k} n^{1+1/k})$ edges (and the same bounds were shown to be achievable in polynomial time by [18, 11]), and Bodwin, Dinitz, Parter, and Williams [10] gave examples (under the girth conjecture) of graphs on which this bound cannot be improved in any range of parameters.
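Definition 1 quantifies over every fault set of size at most $f$, which can be verified by brute force on tiny inputs. The sketch below is only an illustration (exponential in $f$); the helper names and the edge-list representation are assumptions, not part of the paper.

```python
import heapq
from itertools import combinations

def dist_avoiding(edges, src, dst, faults):
    """Dijkstra distance from src to dst using only edges that avoid `faults`."""
    adj = {}
    for u, v, w in edges:
        if u in faults or v in faults:
            continue
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def is_vft_spanner(nodes, G_edges, H_edges, f, t):
    """Check that H minus F is a t-spanner of G minus F for every |F| <= f.
    H_edges is assumed to be a subset of G_edges, so only the upper
    inequality needs to be tested."""
    for size in range(f + 1):
        for F in map(set, combinations(nodes, size)):
            alive = [x for x in nodes if x not in F]
            for u, v in combinations(alive, 2):
                g = dist_avoiding(G_edges, u, v, F)
                h = dist_avoiding(H_edges, u, v, F)
                if h > t * g:
                    return False
    return True

triangle = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
two_edges = [(0, 1, 1), (0, 2, 1)]
```

In the triangle example, the two-edge path is a 3-spanner without faults, but it fails as a 1-VFT 3-spanner: deleting node 0 disconnects nodes 1 and 2 in the spanner while they remain adjacent in the input graph.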

### 1.3 Fault-Tolerant Emulators

In this paper we ask a natural question: what if we add a fault-tolerance requirement to emulators? Are stronger bounds possible than the ones known for spanners? Making progress on this requires answers to two related questions:

1. How should we even define a fault-tolerant emulator? As we discuss shortly, there are two different definitions that both seem plausible at first glance.

2. The lower bound on VFT spanners of [10] can also be generalized into an incompressibility argument, like the one by Thorup and Zwick [27]. Since an emulator is just a different way of compressing distances, why wouldn’t this lower bound apply to fault-tolerant emulators, ruling out hope for a better size/stretch tradeoff?

These questions turn out to have some surprising answers. We first argue that, of the two a priori reasonable definitions of fault-tolerant emulators, only one of them is actually sensible. We then show that this definition escapes the incompressibility lower bound, and we design fault-tolerant emulators that are sparser than the known lower bounds for fault-tolerant spanners by $\mathrm{poly}(f)$ factors. We also discuss fault-tolerant emulators with additive stretch, and show that these also enjoy substantial improvements in size/stretch tradeoff over fault-tolerant additive spanners.

#### 1.3.1 VFT Emulator Definition

Before we can even discuss bounds or constructions, we need to define fault-tolerant emulators. Following Definition 1, we get the following definition:

###### Definition 2 (VFT Emulator Template Definition).

Let $G = (V, E, w)$ be a weighted graph. A graph $H$ is an $f$-vertex fault-tolerant ($f$-VFT) $t$-emulator of $G$ if, for all $F \subseteq V$ with $|F| \le f$, $H \setminus F$ is a $t$-emulator of $G \setminus F$.

However, there are two reasonable definitions of (non-faulty) $t$-emulators that we could plug into this template definition. These definitions are functionally equivalent in the non-faulty setting, but they give rise to two importantly different definitions of VFT emulators.

1. One natural possibility is to define a weighted graph $H$ to be a $t$-emulator of $G$ if it satisfies

$$\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v) \le t \cdot \mathrm{dist}_G(u,v)$$

for all nodes $u, v \in V$. Plugging this into Definition 2, we get that a weighted graph $H$ is an $f$-VFT emulator of $G$ if, for any fault set $F \subseteq V$ with $|F| \le f$ and vertices $u, v \in V \setminus F$, we have

$$\mathrm{dist}_{G \setminus F}(u,v) \le \mathrm{dist}_{H \setminus F}(u,v) \le t \cdot \mathrm{dist}_{G \setminus F}(u,v).$$

2. Recall that in the non-faulty setting, we always set the weight of an emulator edge $(u,v)$ to be exactly $\mathrm{dist}_G(u,v)$: we need $w(u,v) \ge \mathrm{dist}_G(u,v)$ in order to ensure that $\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v)$, and there is no benefit to setting $w(u,v) > \mathrm{dist}_G(u,v)$. In other words, we can define an emulator as an unweighted graph, where the weight of each edge simply becomes the corresponding distance in the input graph. We then say that $H$ is a $t$-emulator if it satisfies the usual distance inequalities

$$\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v) \le t \cdot \mathrm{dist}_G(u,v)$$

after this reweighting. This is a subtle distinction, since there is no important difference from the previous one in the non-faulty setting. But passed through Definition 2, it gives an importantly different definition of VFT emulators:

###### Definition 3 (VFT Emulators).

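Let $G = (V, E, w)$ be a weighted graph, and let $H$ be an unweighted graph on vertex set $V$. For every fault set $F \subseteq V$ with $|F| \le f$, for every $(u,v) \in E(H)$ with $u, v \notin F$, we define weight function $w_F$ where $w_F(u,v) = \mathrm{dist}_{G \setminus F}(u,v)$.

We then define $\mathrm{dist}_{H \setminus F}(u,v)$ to be the shortest path distance in $H \setminus F$ under weight function $w_F$. We say that $H$ is an $f$-VFT $t$-emulator if

$$\mathrm{dist}_{G \setminus F}(u,v) \le \mathrm{dist}_{H \setminus F}(u,v) \le t \cdot \mathrm{dist}_{G \setminus F}(u,v)$$

for all $u, v \in V$ and for all $F \subseteq V$ with $|F| \le f$ and $u, v \notin F$.

In other words, for emulator edges $(u,v)$ in $H$, the edge weight in the post-fault graph $H \setminus F$ automatically updates to be equal to the shortest-path distance between the endpoints in the remaining graph $G \setminus F$.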
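The weight-updating behavior in Definition 3 can be made concrete with a brute-force checker for tiny inputs. This is a sketch under assumptions, not the paper's algorithm: the function names are hypothetical, the emulator is passed as a list of unweighted node pairs, and an emulator edge whose endpoints become disconnected in the faulted input graph is simply dropped (equivalent to giving it infinite weight).

```python
import heapq
from itertools import combinations

INF = float("inf")

def dist_in(edges, src, dst):
    """Dijkstra distance from src to dst over a list of (u, v, w) edges."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return INF

def is_vft_emulator(nodes, G_edges, H_pairs, f, t):
    """Brute-force check of the weight-updating definition for all |F| <= f."""
    for size in range(f + 1):
        for F in map(set, combinations(nodes, size)):
            alive = [x for x in nodes if x not in F]
            G_F = [(u, v, w) for u, v, w in G_edges if u not in F and v not in F]
            # Reweight every surviving emulator edge to the post-fault distance.
            H_F = []
            for u, v in H_pairs:
                if u in F or v in F:
                    continue
                d = dist_in(G_F, u, v)
                if d < INF:
                    H_F.append((u, v, d))
            for u, v in combinations(alive, 2):
                g, h = dist_in(G_F, u, v), dist_in(H_F, u, v)
                if not (g <= h <= t * g):
                    return False
    return True

# Input graph: the path u - x - v with unit weights.
path_nodes = ["u", "x", "v"]
path_edges = [("u", "x", 1), ("x", "v", 1)]
```

For the three-node path, the emulator with pairs (u, v) and (u, x) has stretch 3 without faults, but it is not 1-VFT because deleting u removes both of its edges. When x fails, the edge (u, v) is reweighted to the post-fault distance, which is infinite, so it never undercuts the true distance.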

Our next task is to point out that the second definition is the natural one to study, both mathematically and because it captures applications of fault-tolerant emulators in distributed systems. Going forward, Definition 3 is the one we use.

#### 1.3.2 Theoretical Motivation for the Second Definition

Although the first definition of VFT emulators may seem simpler, there is a pitfall when one attempts any construction under this definition. Imagine that we add an edge $(u,v)$ to an emulator $H$, where $(u,v)$ is not also an edge in $G$. Suppose we set its weight to $w(u,v) = \mathrm{dist}_G(u,v)$. Then after any set of vertex faults $F$ that stretches $\mathrm{dist}_G(u,v)$ at all, the distance will be smaller in $H \setminus F$ than in $G \setminus F$, violating the lower distance inequality! In general, one would always have to set emulator edge weights to be at least the maximum distance $\mathrm{dist}_{G \setminus F}(u,v)$ over all possible vertex fault sets $F$. This is an unnatural constraint, and it precludes most reasonable uses of emulator edges. For example, if $G$ is a path with three nodes $(u, x, v)$ and we create an emulator edge $(u,v)$, then if $F = \{x\}$ we will have $\mathrm{dist}_{G \setminus F}(u,v) = \infty$. Thus we are forced to set emulator edge weight $w(u,v) = \infty$, essentially disallowing this as an emulator edge at all.

The other issue is the incompressibility lower bounds from [10]. The lower bound on VFT spanners from [10] actually holds for all compression schemes: one cannot generally build a data structure on $o(f^{1-1/k} n^{1+1/k})$ bits that can report $(2k-1)$-approximate distances between all pairs of vertices under at most $f$ vertex faults. The first definition of VFT emulators functions as such a compression scheme, so it cannot achieve improved bounds.

Why can we hope for the second definition of VFT emulators to bypass this lower bound? The answer lies in the fact that our emulator definition updates its edge weights under faults. A VFT emulator cannot actually be represented by a data structure of size approximately equal to the number of edges in the emulator, since a static data structure would not have this updating behavior. In other words, since we assume that weight updates occur automatically, we are not charging ourselves for the extra information one would have to carry around in order to actually compute these updates. This means it is a priori possible that the second definition of fault-tolerant emulators can be significantly sparser than fault-tolerant spanners.

#### 1.3.3 Practical Motivation for the Second Definition.

Now we explain the practical motivation behind the second definition. Automatically updating edge weights may seem at first like an incredibly strong and unrealistic assumption. Indeed, in some of the contexts in which spanners and emulators are used this would not be reasonable, e.g., as a preprocessing step for computing shortest paths [2, 19]. But spanners were originally designed for use in distributed computing [24, 23], and in distributed contexts, emulator edges typically represent logical links rather than physical links. That is, each emulator edge is treated as if it represents a path between the endpoints, since that is how packets/messages would actually travel between the endpoints. An example of this is overlay networks, where one builds a logical network that lives “on top of” another network (usually the Internet). Overlay networks have been extensively studied, often either directly or indirectly using spanners, emulators, or related objects [6, 5, 4, 25, 26].

In a logical link on top of an underlying network, packets are automatically rerouted post-failures using some routing protocol on the underlying topology. The vast majority of these routing protocols use shortest paths. So for a logical link $(u,v)$, we would actually expect its distance to “automatically” become $\mathrm{dist}_{G \setminus F}(u,v)$, where the seeming “magic” of the edge weight update is implemented by the underlying routing algorithm converging on new shortest paths.

So in applications of emulators to distributed computing, it is very reasonable to assume that edges take on weight equal to the remaining shortest path length. Note that this does not obviate the need for emulators: in an overlay network there will be a layer of routing in the overlay network itself (on top of the underlying network), so packets sent from $u$ to $v$ will follow shortest paths between $u$ and $v$ in the overlay. Thus, these packets will experience stretch according to the stretch of the weight-updating emulator.

### 1.4 Our Results

Our previous discussion explains why it is possible for VFT emulators to improve on the size/stretch tradeoff available to VFT spanners. Our main results confirm this possibility; we construct VFT emulators that polynomially surpass the lower bounds for VFT spanners.

#### 1.4.1 Multiplicative Stretch

Our most general results (and main technical contributions) are in the multiplicative stretch setting, where we prove the following upper bound.

###### Theorem 1.1.

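For all $k$ and $f$, every $n$-node weighted graph admits an $f$-VFT $(2k-1)$-emulator $H$ with

$$|E(H)| \le \begin{cases} \widetilde{O}_k\left(f^{\frac{1}{2} - \frac{1}{2k}} n^{1+1/k} + fn\right) & \text{if } k \text{ odd} \\ \widetilde{O}_k\left(f^{\frac{1}{2}} n^{1+1/k} + fn\right) & \text{if } k \text{ even.} \end{cases}$$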

Moreover, there is a randomized polynomial-time algorithm which constructs such an emulator with high probability.

In the above theorem, $\widetilde{O}_k$ hides factors that are polylogarithmic in $n$, and also factors that are exponential in $k$. We typically think of $f$ as being polynomial in $n$, and in this setting (and when $k$ is a constant at least $3$), our emulators improve polynomially on VFT spanners.

Footnote 1: When $k$ is super-constant, [14, 18, 11] already give an upper bound of for VFT spanners, which cannot be improved beyond even with emulators. Thus the most interesting remaining parameter regime is when $k = O(1)$.
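To see where the polynomial improvement appears, one can compare the exponent of $f$ in Theorem 1.1 with the exponent $1 - 1/k$ in the $O(f^{1-1/k} n^{1+1/k})$ bound for VFT spanners from Section 1.2. The following sketch only does this exponent arithmetic; the function names are hypothetical, and polylogarithmic factors, factors depending on $k$, and the additive $fn$ term are ignored.

```python
from fractions import Fraction

def spanner_f_exponent(k):
    """Exponent of f in the VFT spanner bound f^(1 - 1/k) * n^(1 + 1/k)."""
    return 1 - Fraction(1, k)

def emulator_f_exponent(k):
    """Exponent of f in the leading term of Theorem 1.1."""
    if k % 2 == 1:
        return Fraction(1, 2) - Fraction(1, 2 * k)
    return Fraction(1, 2)

def exponent_savings(k):
    """How much smaller the exponent of f is for emulators than for spanners."""
    return spanner_f_exponent(k) - emulator_f_exponent(k)
```

For $k = 2$ the two exponents coincide (both are $1/2$), while for $k = 3$ the emulator bound has $f^{1/3}$ against $f^{2/3}$ for spanners, which is why the improvement starts at $k = 3$.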

The algorithm we design to prove Theorem 1.1 starts from the basic greedy VFT spanner algorithm of [10] (and its polytime extension in [18]), where we consider edges in nondecreasing weight order and add an edge if there is a fault set that forces us to add it. To take advantage of the power of emulators, though, we augment this with an extra “path sampling” step: intuitively, when we decide to add a spanner edge, we also flip a biased coin for every $k$-path that it completes to decide whether to also add an emulator edge between the endpoints of the path. These extra emulator edges do not replace the added spanner edge (i.e., we do not add the emulator edge instead of the spanner edge), but instead act to help protect future graph edges in the ordering, making it less likely that we will need to add spanner edges downstream. Our two main lemmas are roughly that 1) with high probability there are not too many $k$-paths in our final emulator, and 2) if the emulator has many edges then it has many $k$-paths. Combining these with an appropriate parameter balance gives us Theorem 1.1.

The technical details of this analysis get surprisingly tricky, and it turns out that we actually cannot consider all $k$-paths in the algorithm and analysis outlined above, but only a carefully selected subset of them that we call “SALAD” paths. The technical details of these paths are responsible for the factors exponential in $k$ and the even/odd distinction (a similar even/odd distinction appears in [12], for a similar technical reason).

We complement our emulator upper bound with a nearly matching lower bound, which is a relatively straightforward extension of the edge-fault-tolerant lower bound for spanners from [10]. The case of $k = 2$ (stretch $3$) is slightly different, so we handle it separately.
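As a point of reference, the baseline greedy step described above (scan edges by nondecreasing weight and keep an edge when some fault set forces it) can be sketched as follows. This sketch deliberately omits the path sampling step, whose exact form is given by the paper's algorithms and is not reproduced here; it enumerates fault sets by brute force, so it is exponential in $f$. All function names are hypothetical.

```python
import heapq
from itertools import combinations

def dist_avoiding(edges, src, dst, faults):
    """Dijkstra distance from src to dst using only edges that avoid `faults`."""
    adj = {}
    for u, v, w in edges:
        if u in faults or v in faults:
            continue
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def greedy_vft_spanner(nodes, weighted_edges, k, f):
    """Keep edge (u, v) if some fault set F (avoiding u and v, |F| <= f)
    leaves the current spanner with stretch worse than 2k-1 on (u, v)."""
    kept = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        others = [x for x in nodes if x not in (u, v)]
        forced = any(
            dist_avoiding(kept, u, v, set(F)) > (2 * k - 1) * w
            for size in range(f + 1)
            for F in combinations(others, size)
        )
        if forced:
            kept.append((u, v, w))
    return kept

triangle = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
```

On a unit-weight triangle with stretch 3, the third edge is skipped when $f = 0$ but is forced when $f = 1$, since deleting node 0 destroys the only alternative path.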

###### Theorem 1.2.

For all positive integers with , there exists an unweighted -node graph with edges for which any -VFT -emulator must have at least edges.

###### Theorem 1.3.

Assuming the Erdős girth conjecture, for all and there is an unweighted -node graph in which every -VFT -emulator has at least edges.

This lower bound matches our upper bound for constant odd , and is off by only an factor for constant even . An easy folklore observation implies that any -regular input graph requires edges for an -VFT emulator, so our terms cannot be removed either.

Spanners and emulators are also studied in the context of additive stretch: a $+k$-spanner/emulator of an input graph $G$ is one that satisfies the distance inequality

$$\mathrm{dist}_G(u,v) \le \mathrm{dist}_H(u,v) \le \mathrm{dist}_G(u,v) + k$$

for all nodes $u, v$. We have a complete understanding of the possibilities for additive emulators. It is known that every unweighted input graph has a $+2$-emulator on $O(n^{3/2})$ edges [2] and a $+4$-emulator on $O(n^{4/3})$ edges [19]. These emulators are optimal, both in the sense that neither size bound can be improved at all, and in the sense that no $+k$-emulator can achieve $O(n^{4/3 - \varepsilon})$ edges, even when $k$ is an arbitrarily large constant [1]. For spanners, our understanding lags only slightly behind: all graphs have $+2$-spanners on $O(n^{3/2})$ edges [2, 21], $+4$-spanners on $\widetilde{O}(n^{7/5})$ edges [17], and $+6$-spanners on $O(n^{4/3})$ edges [8, 21, 29].

Braunschvig, Chechik, Peleg, and Sealfon [15] were the first to introduce fault-tolerance to additive spanners, via the natural extension of Definition 3.

Unfortunately, it turns out that the price of fault-tolerance for additive spanners with fixed error is untenably high. It is proved in [13] that, for any fixed constant , there are graphs on which an -VFT -spanner needs edges. In other words, tolerating one additional fault costs in spanner size, and there is no way to tolerate faults in subquadratic size. Accordingly, constructions of VFT spanners of fixed size have to pay super-constant additive error of type [15, 9, 22, 13].

We define VFT additive emulators with similar weight-updating behavior as in the multiplicative setting, with the same motivation. We then show that these emulators actually avoid the undesirable size/fault-tolerance tradeoff suffered by VFT spanners. We show the following extensions of the $+2$ and $+4$ emulators:

###### Theorem 1.4.

For all , every -node unweighted graph admits an -VFT -emulator with edges. There is also a randomized polynomial-time algorithm which computes such an emulator with high probability.

###### Theorem 1.5.

For all , every -node unweighted graph admits an -VFT -emulator with edges. There is also a randomized polynomial-time algorithm which computes such an emulator with high probability.

The main point of these results is that the price of fault-tolerance for additive emulators is a multiplicative factor depending only on , rather than the parameter appearing in the exponent of the dependence on , as it does for VFT additive spanners. Moreover, the -factors we obtain are essentially tight by our previous lower bound. Any -emulator is also a -emulator and hence by Theorem 1.2 must have size at least , and any -emulator is also a -emulator and so by Theorem 1.3 must have size at least .

### 1.5 Outline

We begin by proving Theorem 1.1 in the special case in Section 2. This introduces the main ideas and approach that we use to prove Theorem 1.1 in general, but it also happens to avoid a few technical details that become necessary only when we move to larger (allowing us to replace the complicated SALAD paths with simpler “middle-heavy fault-avoiding” paths). We then prove Theorem 1.1 in its full generality: in Section 3 we design an exponential-time algorithm which proves existence of sparse fault-tolerant emulators for all , and then in Section 4 we show how to use ideas from [18] to make the algorithm polynomial-time without significant loss in emulator sparsity. We then prove our lower bounds (Theorems 1.2 and 1.3) in Section 5, and we conclude with our results on additive spanners in Section 6.

## 2 Warmup: k=3 (Stretch 5)

We will warm up by proving the following special case of Theorem 1.1:

###### Theorem 2.1 (Theorem 1.1, k=3).

For all $f$, every $n$-node weighted graph has an $f$-VFT $5$-emulator $H$ with $|E(H)| \le \widetilde{O}(f^{1/3} n^{4/3} + fn)$.

Our algorithm for $5$-emulators is given in Algorithm 1. We incrementally build an emulator $H$ by starting with an empty graph and adding edges. We designate our added edges as spanner edges (which are always contained in the input graph) and emulator edges (which are not generally in the input graph). We then let $H^{(sp)}$ be the subgraph of $H$ containing only its spanner edges, and let $H^{(em)}$ be the subgraph of $H$ containing only its emulator edges.

The algorithm is defined with respect to a parameter $d$. Intuitively we can think of $d$ as (roughly) the desired average node degree in our final emulator: we will set .

We begin by proving correctness.

###### Lemma 2.2.

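The graph $H$ returned by Algorithm 1 is an $f$-VFT $5$-emulator of $G$.

###### Proof.

It is easy to see (and essentially standard) that we just need to prove that $\mathrm{dist}_{H \setminus F}(u,v) \le 5 \cdot w(u,v)$ for each edge $(u,v) \in E$ and possible fault set $F$: by considering shortest paths in $G \setminus F$, this suffices to imply that $\mathrm{dist}_{H \setminus F}(s,t) \le 5 \cdot \mathrm{dist}_{G \setminus F}(s,t)$ for all $s, t, F$ with $|F| \le f$ and $s, t \notin F$, and hence implies that $H$ is an $f$-VFT $5$-emulator of $G$.

So let $(u,v) \in E$, and let $F \subseteq V \setminus \{u,v\}$ with $|F| \le f$. If $(u,v) \in E(H)$, then this is trivially true since then $\mathrm{dist}_{H \setminus F}(u,v) \le w(u,v)$. Otherwise, Algorithm 1 did not add $(u,v)$ to $H$. By the condition of the if statement, this implies that $\mathrm{dist}_{H \setminus F}(u,v) \le 5 \cdot w(u,v)$ as claimed. ∎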

We now move to the more difficult (and interesting) task of proving sparsity. We will assume for convenience that all edge weights in the input graph are unique, so that we may unambiguously refer to the heaviest edge among a set of edges. If not, the following argument still goes through if we break ties between edge weights by the order in which the edges are considered by the algorithm. We need to bound the number of spanner edges and the number of emulator edges in the construction; our strategy is to count the number of instances of a particular structure in called middle-heavy fault-avoiding -paths, and then we will use this counting in two different ways to bound the number of spanner edges in , and then the number of emulator edges in .

### 2.1 Sparsity Analysis

We start with the definitions of the paths that we care about, and then prove some of their properties and begin to count them.

###### Definition 4 (Middle-Heavy 3-Paths).

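A $3$-path $\pi$ with node sequence $(s, u, v, t)$ is middle-heavy if its middle edge is its heaviest one; that is, $w(u,v) > w(s,u)$ and $w(u,v) > w(v,t)$.

When an edge $(u,v)$ is added to $H$ as the middle edge of a middle-heavy path $\pi$, we say that $\pi$ is completed by $(u,v)$ (i.e., $\pi$ exists in $H$ once $(u,v)$ has been added).

For every edge $e = (u,v)$ added by the algorithm, there must exist some set $F \subseteq V \setminus \{u,v\}$ with $|F| \le f$ such that $\mathrm{dist}_{H \setminus F}(u,v) > 5 \cdot w(u,v)$ (or else the algorithm would not have added $e$). If multiple such sets exist, choose one arbitrarily as $F_e$.

###### Definition 5 (Fault-Avoiding Paths).

A path $\pi$ in $H^{(sp)}$ with heaviest edge $e$ is fault-avoiding if $\pi \cap F_e = \emptyset$.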
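The two path properties just defined are simple predicates, which the following sketch spells out. It is illustrative only: edge weights are assumed to be stored in a dictionary keyed by `frozenset` node pairs, and `fault_set_of` is a hypothetical map from each edge to the fault set that forced the algorithm to add it.

```python
def edge(a, b):
    """Undirected edge key (hypothetical representation)."""
    return frozenset((a, b))

def is_middle_heavy(path, w):
    """path = (s, u, v, t); the middle edge (u, v) must be strictly heaviest."""
    s, u, v, t = path
    mid = w[edge(u, v)]
    return mid > w[edge(s, u)] and mid > w[edge(v, t)]

def is_fault_avoiding(path, w, fault_set_of):
    """No node of the path lies in the fault set of the path's heaviest edge."""
    edges = [edge(a, b) for a, b in zip(path, path[1:])]
    heaviest = max(edges, key=lambda e: w[e])
    return not (set(path) & fault_set_of[heaviest])

w = {edge("s", "u"): 1, edge("u", "v"): 5, edge("v", "t"): 2}
```

For instance, the path (s, u, v, t) with weights 1, 5, 2 is middle-heavy, and it is fault-avoiding exactly when the fault set recorded for the edge between u and v contains none of s, u, v, t.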

We first prove an auxiliary counting lemma. Let $C(s,t)$ count the number of middle-heavy fault-avoiding $3$-paths from $s$ to $t$ in $H^{(sp)}$ at a given moment in the algorithm. Whenever we choose to add a spanner edge $(u,v)$, we define the set

$$\Psi(u,v) := \{(s,t) \in V \times V \mid (u,v) \text{ completes a middle-heavy fault-avoiding } 3\text{-path from } s \text{ to } t\}.$$

The following lemma gives a certain kind of control on the values that $C(s,t)$ can reach:

###### Lemma 2.3.

With high probability, whenever we add a new spanner edge $(u,v)$ in our algorithm, we have

$$\sum_{(s,t) \in \Psi(u,v)} C(s,t) \le \widetilde{O}(f d^2)$$

where the values $C(s,t)$ are defined just before $(u,v)$ is added to $H$.

###### Proof Sketch.

We defer the full proof to Appendix A, since the details are technical and do not provide much additional insight. Intuitively, this lemma is true because the counter value corresponds to the number of different times we flipped a coin to decide whether or not to add as an edge (since is the number of middle-heavy -paths between and , and for each such path we flip such a coin). Since each coin has bias by the definition of the algorithm, if were much larger than then with high probability there would already be an emulator edge where . And if such an edge existed, the path would have stretch at most , and hence we would not have added .

Making this formal requires union bounding over all possible fault sets in the definition of fault-avoiding rather than just considering $F_e$, which also causes the extra factor of $f$ in the lemma statement. This introduces significant extra notation but is a straightforward calculation, so we defer it to Appendix A. ∎

We can now use Lemma 2.3 to bound the number of middle-heavy fault-avoiding 3-paths.

###### Lemma 2.4.

With high probability, there are total middle-heavy fault-avoiding -paths in the final graph .

###### Proof.

For each edge $(u,v)$ added to the emulator, let us split into two cases by the size of $\Psi(u,v)$. Notice that, since a middle-heavy fault-avoiding path completed by $(u,v)$ is uniquely determined by $(u,v)$ and its endpoints, the size of $\Psi(u,v)$ is the same as the number of middle-heavy fault-avoiding paths completed by $(u,v)$.

##### Case 1: $|\Psi(u,v)| \le d^2$.

In this case, the edge $(u,v)$ completes at most $d^2$ new middle-heavy fault-avoiding $3$-paths. Summing over all such edges, only $O(|E(H^{(sp)})| \cdot d^2)$ middle-heavy fault-avoiding $3$-paths can be completed by edges of this type.

##### Case 2: $|\Psi(u,v)| > d^2$.

Assuming the high-probability event from Lemma 2.3 holds, we also have

$$\sum_{(s,t) \in \Psi(u,v)} C(s,t) = \widetilde{O}(f d^2).$$

Thus, the average value of $C(s,t)$ over the node pairs $(s,t) \in \Psi(u,v)$ is $\widetilde{O}(f)$. So by Markov’s inequality, for at least half of the node pairs $(s,t) \in \Psi(u,v)$, we have $C(s,t) = \widetilde{O}(f)$.

This implies that only $\widetilde{O}(f n^2)$ middle-heavy fault-avoiding $3$-paths may be completed by edges from this case, by a straightforward amortization argument over all pairs $(s,t)$. Whenever a middle-heavy fault-avoiding $3$-path $\pi$ from $s$ to $t$ is completed by an edge in this second case, let us say that $\pi$ is dispersed if $C(s,t) = \widetilde{O}(f)$.

By the previous argument, at least half of all paths completed by edges in this case are dispersed, so it suffices to only count the dispersed paths. Moreover, by definition every dispersed path from $s$ to $t$ is among the first $\widetilde{O}(f)$ middle-heavy $3$-paths from $s$ to $t$; thus, taking a union over all $n^2$ choices of $(s,t)$, there are $\widetilde{O}(f n^2)$ dispersed paths in total.

Combining the two cases, we get at most $O(|E(H^{(sp)})| \cdot d^2) + \widetilde{O}(f n^2)$ middle-heavy fault-avoiding $3$-paths in $H$. ∎

We now show how to use the above bound on middle-heavy fault-avoiding -paths to bound the number of edges in our emulator. We first bound the number of emulator edges (edges which were added by the path sampling and so might not be in ) in terms of the number of spanner edges (edges from ), and then bound the number of spanner edges.

###### Lemma 2.5 (Emulator Edge to Path Counting).

With high probability, we have

$$\left|H^{(em)}\right| \le O\left(\left|E(H^{(sp)})\right|\right) + \widetilde{O}\left(\frac{f n^2}{d^2}\right).$$

###### Proof.

Let be the bound on the number of middle-heavy fault-avoiding 3-paths in which holds with high probability from Lemma 2.4. Consider the following two events.

• Let be the event that has at most middle-heavy fault-avoiding paths. We know from Lemma 2.4 that this holds with high probability.

• Whenever the algorithm considers adding some emulator edge, we call this an attempt. Let

be an indicator random variable for the event that the

attempt is successful (meaning that the emulator edge is actually added). If there is no attempt since the algorithm has terminated before attempts are made, then we set with probability and with probability . Note that for all . Moreover, note that and are independent for . Let , and let be the event that . So if occurs, then of the first attempts, at most emulator edges are added. By linearity of expectations we know that . Moreover, we know that the ’s are independent and that . Hence a standard Chernoff bound implies that occurs with high probability.

Since both and occur with high probability, a simple union bound implies that occurs with high probability. Note that every emulator edge is caused by some attempt, and that the number of attempts is precisely equal to the number of middle-heavy fault-avoiding 3-paths. Hence if both and occur, the number of emulator edges in is at most , as claimed. ∎

###### Lemma 2.6 (Spanner Edge to Path Counting).

Letting be the number of middle-heavy fault-avoiding -paths in , we have .

###### Proof.

Let be a sufficiently small absolute constant and let be the average degree in . If then we have edges in , and we are done. So assume in the following that . We will pass from to a subgraph , and then to another subgraph . The first of these moves is simple: let be a random induced subgraph of obtained by independently keeping each node with probability . For the second move, let us say that an edge in is clean if none of the nodes in its associated fault set survive in . We define as the subgraph of that contains only its clean edges.

Let be the number of middle-heavy -paths in that are simple (do not repeat nodes). Our proof strategy is to bound the expected value of from both below and above.

##### Lower Bound on E[m′′].

First, let us analyze the probability that a given edge in survives in . The probability that survives in is exactly (the event that each survive). Conditional on surviving in , it is clean iff none of the nodes in also survive. Since , is clean with constant probability. So survives in with probability , which implies

$$\mathbb{E}\left[|E(H'')|\right] = \left|E(H^{(sp)})\right| \cdot \Omega\left((c\delta)^{-2}\right) = \Omega\left(n \delta^{-1} c^{-2}\right).$$

Meanwhile, the expected number of nodes that survive in is exactly . Let us imagine that we start with an initially-empty graph on the vertex set , and we add the edges in one by one in order of increasing weight. For each added edge that is the first edge incident to one of its endpoints or , this edge does not create any new middle-heavy -paths. There are at most such edges. Any other edge creates at least one simple middle-heavy -path in . Specifically, the -path in which it is the middle edge must be middle-heavy by the order in which we are adding the edges, and it is simple since if then we are forced to include , but then must not survive in (since is clean). It follows that

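$$\begin{aligned} \mathbb{E}[m''] &\ge \mathbb{E}\left[|E(H'')| - |V(H'')|\right] \\ &= \mathbb{E}\left[|E(H'')|\right] - \mathbb{E}\left[|V(H'')|\right] \\ &= \Omega\left(n \delta^{-1} c^{-2}\right) - n \delta^{-1} c^{-1} \\ &= \Omega\left(n \delta^{-1} c^{-2}\right) \quad \text{by choice of small enough } c > 0. \end{aligned}$$

##### Upper Bound on E[m′′].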

We can relate and as follows. We notice that every simple middle-heavy -path in must correspond to a fault-avoiding -path in . This holds because if the middle edge of survives in , then it must be clean, implying that no nodes in survive in .

Now let be a middle-heavy fault-avoiding -path in . We notice that must be simple, since (as before) if then we would have to include and so would not be fault-avoiding. Since is simple it survives in with probability exactly , and thus it survives in with probability . We therefore have

##### Putting It Together.

By the previous two parts, we have , which implies that , and thus Since is defined as the average degree in , this proves the lemma. ∎

Our size bound now follows by directly combining our previous three lemmas; see Appendix A.

###### Lemma 2.7.

The emulator $H$ returned by Algorithm 1 has $|E(H)| \le \widetilde{O}(f^{1/3} n^{4/3} + fn)$ with high probability.

## 3 Vertex Fault-Tolerant (2k−1)-Emulators

Our goal in this section is to prove Theorem 1.1. We start by defining several properties of certain desired paths that let us generalize the algorithm.

### 3.1 SALAD Paths and Proof Overview

We begin by explaining, at a high level, the relationship between this argument for general $k$ and the one given previously for $k = 3$. The core of our previous proof was a counting argument over middle-heavy fault-avoiding $3$-paths in $H^{(sp)}$. The core of our general argument will be a counting argument over “SALAD” $k$-paths in $H^{(sp)}$. SALAD is an acronym for Simple, Alternating, Local, Avoids faults, Dispersed. We will explain these five properties and their role in the analysis momentarily, but first let us state our algorithm. This algorithm uses a notion of local paths, which we define immediately after the algorithm and which does not have an analog in our simpler $k = 3$ case. We say that a path in $H$ is completed by an edge $e$ if the path exists in $H \cup \{e\}$ and $e$ is the heaviest edge in the path (i.e., the path exists in $H$ once $e$ has been added).

Stretch analysis of this algorithm is essentially the same as Lemma 2.2; we include it here for completeness.

###### Lemma 3.1.

The emulator returned by Algorithm 2 is an $f$-VFT $(2k-1)$-emulator.

###### Proof.

Consider some and with . If then clearly . Otherwise, Algorithm 2 did not add to , and so by the “if” condition we know that as required. ∎
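The per-edge check used in this proof can be illustrated with a brute-force verifier. The following is a minimal sketch, not the paper's algorithm: it assumes the stretch condition is that, for every fault set $F$ with $|F| \le f$ and every input edge $(u,v)$ with both endpoints outside $F$, the distance from $u$ to $v$ in $H \setminus F$ is at most `stretch` times $w(u,v)$. The function names and the edge-list representation are hypothetical, and the enumeration over fault sets is exponential in $f$, so it is only suitable for tiny examples.

```python
import heapq
from itertools import combinations


def build_adj(edges):
    """Adjacency lists for an undirected weighted graph given as (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, source, banned):
    """Shortest-path distances from `source`, never entering nodes in `banned`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry
        for y, w in adj.get(x, []):
            if y in banned:
                continue
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist


def satisfies_vft_stretch(nodes, g_edges, h_edges, f, stretch):
    """Check the assumed per-edge stretch condition for every fault set of size <= f."""
    h_adj = build_adj(h_edges)
    for size in range(f + 1):
        for fault_set in combinations(nodes, size):
            banned = set(fault_set)
            for u, v, w in g_edges:
                if u in banned or v in banned:
                    continue  # this input edge does not survive the faults
                dist = dijkstra(h_adj, u, banned)
                if dist.get(v, float("inf")) > stretch * w:
                    return False
    return True
```

For instance, on a unit-weight triangle, dropping one edge still gives stretch 3 with no faults, but a single fault at the middle node of the remaining 2-path disconnects the other two endpoints.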

• Simple: does not repeat nodes. We implicitly required simplicity in our previous proof, since (as used in Lemma 2.6) a non-simple middle-heavy path of the form is not fault-avoiding. In our extension, it is more convenient to make the simplicity requirement explicit.

• Alternating: is alternating if every even-numbered edge in is heavier than the two adjacent odd-numbered edges. That is: if has edge sequence , then for all , we have and . If is even, then only needs to be heavier than .

“Alternating” turns out to be the natural extension of “middle-heavy” to paths of length (notice for , alternating and middle-heavy are the same). Roughly, our analysis will involve “splitting” paths over their heaviest edge and recursively analyzing the subpath on either side. Like for , this splitting process is most efficient when the heaviest edge is somewhere in the middle of the path (neither the first nor last one). An alternating path is one where the heaviest edge remains somewhere in the middle at every step of the recursion, until finally the path decomposes into individual edges. In fact, this is not quite true in the case where is even (due to the last edge), which is precisely why our bounds are a little worse for even .

• Local: this is a new property that does not have an analog in our previous argument. Let be a parameter (it will be more convenient to specify the implicit constant later in the analysis). For each node , we put the edges incident to in into buckets : the first edges incident to are in its first bucket , the next edges are in the second bucket , etc. We define to be local if, for any three-node contiguous subpath , the edges belong to the same bucket for .

Locality is necessary because we sample SALAD paths of all lengths . Our proof strategy from works just fine to limit the emulator edges contributed by SALAD paths of length . But it does not help us limit the emulator edges contributed by SALAD paths of shorter length . By including locality explicitly, we gain an easy way to limit this quantity, at the price of a little more complexity in some of the downstream proofs.

• Avoids Faults: This is a slightly more stringent property than “fault-avoiding” used previously. Whenever we add a spanner edge , let be a choice of fault set that forces , just like in the proof. We say that avoids faults if, for every edge (not just the heaviest one), we have .

• Dispersed: this property showed up briefly in the case, but we were able to bury it in the technical details of Lemma 2.4. Here, we need to bring it more to the forefront of the analysis. We will say that is a SALA path if it satisfies the first four properties described previously. Among the SALA paths, we will declare them either concentrated or dispersed as follows, and we will only use the dispersed ones in our analysis:

• Notice that we can split into two (possibly empty) shorter SALA paths by removing its heaviest edge . If either of is concentrated, then is concentrated as well. If are both dispersed, then we will say that is splittable, and it may still be concentrated or dispersed according to the following point:

• Set a threshold parameter . For all , among the splittable -paths between each pair of endpoints , the first completed paths are dispersed and the rest are concentrated. If two splittable paths are completed by the same edge, and thus arise in at the same time, then we pick an arbitrary order so that the “first” paths are unambiguous.
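To make the "Alternating" property and the splitting recursion concrete, here is a small sketch. It assumes a path is represented only by the list of its edge weights in order, and that weights are distinct (any tie-breaking rule is not modeled). The function names are hypothetical. The second function recursively splits a path at its heaviest edge and reports whether that edge is always strictly interior, which is the behavior described above for alternating paths of odd length.

```python
def is_alternating(weights):
    """True if every even-numbered edge (1-based) is heavier than its odd-numbered neighbors.

    weights[i] is the weight of edge number i + 1. When the number of edges
    is even, the last edge has only one neighbor to compare against.
    """
    j = len(weights)
    for i in range(1, j, 2):  # 0-based index i is edge number i + 1, which is even
        if weights[i] <= weights[i - 1]:
            return False
        if i + 1 < j and weights[i] <= weights[i + 1]:
            return False
    return True


def heaviest_edge_stays_interior(weights):
    """Split recursively at the heaviest edge; True if it is never the first or last edge.

    Paths with at most one edge are the base case of the recursion.
    Assumes distinct weights, so the heaviest edge is unique.
    """
    if len(weights) <= 1:
        return True
    idx = weights.index(max(weights))
    if idx == 0 or idx == len(weights) - 1:
        return False
    left, right = weights[:idx], weights[idx + 1:]
    return heaviest_edge_stays_interior(left) and heaviest_edge_stays_interior(right)
```

With weights `[1, 5, 2, 4, 3]` the path is alternating and every split happens at an interior edge. With `[1, 5, 2, 4]` the path is still alternating, but after removing the edge of weight 5 the remaining subpath `[2, 4]` has its heaviest edge last, which illustrates the loss for even lengths.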

The inclusion of locality among our properties actually significantly changes the structure of the proof. Because we consider a more restricted kind of path, it gets much easier to control the number of emulator edges (this is the whole point of locality):

###### Lemma 3.2.

With high probability, .

###### Proof.

One generates a local -path by picking an oriented edge to be the starting edge, and then repeatedly extending the path by choosing edge among the at most possible edges satisfying the locality constraint. Hence there are at most local paths in .

Each local -path completed by a spanner edge is independently sampled as an emulator edge with probability . Thus, the expected number of emulator edges contributed by local -paths is

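$$O\left(\left|E\left(H^{(sp)}\right)\right| \cdot \left(\frac{b}{d}\right)^{j-1}\right) \le \left|E\left(H^{(sp)}\right)\right| \cdot O(k)^{k-1}.$$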

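Since the edges are sampled independently, by a standard Chernoff bound the number of such emulator edges is, with high probability, at most

$$\tilde{O}\left(\left|E\left(H^{(sp)}\right)\right|\right) \cdot O(k)^{k-1}.$$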

The lemma now follows by a union bound over all choices of . ∎
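The counting step at the start of this proof can be checked numerically on small graphs. The sketch below is an illustration under stated assumptions rather than the paper's procedure: edges are unweighted pairs listed in the order they were added, each node's incident edges are grouped into consecutive buckets of size `b` in that order, and we count oriented walks with `j` edges in which every two consecutive edges share a bucket at their common node. Walks may repeat nodes, so this count upper-bounds the number of simple local paths. All names are hypothetical, and the comparison value $2|E| \cdot b^{j-1}$ is our reading of the bound, since each of the $2|E|$ oriented starting edges admits at most `b` extensions per step.

```python
from collections import defaultdict


def bucket_maps(edges, b):
    """Return (incident, bucket), where bucket[v][u] is the bucket index of edge {u, v} at v."""
    incident = defaultdict(list)
    for u, v in edges:  # arrival order defines the buckets
        incident[u].append(v)
        incident[v].append(u)
    bucket = {
        v: {u: i // b for i, u in enumerate(nbrs)}
        for v, nbrs in incident.items()
    }
    return incident, bucket


def count_local_walks(edges, b, j):
    """Count oriented j-edge walks whose consecutive edges lie in the same bucket."""
    incident, bucket = bucket_maps(edges, b)
    # frontier[(x, y)] = number of local walks whose last oriented edge is x -> y
    frontier = defaultdict(int)
    for u, v in edges:
        frontier[(u, v)] += 1
        frontier[(v, u)] += 1
    for _ in range(j - 1):
        nxt = defaultdict(int)
        for (x, y), ways in frontier.items():
            for z in incident[y]:
                if bucket[y][z] == bucket[y][x]:  # locality constraint at y
                    nxt[(y, z)] += ways
        frontier = nxt
    return sum(frontier.values())
```

The sketch assumes a simple graph (no parallel edges), since buckets are keyed by the neighbor's identity.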

On the other hand, it gets much harder to bound the number of spanner edges in . We use the following main technical lemma:

###### Lemma 3.3 (Counting Lemma).

Let be a large enough absolute constant, suppose has average degree , and also suppose . Then with high probability, has at least SALAD -paths.

Before proving this lemma, we can do some simple algebra to show why it implies a bound on spanner edges:

###### Lemma 3.4.

If we set parameter

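$$d := \begin{cases} \max\left\{\operatorname{polylog} n \cdot f^{\frac{1}{2}-\frac{1}{2k}}\, n^{1/k},\; cf\right\} & \text{if } k \text{ odd} \\ \max\left\{\operatorname{polylog} n \cdot f^{\frac{1}{2}}\, n^{1/k},\; cf\right\} & \text{if } k \text{ even,} \end{cases}$$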

with large enough polylogs, then with high probability, we have

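$$\left|E\left(H^{(sp)}\right)\right| \le \begin{cases} \tilde{O}_k\left(f^{\frac{1}{2}-\frac{1}{2k}}\, n^{1+1/k} + fn\right) & \text{if } k \text{ odd} \\ \tilde{O}_k\left(f^{\frac{1}{2}}\, n^{1+1/k} + fn\right) & \text{if } k \text{ even.} \end{cases}$$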
###### Proof, assuming Lemma 3.3.

By definition of dispersion, for each node pair , we can have only total SALAD -paths, so there are SALAD -paths in total. Therefore the number of these paths is at most when is odd. By the definition of , and by a choice of large enough polylog, this means that has strictly fewer than SALAD -paths.

If is even, there are at most . Similarly, by choosing a large enough polylogarithmic factor in the definition of for the even case, we also have that the number of SALAD -paths is strictly less than .

In both cases, by applying the counting lemma in contrapositive, we conclude that the average degree in is . Thus has edges, and by plugging in the claim follows. ∎

And now it is trivial to prove Theorem 1.1.

###### Proof of Theorem 1.1.

The combination of Lemma 3.2 (which relates the number of emulator edges added to $|E(H^{(sp)})|$) and Lemma 3.4 (which bounds the number of edges of $H^{(sp)}$) implies the theorem. ∎

So it just remains to prove our counting lemma, which is the main technical part of the proof.

### 3.2 Counting Lemma

Towards proving our counting lemma, our first task is to extend Lemma 2.3 from the case. We will define slightly more expressive variables: let count the number of local -paths at a given moment in the algorithm. We also define sets

$$\Psi_j(u,v) := \left\{(s,t) \in V \times V \;\middle|\; (u,v) \text{ completes a SALA splittable } j\text{-path from } s \text{ to } t\right\}.$$

The following lemma controls the values that can reach:

###### Lemma 3.5.

With high probability, for all spanner edges added to and all , just before is added we have

$$\sum_{(s,t) \in \Psi_j(u,v)} C_j(s,t) = \tilde{O}\left(f d^{j-1}\right).$$
###### Proof.

The proof is similar to Lemma 2.3. Intuitively, if is large enough, then with high probability there will already be an emulator edge for some , which would mean that we would not have actually added to . To formalize this, though, we need to analyze even edges that were not added to and take a union bound over all possible fault sets, as in Lemma 2.3.

So we begin as in Lemma 2.3. Let be an edge in the input graph, and let with . Consider the moment in the algorithm where we inspect and decide whether or not to add it to the emulator (note: are arbitrary; we may or may not actually add , and if we do, we do not necessarily have ). We use the following extensions of our previous definitions:

• For a path in that would be completed, if we added to the emulator, we say that is -avoiding if .

• is the set of node pairs such that, if we added to the emulator, it would complete at least one new -avoiding SALA -path from to .

• We say that is mass-avoiding for and if

$$\sum_{(s,t) \in \Psi(u,v,F)} C_j(s,t) > c f d^{j-1} \log n,$$

where $c$ is some large enough absolute constant.

Note that the lemma statement is equivalent to the claim that, if is added to , then is not mass-avoiding. We have set things up for general because our proof strategy is to take a union bound over all possible choices of , which will thus include .

We say that a mass-avoiding is good for if (immediately prior to being considered by the algorithm) there is some such that is already an emulator edge in . Otherwise, we say that is bad for .

We now prove that with high probability, every mass-avoiding is good for . To see this, consider some mass-avoiding . Every local -path which contributes to