1 Introduction
Spanning trees are critical components of graph algorithms, from depth-first search (DFS) trees for finding articulation points and bridges [45], computing st-numberings [13], chain decompositions [42], and colorings of signed graphs [18], to breadth-first search (BFS) trees for finding separators [34], computing sparse certificates of node-connectivity [8, 12], approximating diameters [10, 41], and characterizing AT-free graphs [5], and to maximum-leaf spanning trees (MLST) for connected dominating sets [36, 43] and connected maximum cuts [26, 21].
In the semi-streaming model, the tractability of spanning tree computation, apart from arbitrary spanning trees [3, 44, 40], is less studied. The semi-streaming model [38, 3] is a variation of the streaming model frequently used for the computation of graph problems. It allows the edges of an $n$-node input graph to be read sequentially in multiple passes using $\tilde{O}(n)$ space. (We write $\tilde{O}(f)$ to denote $O(f \cdot \mathrm{polylog}\, n)$, where $n$ is the number of nodes in the input graph. Similarly, $\tilde{\Omega}(f)$ denotes $\Omega(f / \mathrm{polylog}\, n)$.) If the list of edges includes deletions, then the model is called the turnstile model; otherwise it is called the insertion-only model. In both models, some graph problems, such as spanning trees [3], connectivity [25], densest subgraph [37], degeneracy [15], cut sparsifiers [30], and coloring [4], can be solved exactly or approximately in a single pass, while other graph problems, such as triangle detection and unweighted all-pairs shortest paths [7], are known to require many passes to compute. For many fundamental graph problems, e.g., standard spanning trees, the tractability in these models is open. BFS computation is known to require more than a constant number of passes [17], but only the naive algorithm, which uses one pass per BFS layer, is known. It is unknown whether DFS computation requires more than one pass [14, 31], but the current best algorithm needs $O(h/k)$ passes using $O(nk)$ space [31], where $h$ is the height of the computed DFS tree, which can be $\Theta(n)$ for dense graphs. The tractability of maximum-leaf spanning trees (MLST) is unknown even with generous space, since MLST is APX-complete [35, 20].
Due to the lack of efficient streaming algorithms for spanning tree computation, for some graph problems that are traditionally solved using spanning trees, such as finding articulation points and bridges, researchers have had to look for alternative methods when designing streaming algorithms for these problems [16, 14]. The alternative methods, even if they are based on known results in graph theory, may still involve the design of new streaming algorithms. For the problems mentioned above, the alternative methods use newly designed sparse connectivity certificates [12, 25] that are easily computable in the semi-streaming model, rather than the classical certificate due to Nagamochi and Ibaraki [39]. Hence, establishing the hardness of spanning tree computation helps to explain the need for these alternative methods.
In this paper, we study the tractability of computing standard spanning trees for connected simple undirected graphs, including BFS trees, DFS trees, and MLST. Unless otherwise stated, our upper bounds work in the turnstile model (and hence also in the insertiononly model), and our lower bounds hold for the insertiononly model (and hence also in the turnstile model). The space upper and lower bounds are in bits. Our results are as follows.
Maximum-Leaf Spanning Trees:
We show, by constructing an MLST sparsifier (Section 2), that for every constant $\epsilon > 0$, MLST can be approximated in a single pass to within a factor of $1+\epsilon$ w.h.p. (W.h.p. means with probability $1 - 1/\mathrm{poly}(n)$.) This holds albeit in superpolynomial time, since MLST is APX-complete [35, 20] with a known inapproximability constant [9]; MLST can also be approximated in polynomial time in a single pass to within a factor of $c + \epsilon$ w.h.p., where $c$ is the supremum of the constants to within which MLST cannot be approximated using polynomial time and space. In the insertion-only model, these algorithms are deterministic. We also show a complementary hardness result (Section 5): to approximate MLST to within a small additive error, any single-pass randomized streaming algorithm that succeeds with constant probability requires a nearly quadratic number of bits. This hardness result excludes the possibility of a single-pass semi-streaming algorithm that approximates MLST to within such an additive error. Our results for MLST show that intractability in the sequential computation model (i.e., the Turing machine model) does not imply intractability in the semi-streaming model.
Our algorithms rely on a new sparse certificate, the MLST sparsifier, defined as follows. Let $G$ be an $n$-node $m$-edge connected simple undirected graph. Then, for any given constant $\epsilon > 0$, a subgraph $H$ is an MLST sparsifier of $G$ if it is a connected spanning subgraph of $G$ with $|E(H)| \le f(\epsilon) \cdot n$ and $\mathrm{ml}(H) \ge (1-\epsilon)\, \mathrm{ml}(G)$, where $\mathrm{ml}(G)$ denotes the maximum number of leaves (i.e., nodes of degree one) that any spanning tree of $G$ can have and $f$ is some function independent of $n$ and $m$. We show that an MLST sparsifier can be constructed efficiently in the semi-streaming model.
In the turnstile model, for every constant $\epsilon > 0$, there exists a randomized algorithm that can find an MLST sparsifier with high probability using a single pass, $\tilde{O}(n)$ space, and polynomial time, and in the insertion-only model a deterministic algorithm that uses a single pass, $O_\epsilon(n)$ space, and polynomial time.
Combining Section 1 with any polynomial-time RAM algorithm for MLST that uses $\tilde{O}(n)$ space, e.g., [35, 36, 43], we obtain the following result.
In the turnstile model, for every constant $\epsilon > 0$, there exists a randomized algorithm that can approximate $\mathrm{ml}(G)$ for any $n$-node connected simple undirected graph $G$ with high probability to within a factor of $c + \epsilon$ using a single pass, $\tilde{O}(n)$ space, and polynomial time, where $c$ is the supremum of the constants to within which MLST cannot be approximated using polynomial time and space, and in the insertion-only model a deterministic algorithm that uses a single pass, $O_\epsilon(n)$ space, and polynomial time.
BFS Trees:
It is known that BFS trees require more than a constant number of passes to compute [17], but the naive approach needs one pass per BFS layer, i.e., up to $\Theta(n)$ passes. We devise a randomized algorithm that reduces the pass complexity w.h.p., and give a smooth tradeoff between pass complexity and space usage.
In the turnstile model, for each choice of a tradeoff parameter, there exists a randomized algorithm that can compute a BFS tree for any $n$-node connected simple undirected graph with high probability in a number of passes sublinear in $n$ using the corresponding space, and in the insertion-only model a deterministic algorithm with the same pass bound using less space.
This gives a polynomial separation between single-source and all-pairs shortest paths for unweighted graphs, because any randomized semi-streaming algorithm that computes unweighted all-pairs shortest paths with constant probability requires polynomially many passes.
We extend Section 1 and obtain that multiple BFS trees, each starting from a distinct source node, can be computed more efficiently in pass complexity in a batch than individually (see Section 3.3). We show that this batched BFS has applications to computing a nearly-$3/2$ approximation of the diameter of unweighted graphs (Section 3.4) and a $2$-approximation of Steiner trees for unweighted graphs (Section 3.3).
DFS Trees:
It is unknown whether DFS trees require more than one pass to compute [14, 31], but the current best algorithm, due to Khan and Mehta [31], needs $O(h/k)$ passes using $O(nk)$ space, where $h$ is the height of the computed DFS tree. We devise a randomized algorithm that improves the pass complexity w.h.p., and give a smooth tradeoff between pass complexity and space usage.
In the turnstile model, for each choice of a tradeoff parameter, there exists a randomized algorithm that can compute a DFS tree for any $n$-node connected simple undirected graph with high probability in the corresponding number of passes and space, and in the insertion-only model a deterministic algorithm that uses less space.
For dense graphs, our algorithms improve upon the current best algorithms for DFS due to Khan and Mehta [31], which need many passes for $n$-node $m$-edge graphs in the worst case because of the existence of $d$-cores, where a $d$-core is a maximal connected subgraph in which every node has at least $d$ neighboring nodes in the subgraph. Since every $m$-edge graph contains a $d$-core with $d \ge m/n$, the height $h$ of any DFS tree can be forced to be $\Omega(m/n)$.
1.1 Technical Overview
Maximum-Leaf Spanning Trees:
We construct an MLST sparsifier via a new result that complements Kleitman and West's lower bounds on the maximum number of leaves for graphs with bounded minimum degree [32]. The lower bounds are: if a connected simple undirected graph has minimum degree $k$ for some sufficiently large $k$, then $\mathrm{ml}(G) \ge (1 - O(\ln k / k))\, n$, and the leading constant can be larger for small $k$. Our complementary result (Section 2), without the restriction on the minimum degree, is: any connected simple undirected graph $G$, except the singleton graph, has
$$\mathrm{ml}(G) \;\ge\; \frac{n - \mathrm{ig}(G)}{c} \quad \text{for an absolute constant } c \ge 1, \tag{1}$$
where $\mathrm{ig}(G)$ denotes the number of nodes whose degree is two and whose neighbors both have degree two. Equation 1 implies that, if one can find a sparse connected spanning subgraph $H$ of $G$ with $\mathrm{ig}(H)$ small relative to $n$, then $\mathrm{ml}(H)$ is large, which is the key to obtaining an MLST sparsifier.
Our sparsification technique is general enough to obtain approximations for MLST in a single pass using semi-streaming space by combining any approximation RAM algorithm for MLST that uses $\tilde{O}(n)$ space with our MLST sparsifier. On the other hand, since in linear time one can find an MLST sparsifier with $O_\epsilon(n)$ edges, the running time of any approximation RAM algorithm for MLST can be reduced from a function of the number of edges $m$ to the same function of $O_\epsilon(n)$, if a small sacrifice in the approximation ratio is allowed. This reduces the time complexity of RAM algorithms for MLST that need time superlinear in the number of edges, such as the local search approach and the leafy forest approach, both due to Lu and Ravi [35, 36].
BFS Trees:
We present a simple deterministic algorithm attaining a smooth tradeoff between pass complexity and space usage. In particular, in the insertion-only model, the algorithm computes a BFS tree in $O(n/k)$ passes using $O(nk)$ space for any parameter $k$. The algorithm is based on the observation that the sum of the degrees of the nodes on any root-to-leaf path of a BFS tree is bounded by $3n$ (Section 3.1).
Our more efficient randomized algorithm (Section 1) constructs a BFS tree by combining the results of multiple instances of bounded-radius BFS. To reduce the space usage, the simulations of these bounded-radius BFS instances are assigned random starting times, and the algorithm maintains only the last three layers of each BFS tree. These ideas are borrowed from results on shortest path computation in the parallel and distributed settings [11, 22, 27, 46].
DFS Trees:
We present a simple alternative proof of the result of Khan and Mehta [31] that a DFS tree can be constructed in $O(h/k)$ passes using $O(nk)$ space, for any given parameter $k$, where $h$ is the height of the computed DFS tree. The new proof is based on the following connection between DFS computation and sparse certificates for node-connectivity. We show in Lemma 4.1 that the top layers of any DFS tree of such a certificate can be extended to a DFS tree of the original graph $G$.
The proof of Theorem 1 is based on the parallel DFS algorithm of Aggarwal and Anderson [2]. In this paper, we provide an efficient implementation of their algorithm in the streaming model, also via sparse certificates for node-connectivity, which allows us to reduce the number of passes by batch processing.
We note that, in related work, Ghaffari and Parter [23] showed that the parallel DFS algorithm of Aggarwal and Anderson can be adapted to the distributed setting. Specifically, they showed that DFS can be computed in the CONGEST model in a number of rounds depending on $D$, where $D$ is the diameter of the graph.
1.2 Paper Organization
In Section 2, we present how to construct an MLST sparsifier and apply it to devise single-pass semi-streaming algorithms to approximate MLST to within a factor of $1 + \epsilon$ for every constant $\epsilon > 0$. Then, in Section 3, we show how to compute a BFS tree rooted at a given node by a multi-pass semi-streaming algorithm w.h.p., together with applications to computing approximate diameters and approximate Steiner trees. In Section 4, we obtain a similar result for computing DFS trees, that is, a multi-pass semi-streaming algorithm that succeeds w.h.p. Lastly, we prove the claimed single-pass lower bound in Section 5.
2 Maximum-Leaf Spanning Trees
In this section, we show how to construct an MLST sparsifier in the semi-streaming model, thereby proving Section 1. We recall the notions defined in Section 1 before proceeding to the results. By an ignorable node, we denote a node $v$ whose degree is two and whose two neighbors $u$ and $w$ have degree two as well. Note that $u \ne w$ for simple graphs. Let $\mathrm{ml}(G)$ be the maximum number of leaves (i.e., nodes of degree one) that a spanning tree of $G$ can have. Let $\mathrm{ig}(G)$ denote the number of ignorable nodes in $G$. Let $\deg_G(v)$ denote the degree of node $v$ in graph $G$. Let $H_k$ denote any subgraph of $G$ such that $H_k$ contains all nodes in $G$ and every node $v$ in $H_k$ has degree $\min\{\deg_G(v), k\}$. Let $T$ be any spanning tree of a connected graph $G$.
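As a concrete reference for these definitions, here is a small sketch (the function names are ours, not the paper's) that counts the ignorable nodes of a graph given as an adjacency list:

```python
def degrees(adj):
    """Degree of each node in an adjacency-list graph."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def ignorable_nodes(adj):
    """Nodes of degree two whose two neighbors also have degree two."""
    deg = degrees(adj)
    return [v for v, nbrs in adj.items()
            if deg[v] == 2 and all(deg[u] == 2 for u in nbrs)]

# A 6-cycle: every node has degree 2, so every node is ignorable.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(len(ignorable_nodes(cycle)))  # 6
```

On a star graph, by contrast, no node is ignorable, since leaves have degree one and the center has high degree.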
We begin with a result that complements Kleitman and West's lower bounds on the number of leaves for graphs with minimum degree $k$ for any $k \ge 3$. Our lower bound does not rely on the degree constraint. The constant in Lemma 2 may be improved, but the subsequent lemmata and theorems only require it to be an absolute constant.
Every connected simple undirected graph $G$, except the singleton graph, has $\mathrm{ml}(G) \ge (n - \mathrm{ig}(G))/c$ for an absolute constant $c \ge 1$.
Proof.
Our proof is a generalization of the dead leaf argument due to Kleitman and West [32]. Let $T$ be a tree initially consisting of an arbitrary node $r$ as the root together with its neighbors $N(r)$ as leaves, where $N(r)$ denotes the neighbors of $r$; then grow $T$ iteratively by a node expansion order, defined below. By expanding at a node $x$, we mean to select a leaf node $x$ of $T$ and add all of $x$'s neighbors outside $T$, say $y_1, y_2, \ldots$, and their connecting edges $(x, y_1), (x, y_2), \ldots$, to $T$. In this way, every node outside $T$ cannot be a neighbor of any non-leaf node in $T$. We say a leaf node in $T$ is dead if it has no neighbor outside $T$. Let $\Delta x$ denote the number of non-ignorable nodes that join $T$ while the $i$-th operation is applied. Let $\Delta \ell$ denote the change in the number of leaf nodes in $T$ while the $i$-th operation is applied. Let $\Delta d$ denote the change in the number of dead leaf nodes in $T$ while the $i$-th operation is applied. The subscript $i$ may be dropped when the context is clear. We need to ensure that the number of non-ignorable nodes added stays within a constant factor of the number of dead leaves created, for each of the following operations and the initial operation.
 Operation 1:

If $T$ has a leaf node $x$ that has at least two neighbors outside $T$, then expand at $x$. In this case, the quantities $\Delta x$, $\Delta \ell$, and $\Delta d$ satisfy the required invariant.
 Operation 2:

If every leaf node in $T$ has at most one neighbor outside $T$ and some node $y$ outside $T$ has at least two neighbors in $T$, then expand at one of $y$'s neighbors in $T$. In this case, the quantities $\Delta x$, $\Delta \ell$, and $\Delta d$ satisfy the required invariant.
 Operation 3:

This operation is used only when the previous two operations do not apply. Let $u_0$ be some leaf in $T$ that has exactly one neighbor $u_1$ not in $T$. For each $i \ge 1$, if $u_i$ is defined, all neighbors of $u_i$ other than $u_{i-1}$ are outside $T$, and $u_i$ has degree two in $G$, then define $u_{i+1}$ to be the neighbor of $u_i$ other than $u_{i-1}$. Suppose that $u_i$ for $i \le t$ are defined and $u_{t+1}$ is not defined; then we expand at $u_0, u_1, \ldots, u_{t-1}$, in order. Though $t$ can be arbitrarily large, $\Delta x$ remains bounded. If $u_{t+1}$ is not defined and $u_t$ has neighbors other than $u_{t-1}$ outside $T$ (thus $u_t$ does not have degree two in this case, as otherwise Operation 2 applies), then we discuss subcases:
 Subcase 1 ():

It is impossible for this subcase to occur.
 Subcase 2 ():

Then and .
 Subcase 3 ():

Then and .
If $u_{t+1}$ is not defined and $u_t$ has no neighbor other than $u_{t-1}$ outside $T$, then there are two possibilities, and in both the quantities $\Delta x$, $\Delta \ell$, and $\Delta d$ satisfy the required invariant.
It is clear that one can expand $T$ to a spanning tree of $G$ by a sequence of the above operations. Because all leaves are eventually dead, the total number of dead leaves equals the number of leaves of the final spanning tree. Summing the invariant over all operations, the number of non-ignorable nodes, $n - \mathrm{ig}(G)$, is at most $c$ times the number of leaves. Consequently, $\mathrm{ml}(G) \ge (n - \mathrm{ig}(G))/c$, as desired. ∎
Given Section 2, our goal is, for every constant $\epsilon > 0$, to find a sparse subgraph $H$ of the input graph $G$ so that:

The leaves of an optimal MLST can be dominated by a small set $D$ of nodes, i.e., each such node is either in $D$ or has at least one neighbor in $D$ using the edges in $H$, where the optimal MLST is any spanning tree of $G$ with $\mathrm{ml}(G)$ leaves.

$H$ is connected.
Because of the existence of the small dominating set $D$, one can obtain a forest $F$ by adding some edges in $H$ so that the number of leaves in $F$ is smaller than $\mathrm{ml}(G)$ by at most $O(|D|)$ and the number of connected components in $F$ is at most $|D|$. Since $H$ is connected, one can further obtain a spanning tree from $F$ by adding at most $|D| - 1$ edges in $H$, so the number of leaves in this spanning tree is smaller than $\mathrm{ml}(G)$ by at most $O(|D|)$. Picking a $D$ associated with a sufficiently small sampling rate, by Equation 1, $H$ is an MLST sparsifier. A formal proof is given below.
For every integer $k$, every connected simple undirected graph $G$ has a connected spanning subgraph $H \subseteq G$ with $O(kn)$ edges whose maximum leaf count satisfies $\mathrm{ml}(H) \ge (1 - \epsilon(k))\, \mathrm{ml}(G)$, where $\epsilon(k) \to 0$ as $k \to \infty$.
Proof.
Let $T^*$ be a spanning tree of $G$ that has $\mathrm{ml}(G)$ leaves. Let $k$ be some fixed integer and let $H = H_k \cup T$. Note that every node $v$ with $\deg_G(v) \ge k$ has $\deg_H(v) \ge k$, so such a node $v$ and all of its neighbors are not ignorable nodes in $H$.
First, we show that the relevant nodes can be dominated by a small set $D$ using some edges in $H$. We obtain $D$ from two parts, $D_1$ and $D_2$. $D_1$ is a random node subset sampled from the non-ignorable nodes in $H$, in which each node is included in $D_1$ with probability $p$ independently, for some $p$ to be determined later. Thus, $\mathbb{E}[|D_1|] \le pn$. Since every high-degree node is adjacent only to non-ignorable nodes in $H$ and has at least $k$ of them, the probability that such a node is not dominated by any node in $D_1$ is at most $(1-p)^k$.
Let $D_2$ be the set of nodes that are not dominated by any node in $D_1$ using the edges in $H$. Thus, $\mathbb{E}[|D_2|] \le n(1-p)^k$.
Then, we obtain a forest from by adding some edges in as follows. Initially, .
 Operation 1:

For each , if is an isolated node in and , then add an edge from to some node in to . Such an edge must exist because dominates .
 Operation 2:

For each , if is not an isolated node in and the connected component that contains has an empty intersection with , then add an edge from to some node in to . Again, such an edge must exist because dominates .
For each leaf $v$ of $T^*$, if $v$ is a leaf in $F$, then it remains a leaf in the final tree unless it is used to connect components; otherwise, if $v$ is not a leaf in $F$, then $v$ must be an isolated node in $F$, and by Operation 1 it is connected to some node in $D$ unless it lies in $D$ itself. Hence, except those in $D$, every leaf of $T^*$ yields a leaf node in $F$, so the number of leaves in $F$ is smaller than that in $T^*$ by at most $O(|D|)$. By Operation 2, the number of connected components is at most $|D|$.
Lastly, since is connected, one can obtain a spanning tree from by connecting the components in by some edges in . Thus, the number of leaves in is no less than that in by . To obtain an MLST sparsifier, by Section 2 we need:
Setting gives the desired bound, and the leading constant is positive for . ∎
To find such a subgraph $H$, fetching a spanning tree $T$ of the input graph and grabbing up to $k$ incident edges for each node suffices. Thus, we get a single-pass $O(kn)$-space algorithm for the insertion-only model. As for the turnstile model, we use $\ell_0$-samplers [29] for each node $v$ to fetch at least $\min\{\deg_G(v), k\}$ neighbors of $v$ w.h.p., and fetch a spanning tree by appealing to the single-pass $\tilde{O}(n)$-space algorithm for spanning trees in dynamic streams [3]. This gives a proof of Section 1.
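The insertion-only construction admits a short RAM-model sketch (a simulation of the single pass, with hypothetical names; a union-find structure maintains the spanning tree and each node additionally keeps up to $k$ incident edges):

```python
def sparsify_stream(n, edges, k):
    """One pass over the edge stream: keep a spanning forest (via
    union-find) plus up to k incident edges per node; the union of the
    two edge sets is the candidate sparsifier H."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, kept = [], [[] for _ in range(n)]
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge joins two components
            parent[ru] = rv
            tree.append((u, v))
        if len(kept[u]) < k:
            kept[u].append((u, v))
        if len(kept[v]) < k:
            kept[v].append((u, v))
    extra = {e for lst in kept for e in lst}
    return set(tree) | extra         # at most (n - 1) + kn edges
```

The returned subgraph is connected whenever the input is, and its size bound follows from the per-node cap of $k$ stored edges.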
Applications.
In [21], Gandhi et al. show a connection between maximum-leaf spanning trees and connected maximum cut. Their results imply that, for any unweighted regular graph $G$, a connected maximum cut can be approximated by the following two steps:
 Step 1:

Find a spanning tree $T$ whose number of leaves is at least $c \cdot \mathrm{ml}(G)$ for some constant $c$.
 Step 2:

Randomly partition the leaves in $T$ into two parts $L_1$ and $L_2$ so that each leaf is included in $L_1$ with probability $1/2$ independently.
Then, outputting the better of the two cuts $(L_1, V \setminus L_1)$ and $(L_2, V \setminus L_2)$ yields a constant-factor approximation for connected maximum cut. Step 1 is the bottleneck and can be implemented by combining our MLST sparsifier (Section 1) with the 2-approximation algorithm for MLST due to Solis-Oba, Bonsma, and Lowski [43]. This gives Section 2.
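Step 2 above is simple enough to sketch directly. The following illustrative code (the spanning tree would come from any MLST approximation; only the leaf split is shown) relies on the fact that removing a subset of leaves from a spanning tree leaves the remainder connected:

```python
import random

def leaf_cut(tree_adj, seed=0):
    """Randomly split the leaves of a spanning tree into L1 and L2.
    Removing any subset of leaves keeps the rest of the tree connected,
    so (L1, V \\ L1) is a connected cut, and likewise for L2."""
    rng = random.Random(seed)
    leaves = [v for v, nbrs in tree_adj.items() if len(nbrs) == 1]
    L1 = {v for v in leaves if rng.random() < 0.5}
    L2 = set(leaves) - L1
    return L1, L2
```

The approximation guarantee comes from the analysis in [21] and is not reproduced here; the sketch only shows the randomized partition step.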
In the turnstile model, for every constant $\epsilon > 0$, there exists a randomized algorithm that can approximate the connected maximum cut for $n$-node unweighted regular graphs to within a constant factor with high probability in a single pass using $\tilde{O}(n)$ space.
3 Breadth-First Search Trees
A BFS tree of an $n$-node connected simple undirected graph can be constructed in $O(n)$ passes using $\tilde{O}(n)$ space by simulating the standard BFS algorithm layer by layer. By storing the entire graph, a BFS tree can be computed in a single pass using $O(n^2)$ space. In Section 3.1, we show that it is possible to have a smooth tradeoff between pass complexity and space usage. In Section 3.2, we prove Section 1, which shows that the above tradeoff can be improved when randomness is allowed, even in the turnstile model. Then, in Section 3.3, we show that multiple BFS trees, each starting from a distinct source node, can be computed more efficiently in a batch than individually. Lastly, we demonstrate an application to diameter approximation in Section 3.4.
In the BFS problem, we are given an $n$-node connected simple undirected graph $G$ and a distinguished node $r$, and it suffices to compute the distance $d(r, v)$ for each node $v$. To infer a BFS tree from the distance information, it suffices to assign as the parent of each node $v \ne r$ the smallest-identifier node from the set $\{u \in N(v) : d(r, u) = d(r, v) - 1\}$, where $N(v)$ is the set of $v$'s neighbors. This can be done with one additional pass using $\tilde{O}(n)$ space in the insertion-only model. In the turnstile model, this can be done with a few additional passes w.h.p. using $\ell_0$-samplers [29] for each node. Hence, in the subsequent discussion, we focus on computing the distance from $r$ to each node $v$.
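The parent-assignment rule can be sketched as follows (a RAM-model illustration, assuming the distance labels are already computed):

```python
def bfs_parents(adj, dist, root):
    """Given distance labels dist[v] = d(root, v), assign each non-root
    node the smallest-identifier neighbor that is one layer closer to
    the root. The resulting parent pointers form a BFS tree."""
    parent = {root: None}
    for v in adj:
        if v == root:
            continue
        parent[v] = min(u for u in adj[v] if dist[u] == dist[v] - 1)
    return parent
```

Any neighbor one layer closer would do; taking the minimum identifier just makes the choice canonical, matching the rule in the text.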
3.1 A Simple Deterministic Algorithm
We present a simple deterministic $O(n/k)$-pass $O(nk)$-space algorithm in the insertion-only model, based on the observation that every root-to-leaf path in a BFS tree cannot visit too many high-degree nodes (Section 3.1). One can then simulate the standard BFS algorithm efficiently layer by layer over the high-degree nodes (Section 3.1).
Let $P$ be a root-to-leaf path in some BFS tree of an $n$-node connected simple undirected graph $G$. Then
$$\sum_{v \in P} \deg_G(v) \le 3n,$$
where $\deg_G(v)$ denotes the degree of $v$ in $G$.
Proof.
Suppose $P$ comprises the nodes $v_0, v_1, \ldots, v_t$, where $v_i$ is at distance $i$ from the root. Observe that if $|i - j| \ge 3$, then $v_i$ and $v_j$ cannot share any neighbor; otherwise $P$ could be shortened, a contradiction. Thus, each node $u$ of $G$ is adjacent to at most three nodes on $P$, so the total contribution of $u$ to $\sum_{v \in P} \deg_G(v)$ is at most 3. Summing over all nodes $u$ gives the bound. ∎
We note that Lemma 3.1 is near-optimal. To see why, let $G = (V, E)$ where $V$ is the union of disjoint sets $A$ and $B$. By choosing the sizes of $A$ and $B$ and the edges between them suitably for a parameter, any BFS tree rooted at the designated node in $A$ has a root-to-leaf path on which every node has high degree, so the degree sum along the path is $\Omega(n)$.
Given an $n$-node connected simple undirected graph $G$ with a distinguished node $r$, a BFS tree rooted at $r$ can be found deterministically in $O(n/k)$ passes using $O(nk)$ space for every $k \ge 1$ in the insertion-only model.
Proof.
Given a parameter $k$, our algorithm goes as follows. In the first pass, keep $\min\{\deg_G(v), k\}$ arbitrary neighbors for each node $v$ in memory, and then use the in-memory edges to update the distance estimate $d(v)$ for each $v$ by any single-source shortest path algorithm. The set of in-memory edges is invariant after the first pass; hence, the memory usage is $O(nk)$. Then, in each of the subsequent passes, process the edges in the stream one by one without keeping them in memory after the processing: for each edge $(u, v)$, if $d(u) > d(v) + 1$ (resp. $d(v) > d(u) + 1$), then update $d(u) \leftarrow d(v) + 1$ (resp. $d(v) \leftarrow d(u) + 1$). After all edges in the stream are processed, use the in-memory edges to update the distances again by any single-source shortest path algorithm, with the current estimates as initial distances. Our algorithm repeats until no distance has been updated in a pass.
Consider a root-to-leaf path $P$ in some BFS tree rooted at $r$. Suppose $P$ contains exactly $t$ edges that appear only on the tape (i.e., that are not kept in memory); let them be $e_1, e_2, \ldots, e_t$, ordered by increasing distance from $r$. Each $e_i$ has both endpoints of degree greater than $k$, since any edge incident to a node of degree at most $k$ is kept in memory. Thus, by Section 3.1, $t \le 3n/k$. Then we appeal to the argument used in the analysis of the Bellman-Ford algorithm [19, 6]. For every node $v$ on $P$, if the subpath from $r$ to $v$ uses no tape-only edge, then $d(v)$ attains its minimum possible value after the first pass; otherwise, if the last tape-only edge on this subpath is $e_i$, then $d(v)$ attains its minimum possible value at most one pass after the endpoint of $e_i$ nearer to $r$ attains its own. Hence, $t + 1 = O(n/k)$ passes suffice to compute $d(v)$ for all $v \in P$, and this argument applies to all root-to-leaf paths. Setting the space budget to $O(nk)$ yields the desired bound. ∎
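A RAM-model simulation of this pass structure may clarify it; the stream is replayed as a Python list, one "pass" being one scan of the list (names and implementation details are ours):

```python
def streaming_bfs(n, stream, root, k):
    """Simulate the multi-pass algorithm: pass 1 keeps up to k neighbors
    per node in memory; each later pass relaxes the streamed edges once
    and then re-runs relaxation on the in-memory edges to a fixpoint."""
    INF = float("inf")
    mem = [[] for _ in range(n)]
    for u, v in stream:                       # pass 1: sample neighbors
        if len(mem[u]) < k: mem[u].append(v)
        if len(mem[v]) < k: mem[v].append(u)
    d = [INF] * n
    d[root] = 0

    def relax_memory():
        changed = True
        while changed:
            changed = False
            for u in range(n):
                for v in mem[u]:
                    if d[u] + 1 < d[v]: d[v], changed = d[u] + 1, True
                    if d[v] + 1 < d[u]: d[u], changed = d[v] + 1, True

    relax_memory()
    passes = 1
    while True:                               # subsequent passes
        updated = False
        for u, v in stream:                   # streamed edges, not stored
            if d[v] + 1 < d[u]: d[u], updated = d[v] + 1, True
            if d[u] + 1 < d[v]: d[v], updated = d[u] + 1, True
        relax_memory()
        passes += 1
        if not updated:
            break
    return d, passes
```

At termination no edge can improve any estimate, so the estimates equal the true BFS distances; the pass count depends on how many tape-only edges the deepest path uses.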
3.2 A More Efficient Randomized Algorithm
In this section, we prove Theorem 1. Our BFS algorithm is based on the following generic framework, which has been applied to finding shortest paths in the parallel and distributed settings [11, 22, 27, 46]. Sample a set $S$ of distinguished nodes such that each node joins $S$ independently with probability $p$, and the source $s$ joins $S$ with probability 1. By a Chernoff bound, $|S| = O(pn + \log n)$ with high probability. We will grow a local BFS tree of bounded radius from each node in $S$, and then we will construct the final BFS tree by combining them. We will rely on the following lemma, which first appeared in [46].
[[46]] Let $s$ be a specified source node. Let $S$ be a subset of nodes such that each node joins $S$ with probability $p$, and $s$ joins $S$ with probability 1. For any given parameter $c$, the following holds with probability $1 - n^{-\Omega(c)}$. For each node $v$, there is an $s$-$v$ shortest path $P$ such that each of its subpaths with at least $c\, p^{-1} \ln n$ nodes contains a node of $S$.
For notational simplicity, in the subsequent discussion we write $r = c\, p^{-1} \ln n$. Lemma 3.2 shows that for each node $v$,
$$d(s, v) = \min_{t \in S} \left( d(s, t) + d_r(t, v) \right) \tag{2}$$
with probability $1 - n^{-\Omega(c)}$, where $d_r(t, v)$ denotes the $t$-$v$ distance restricted to paths of at most $r$ hops ($\infty$ if no such path exists).
To see this, consider the $s$-$v$ shortest path $P$ specified in Lemma 3.2. If the number of nodes in $P$ is less than $r$, then the above claim holds because $s \in S$. Otherwise, Lemma 3.2 guarantees that, with probability $1 - n^{-\Omega(c)}$, there is a node $t \in S$ among the last $r$ nodes of $P$. Using Equation 2, a BFS tree can be computed using the following steps.

Compute $d_r(t, t')$ for each pair $t, t' \in S$. Using this information, we can infer $d(s, t)$ for each $t \in S$.

Compute $d(s, v)$ for each node $v$ by the formula $d(s, v) = \min_{t \in S} \left( d(s, t) + d_r(t, v) \right)$.
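The two steps can be sketched in the RAM model as follows (illustrative code; `bounded_bfs` plays the role of the radius-$r$ BFS, and the radius is chosen large enough here that Equation 2 holds deterministically rather than with high probability):

```python
import random
from collections import deque

def bounded_bfs(adj, src, radius):
    """Distances from src, truncated at the given radius."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def sampled_bfs(adj, s, p, radius, seed=0):
    """Step 1: bounded-radius BFS from every sampled node; infer d(s, t)
    by Bellman-Ford over the 'shortcut' graph on S whose edge (t, t')
    has weight d_r(t, t'). Step 2: combine via Equation (2)."""
    rng = random.Random(seed)
    S = {v for v in adj if rng.random() < p} | {s}
    local = {t: bounded_bfs(adj, t, radius) for t in S}
    ds = {t: float("inf") for t in S}
    ds[s] = 0
    for _ in range(len(S)):
        for t in S:
            for t2 in S:
                if t2 in local[t]:
                    ds[t2] = min(ds[t2], ds[t] + local[t][t2])
    return {v: min(ds[t] + local[t].get(v, float("inf")) for t in S)
            for v in adj}
```

With the analysis-prescribed radius $r = c\, p^{-1} \ln n$ the output is correct w.h.p.; the sketch uses an oversized radius so its output is always exact.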
In what follows, we show how to implement the above two steps in the streaming model using small space and a number of passes proportional to the radius $r$. By a change of the parameter $p$, we obtain Theorem 1.
Step 1.
To compute $d_r(t, t')$ for each $t, t' \in S$, we let each $t \in S$ initiate a radius-$r$ local BFS rooted at $t$. A straightforward implementation of this approach in the streaming model costs $O(r)$ passes and $\tilde{O}(n|S|)$ space, since we need to maintain $|S|$ search trees simultaneously.
We show that the space requirement can be improved. Since we only need to learn the distances between nodes in $S$, we are allowed to forget distance information when it is no longer needed. Specifically, suppose we start the BFS computation rooted at $t$ at the $\tau_t$-th pass, where $\tau_t$ is some number to be determined. For each $i \ge 0$, the induction hypothesis specifies that at the beginning of the $(\tau_t + i)$-th pass, all nodes $u$ with $d(t, u) \le i$ have learned $d(t, u)$. During the $(\tau_t + i)$-th pass, for each node $v$ whose distance from $t$ is not yet determined, we check if $v$ has a neighbor $u$ with $d(t, u) = i$. If so, then we learn that $d(t, v) = i + 1$.
In the above BFS algorithm, if $d(t, v) = i + 1$ for some node $v$, then we learn this fact during the $(\tau_t + i)$-th pass. Observe that this information is only needed during the next two passes. After the end of the $(\tau_t + i + 2)$-th pass, for each $v$ with $d(t, v) = i + 1$, we are allowed to forget $d(t, v)$. That is, $v$ only needs to participate in the BFS computation rooted at $t$ during these three passes.
For each $t \in S$, we assign the starting time $\tau_t$ independently and uniformly at random from $\{1, \ldots, K\}$ for a suitable range size $K$. The next lemma shows that, for each node $v$ and for each pass, the number of BFS computations that involve $v$ is small. The idea of using random starting times to schedule multiple algorithms to minimize congestion can be traced back to [33]. Note that membership of the current pass in the three-pass window determined by $\tau_t$ and $d(t, v)$ is the criterion for $v$ to participate in the BFS rooted at $t$ during that pass.
For each node $v$ and for each integer $j$, with high probability, the number of nodes $t \in S$ whose BFS computation involves $v$ during the $j$-th pass is $O(|S|/K + \log n)$.
Proof.
Given two nodes $v$ and $t \in S$, and a fixed pass $j$, the probability that $v$ participates in the BFS rooted at $t$ during the $j$-th pass is at most $3/K$, since this requires $\tau_t$ to fall within a window of three values determined by $j$ and $d(t, v)$. Let $X$ be the total number of $t \in S$ such that $v$ participates in the BFS rooted at $t$ during the $j$-th pass. The expected value of $X$ is upper bounded by $3|S|/K$. By a Chernoff bound, with high probability, $X = O(|S|/K + \log n)$. ∎
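A quick simulation illustrates the congestion bound: each source picks a random start in $\{0, \ldots, K-1\}$ and occupies a three-pass window, and the peak per-pass load stays close to the expectation $3|S|/K$ (the constants in the slack term below are our choice, not the paper's):

```python
import math
import random

def max_congestion(num_sources, K, seed=0):
    """Each source gets a random start time in [0, K) and occupies a
    window of 3 consecutive passes (worst case: all window offsets
    coincide). Return the maximum number of windows covering any pass."""
    rng = random.Random(seed)
    count = [0] * (K + 3)
    for _ in range(num_sources):
        start = rng.randrange(K)
        for j in range(start, start + 3):
            count[j] += 1
    return max(count)

peak = max_congestion(num_sources=200, K=50)
# Expected load per pass is 3 * 200 / 50 = 12; a Chernoff bound keeps
# the peak within a logarithmic additive term of that w.h.p.
assert peak <= 12 + 10 * math.log(200)
```

Without the random staggering (all starts equal), the peak would be the full 200, which is exactly the congestion the scheduling avoids.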
Recall that with high probability, $|S| = O(pn + \log n)$. By Lemma 3.2, we only need $O(|S|/K + \log n)$ distance labels per node to do the radius-$r$ BFS computations from all nodes in $S$. To store the distance information $d_r(t, t')$ for each $t, t' \in S$, we need $O(|S|^2)$ words. Thus, the algorithm for Step 1 costs $\tilde{O}(n \cdot (1 + |S|/K) + |S|^2)$ space. The number of passes is $O(K + r)$.
In the insertion-only model, the implementation is straightforward. In the turnstile model, care has to be taken when implementing the above algorithm. We write $\lambda$ to denote the high-probability upper bound on the number of BFS computations that a node participates in during a single pass. Let $R_1, \ldots, R_q$ be random subsets of $S$ such that each $t \in S$ joins each $R_i$ with probability $1/\lambda$, independently, for a suitable $q$. Consider a node $v$ and consider the $j$-th pass. Let $S_v \subseteq S$ be the subset such that $t \in S_v$ if the BFS computation rooted at $t$ hits $v$ during the $j$-th pass. We know that with high probability $|S_v| \le \lambda$. By our choice of $q$, we can infer that, with high probability, for each $t \in S_v$ there is at least one index $i$ such that $R_i \cap S_v = \{t\}$.
To implement the $j$-th pass in the turnstile model, each node $v$ virtually maintains one edge set per index $i$. For each insertion (resp., deletion) of an edge from $v$ to the current frontier of the BFS computations rooted at the nodes in $R_i$, we add (resp., remove) the edge from the $i$-th set. After processing the entire data stream, we take one edge out of each edge set. In view of the above discussion, it suffices to consider only these edges when we grow the BFS trees. This can be implemented using $\ell_0$-samplers per node, and the space complexity is unchanged up to polylogarithmic factors.
Step 2.
In the insertion-only model, this task can be solved using $r$ iterations of Bellman-Ford steps. Initially, set $d'(v) = d(s, v)$ for each $v \in S$ and $d'(v) = \infty$ for each $v \notin S$. During each pass, we do the update $d'(v) \leftarrow \min\{d'(v),\ \min_{u \in N(v)} d'(u) + 1\}$. By Equation 2, we can infer that $d'(v) = d(s, v)$ for each node $v$ after $r$ passes. A straightforward implementation of this procedure costs $\tilde{O}(n)$ space and $O(r)$ passes.
In the turnstile model, we can solve this task by growing a radius-$r$ BFS tree rooted at $t$, for each $t \in S$, as in Step 1. During the process, each node $v$ maintains a variable $d'(v)$, which serves as the estimate of $d(s, v)$. Initially, $d'(v) = \infty$. When the partial BFS tree rooted at $t$ hits $v$, we update $d'(v)$ to the minimum of its current value and $d(s, t) + d_r(t, v)$. At the end of the process, we have $d'(v) = d(s, v)$ for each node $v$. This costs the same space and passes as Step 1.
3.3 Extensions
In this section, we consider the problem of solving $k$ instances of BFS simultaneously for some $k$, and the simpler problem of computing the pairwise distances between the given nodes. Both of these problems can be solved via a black-box application of Section 1. In this section, we show that it is possible to obtain better upper bounds.
Given an $n$-node undirected graph $G$, for any given parameters, the pairwise distances between all pairs of nodes in a given set $A$ of nodes in $G$ can be computed with high probability in the turnstile model, with pass and space bounds improving over $|A|$ independent applications of Section 1.
Proof.
Let $A$ be the input node set of size $k$. Consider the modified Step 1 of our algorithm where each node of $A$ is included in $S$ with probability 1. Since $k$ is small, we still have $|S| = O(pn + k + \log n)$ with high probability. Recall that Step 1 of our algorithm calculates $d_r(t, t')$ for each $t, t' \in S$ within the stated space and passes. Applying Equation 2 with each node of $A$ as the source, we obtain the pairwise distances between all pairs of nodes in $S$, which includes $A$ as a subset. There is no need to do Step 2. ∎
For example, for a suitable set size, Theorem 3.3 implies that we can compute the pairwise distances between all pairs of nodes in the given set with the pass complexity of a single BFS computation.
Given an $n$-node undirected graph $G$, for any given parameters, one can solve $k$ instances of BFS with high probability in the turnstile model, with pass and space bounds improving over $k$ independent applications of Section 1.
Proof.
Let be the node set of size corresponding to the roots of the BFS instances. Consider the following modifications to our BFS algorithm.
As in the proof of Theorem 3.3, in Step 1, include each node of $A$ in $S$ with probability 1. The modified Step 1 still takes the stated space and passes, and it outputs the pairwise distances between all pairs of nodes in $S$.
Now consider Step 2. In the insertion-only model, recall that a BFS tree rooted at a given node can be constructed in $\tilde{O}(n)$ space and $O(r)$ passes using $r$ iterations of Bellman-Ford steps. The cost of constructing all $k$ BFS trees is then $k$ times the space with the same number of passes, as the iterations can be run in parallel.
In the turnstile model, we can also use the strategy of growing a radius-$r$ BFS tree rooted at $t$, for each $t \in S$. During the process, each node $v$ maintains $k$ variables serving as the estimates of $d(a, v)$ for all $a \in A$. The complexity of growing the radius-$r$ BFS trees is unchanged. The extra space cost for maintaining these variables is $O(kn)$. ∎
For example, for moderate values of $k$, Theorem 3.3 implies that we can solve $k$ instances of BFS with the pass complexity of a single instance. Note that $\Omega(kn)$ space is necessary to output $k$ BFS trees.
Section 3.3 immediately gives the following corollary.
Given an $n$-node connected undirected graph $G$ with unweighted edges and a node subset $A$ of $G$, a minimum Steiner tree in $G$ that spans $A$ can be approximated to within a factor of $2$ with high probability using the same passes and space as in Section 3.3, in the turnstile model.
Note that if we do not need to construct a Steiner tree, and only need to approximate the size of an optimal Steiner tree, then Section 3.3 can be used in place of Section 3.3.
3.4 Diameter Approximation
It is well known that the maximum distance label in a BFS tree gives a $2$-approximation of the diameter. We show that it is possible to improve the approximation ratio to nearly $3/2$ without sacrificing the space and pass complexities.
Roditty and Williams [41] showed that a nearly-$3/2$ approximation of the diameter can be computed with high probability as follows.

Let $S$ be a node set chosen by including each node in $S$ with a suitable probability independently. Perform a BFS from each node in $S$.

Let $w$ be a node chosen to maximize $d(w, S)$, breaking ties arbitrarily. Perform a BFS from $w$.

Let $N$ be the node set consisting of the nodes closest to $w$, where ties are broken arbitrarily. Perform a BFS from each node in $N$.
Let $\hat{D}$ be the maximum distance label ever computed during the BFS computations in the above procedure. Roditty and Williams [41] proved that $\hat{D}$ satisfies $\lfloor 2D/3 \rfloor \le \hat{D} \le D$, where $D$ is the diameter of $G$.
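The three-step procedure can be sketched as follows (illustrative code; the sample size $\sqrt{n \log n}$ is a choice in the spirit of the Roditty-Williams analysis, and the guarantee asserted below is only the weak deterministic one, since any eccentricity is at least $D/2$):

```python
import math
import random
from collections import deque

def bfs_dist(adj, src):
    """Full BFS distances from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def rw_diameter_estimate(adj, seed=0):
    """Sketch of the Roditty-Williams procedure on a connected graph."""
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    k = max(1, int(math.sqrt(n * max(1.0, math.log(n)))))
    S = rng.sample(nodes, min(k, n))
    best = 0
    dS = {v: float("inf") for v in nodes}
    for t in S:                     # Step 1: BFS from each sampled node
        d = bfs_dist(adj, t)
        best = max(best, max(d.values()))
        for v in nodes:
            dS[v] = min(dS[v], d[v])
    w = max(nodes, key=lambda v: dS[v])     # Step 2: farthest from S
    dw = bfs_dist(adj, w)
    best = max(best, max(dw.values()))
    N = sorted(nodes, key=lambda v: dw[v])[:min(k, n)]  # Step 3
    for v in N:
        best = max(best, max(bfs_dist(adj, v).values()))
    return best
```

Every value contributing to the estimate is an eccentricity, so the output never exceeds $D$ and is always at least $D/2$; the stronger $\lfloor 2D/3 \rfloor$ bound is the probabilistic guarantee from [41].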
The algorithm of Roditty and Williams [41] can be implemented in the streaming model by applying Theorem 3.3, but we can do better. Note that when we perform BFS from the nodes in $S$ and $N$, it is not necessary to store the entire BFS trees. For example, in order to select $w$, we only need to let each node $v$ know the single distance label relevant to the choice of $w$, rather than its label in every BFS tree computed in Step 1. Therefore, the space devoted to storing BFS trees in Theorem 3.3 can be avoided. That is, the space and pass complexities are the same as the cost of computing a single BFS tree using Section 1. We conclude the following theorem.
Given an $n$-node connected undirected graph $G$, a diameter approximation $\hat{D}$ satisfying $\lfloor 2D/3 \rfloor \le \hat{D} \le D$, where $D$ is the diameter of $G$, can be computed with high probability with the same pass and space complexities as a single BFS tree, in the turnstile model.
4 Depth-First Search
A straightforward implementation of the naive DFS algorithm in the streaming model costs either $O(n)$ passes with $\tilde{O}(n)$ space or a single pass with $O(n^2)$ space. Khan and Mehta [31] recently showed that it is possible to obtain a smooth tradeoff between the two extremes. Specifically, they designed an algorithm that requires at most $O(n/k)$ passes using $O(nk)$ space, where $k$ is any positive integer. Furthermore, in case the height $h$ of the computed DFS tree is small, they further decrease the number of passes to $O(h/k)$. In Section 4.1, we provide a very simple alternative proof of this result, via sparse certificates for node-connectivity.
In the worst case, the "space $\times$ number of passes" product of the algorithms of Khan and Mehta [31] is still $\Omega(n^2)$. In Sections 4.2 and 4.3, we show that it is possible to improve this upper bound asymptotically when the number of passes is superconstant. Specifically, for suitable parameters, we obtain the following DFS algorithms.

A deterministic algorithm using a reduced number of passes and space in the insertion-only model. After balancing the parameters, the space complexity is