In analyzing algorithms, we mostly concentrate on minimizing the running time or, if the problem is hard, on the quality of the solution. After we have optimized these parameters, we look to reduce the space taken by the algorithm, if possible. An excellent theoretical question is: given a problem, design an algorithm that solves it using as little space as possible. Such algorithms are called space-efficient algorithms, as we want to optimize the space taken by the algorithm without increasing the running time by much (compared to the best algorithm for the problem with no space restriction).
Recently, designing space-efficient algorithms has gained importance because of the rapid growth in the use of mobile and other hand-held devices, which come with limited memory (e.g., devices like the Raspberry Pi, which are widely used in IoT applications). Another crucial reason for the increasing importance of space-efficient algorithms is the rate and volume at which huge datasets are generated (“big data”). Areas such as machine learning, scientific computing, network traffic monitoring, Internet search and signal processing need to process big data using as little memory as possible.
Algorithmic fields like dynamic graph algorithms [10, 21, 25, 26, 28] and streaming algorithms [1, 2, 3, 12, 27] mandate low space usage. In a streaming algorithm, the mandate is stated upfront. In a dynamic graph algorithm, the mandate is implicit, as we want the update time of the algorithm to be as low as possible. Low update time implies that we cannot afford to read much of our data-structure on each update, so we want the data-structure to be as compact as possible. Motivated by the growing body of work on space-efficient algorithms, this paper focuses on optimizing the space taken by the DFS algorithm, one of the fundamental graph algorithms.
However, one needs to be slightly cautious about the definition of space. For a graph problem, it would take space just to represent the graph. So, it seems that any graph problem requires bits. To avoid such trivial answers, we first define our model of computation.
1.1 Model of Computation: Register Input Model
Frederickson introduced the register input model, in which the input (here, a graph) is given in a read-only memory (thus, it cannot be modified), and the output of the algorithm is written to a write-only memory. Along with the input and output memories, a random-access memory of limited size is also available. As in the standard RAM model, the data in the input memory and the workspace is divided into words of size bits. Arithmetic, logical and bitwise operations on a constant number of words take time.
When we say that our algorithm uses bits, we mean the space used on the random-access memory (the workspace). This model handles the case when the input itself takes a lot of space by designating a special read-only memory for the input.
We highlight some results in the register input model. Pagter and Rauhe described a comparison-based algorithm for sorting numbers: for every given with , an algorithm that takes time using bits. A matching lower bound of for the time-space product was given by Beame for the strong branching-program model. See [4, 6, 8, 9, 13, 14, 15, 18, 22, 24] for other problems in this model. In this paper, our main focus is the Depth First Search problem.
1.2 DFS Problem
The problem of space-efficient Dfs has received a lot of attention recently. Asano et al. designed an algorithm that can perform Dfs in (unspecified) polynomial time using bits. If the space is increased to bits, their running time decreases to . They also showed how to perform Dfs in time using bits. Elmasry et al. improved this result by designing an algorithm that performs Dfs in time using bits. Banerjee et al. proposed an efficient Dfs algorithm that takes time using space. Note that this is a strict improvement over the result of Elmasry et al. only if the graph is sparse. The following open question was raised by Asano et al. in their paper:
Using space, can Dfs be done in time?
Recently, Hagerup claimed an algorithm that performs Dfs in time using bits of space. We improve upon this algorithm, giving a near-optimal running time for Dfs that is almost linear in . Our result can be stated succinctly as follows:
There exists a randomized algorithm that can perform Dfs of a given graph in time with high probability (where ) using bits of space. (Our algorithm is randomized because we use succinct dictionaries that use random bits.)
The succinct dictionary used by our algorithm performs each insertion/deletion in time with probability (where ). Our algorithm performs at most insertions/deletions across all dictionaries. Hence, by the union bound, the probability that any of these insertions/deletions takes more than time is .
We assume that the vertices of the input graph are numbered from 1 to . Let denote the neighborhood of vertex and denote the -th neighbor of vertex , where . As in , we assume that is an array, so we have random access to any element of it. We also implicitly know the degree of , .
Normally, the Dfs algorithm outputs the Dfs tree. Given the space bounds, we cannot store the Dfs tree; instead, we output the edges of the Dfs tree as soon as we encounter them. We consider the problem solved if the output edges form a valid Dfs tree.
We first give a quick overview of the non-recursive implementation of the Dfs algorithm. Let be the input graph with vertices and edges. For this implementation, we use a stack . Initially, all vertices are colored white; assume that we start the Dfs from a vertex , so is pushed onto the stack . The algorithm then processes the stack until it becomes empty: the top vertex, say , is popped from the stack and processed as follows. Each neighbor of is explored. If a white vertex is found, then is pushed onto the stack and processing of starts. If none of the neighbors of are white, then is colored black. Whenever discovers a white vertex , we push a tuple onto , where the second entry of the tuple tells us which neighbor of to explore once processing of resumes.
Now, let us formally define the second entry of the tuple.
For any vertex , if is an entry on the stack , then denotes the first neighbor of vertex that has not yet been explored while processing .
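A minimal sketch of the non-recursive Dfs just described. Assumptions: the graph is a Python dict mapping each vertex to its neighbor list (standing in for the read-only neighbor arrays), neighbor indices are 0-based rather than 1-based, and the name `iterative_dfs` is illustrative. Each stack entry is a (vertex, next) tuple, and tree edges are emitted as soon as they are discovered instead of being stored as a tree.

```python
from typing import Dict, List, Tuple

WHITE, GRAY, BLACK = 0, 1, 2

def iterative_dfs(adj: Dict[int, List[int]], root: int) -> List[Tuple[int, int]]:
    """Dfs with a stack of (vertex, index of next neighbor to explore) tuples."""
    color = {v: WHITE for v in adj}
    tree_edges: List[Tuple[int, int]] = []
    stack: List[Tuple[int, int]] = [(root, 0)]
    while stack:
        u, i = stack.pop()
        color[u] = GRAY
        # Skip neighbors of u that are no longer white.
        while i < len(adj[u]) and color[adj[u][i]] != WHITE:
            i += 1
        if i < len(adj[u]):
            v = adj[u][i]
            stack.append((u, i + 1))   # u resumes after neighbor i later
            stack.append((v, 0))       # start processing v
            tree_edges.append((u, v))  # output the tree edge immediately
        else:
            color[u] = BLACK           # no white neighbor is left
    return tree_edges
```

On the cycle with edges {1,2}, {2,4}, {4,3}, {3,1}, starting from vertex 1, the sketch emits the tree edges (1, 2), (2, 4), (4, 3).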
The space required to represent the two entries of each tuple on the stack is bits. As there are vertices in the graph, the stack can grow to entries in the worst case. So, the total space taken by the trivial algorithm is bits.
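The worst-case bound can be checked with a little arithmetic. This sketch (the name `trivial_stack_bits` is illustrative) assumes vertex ids 1..n and next-indices 1..deg+1 written in plain binary, and uses the fact that on a path graph all n vertices are on the stack at once when Dfs starts from an endpoint.

```python
def trivial_stack_bits(n: int, max_degree: int) -> int:
    """Worst-case bits for a stack of (vertex, next-index) tuples on an n-vertex graph."""
    vertex_bits = n.bit_length()                 # enough to write any vertex id in 1..n
    index_bits = (max_degree + 1).bit_length()   # next-index ranges over 1..deg+1
    return n * (vertex_bits + index_bits)        # all n vertices stacked (path graph)

print(trivial_stack_bits(1024, 1023))  # 22528
```

Both fields grow logarithmically in n, so the total grows like n log n, which is what the improved schemes avoid.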
Our algorithm closely follows . So we first give a brief overview of their approach and then explain our improvement over it.
2.1 Previous Approach (Elmasry et al.)
The trivial Dfs algorithm does not work for Elmasry et al. because the stack itself takes bits of space. Hence, the stack is not implemented; instead, it is referred to as an imaginary stack. The stack is divided into segments of size : the first segment consists of the bottommost vertices of , the second segment of the next vertices of , and so on. A new stack is implemented, which contains the vertices from at most the top two segments of the imaginary stack . Each entry of the stack is a tuple where . The space required to represent these two terms is at most . Thus, the total space required for is bits. Since the size of is very small compared to the imaginary stack , problems arise when an element is to be pushed onto but it is full, or when becomes empty while still contains vertices. Thus, we need a way to make space in and a way to restore vertices in .
To handle the case when is full, Elmasry et al. remove the bottom half of the elements of . A new entry can then be pushed onto , and the Dfs algorithm proceeds as usual.
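The following sketch illustrates only the "make room" side of this scheme: a window that holds at most the top two segments of the imaginary stack and discards its bottom segment when it is full. The class name `TwoSegmentStack` and the counter `imaginary_size` are illustrative assumptions, and restoring an emptied window is not shown.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (vertex, next-neighbor index)

class TwoSegmentStack:
    """At most the top two segments (each of `seg` entries) of an imaginary stack."""

    def __init__(self, seg: int) -> None:
        self.seg = seg
        self.window: List[Entry] = []
        self.imaginary_size = 0  # counts all entries, including discarded ones

    def push(self, e: Entry) -> None:
        if len(self.window) == 2 * self.seg:
            # Full: drop the bottom segment (half of the window) to make room.
            del self.window[: self.seg]
        self.window.append(e)
        self.imaginary_size += 1

    def pop(self) -> Entry:
        if not self.window:
            raise IndexError("window empty: the top segment must be restored first")
        self.imaginary_size -= 1
        return self.window.pop()

    def needs_restore(self) -> bool:
        # The window is empty although the imaginary stack still holds vertices.
        return not self.window and self.imaginary_size > 0
```

With segments of size 2, pushing five entries leaves only the last three in the window; popping those three empties the window while two entries remain on the imaginary stack, which is exactly the situation the restoration step has to handle.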
Handling the second case (when is empty) requires restoring the top segment of in . It turns out that the restoration process is the main bottleneck of this Dfs algorithm. To aid the restoration process, Elmasry et al. propose an elegant solution: they maintain an additional stack , called the trailer stack. The top-most element of each segment in is called a trailer element. The stack stores the trailer element of each segment in , except the trailers of those segments which are already present in .
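To make the contents of the trailer stack concrete, here is a sketch that computes it from a snapshot of the imaginary stack, assuming the window always starts at a segment boundary. This only illustrates what the trailer stack holds (the real algorithm never materializes the imaginary stack); `trailer_stack` and `window_start` are hypothetical names, and positions are 0-based from the bottom.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (vertex, next-neighbor index)

def trailer_stack(imaginary: List[Entry], seg: int, window_start: int) -> List[Entry]:
    """Top-most entry of every segment lying entirely below the window.

    imaginary    -- full stack contents, bottom to top (illustration only)
    seg          -- segment size
    window_start -- position of the bottom entry kept in the window
    """
    if window_start % seg != 0:
        raise ValueError("window must start at a segment boundary")
    # Segment k occupies positions k*seg .. (k+1)*seg - 1; its trailer is the last one.
    return [imaginary[end - 1] for end in range(seg, window_start + 1, seg)]
```

For ten entries, segments of size 3 and a window starting at position 6, the trailers are the entries at positions 2 and 5.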
The stack is crucially used in the restoration process. Let be the second topmost entry in stack . This implies that the first vertex of the top segment of is . Now, a Dfs-like algorithm is run starting from vertex to restore the top segment of in as follows:
The meaning of gray and white vertices is temporarily changed. Then, is processed to find as follows: find the first gray neighbor of , mark it white, push (where ), and then start processing . Elmasry et al. show that this restoration process correctly restores the top segment of .
Some explanation of the above procedure is in order. Once we have found , we want to find . Equivalently, we want to find . This vertex, , was a white vertex encountered while processing . Because of it, we stopped processing , put on , and started processing .
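The replay can be sketched as follows, under simplifying assumptions: `pending` is a plain Python set standing in for the test "gray and in the top segment" (which Elmasry et al. answer using the colors and the array described next), removing a vertex from it plays the role of "mark it white", indices are 0-based, and the function and parameter names are hypothetical. Each step scans the adjacency list of the current vertex from the beginning, which is the cost the next paragraph sets out to reduce.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, int]  # (vertex, next-neighbor index), 0-based

def restore_segment(adj: Dict[int, List[int]], below: int,
                    pending: Set[int], top_entry: Entry) -> List[Entry]:
    """Rebuild one segment of the imaginary stack, bottom entry first.

    below     -- top vertex of the segment just below (from the trailer stack)
    pending   -- vertices of the segment to restore (hypothetical stand-in
                 for "gray and in this segment")
    top_entry -- the segment's own trailer entry, whose next index is known
    """
    pending = set(pending)  # copy; removing a vertex plays "mark it white"
    restored: List[Entry] = []
    cur = below
    while pending:
        # The first pending neighbor of cur is cur's child on the stack.
        i = next((j for j, w in enumerate(adj[cur]) if w in pending), None)
        if i is None:
            raise ValueError("pending vertices do not form a stack path above `below`")
        if cur != below:
            restored.append((cur, i + 1))  # cur resumes right after this child
        child = adj[cur][i]
        pending.discard(child)
        cur = child
    if cur != top_entry[0]:
        raise ValueError("top_entry does not match the last restored vertex")
    restored.append(top_entry)
    return restored
```

For example, with adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [2, 0, 1, 4], 4: [3]}, the tuple-stack Dfs from vertex 0 reaches the imaginary stack [(0, 1), (1, 2), (2, 2), (3, 4), (4, 0)] just before processing vertex 4. With segments of size 3, `restore_segment(adj, 2, {3, 4}, (4, 0))` returns [(3, 4), (4, 0)]; while scanning vertex 3, its neighbors 2, 0 and 1 are skipped because they are not pending, even though 0 and 1 are gray ancestors.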
Even though the above algorithm is correct, it is slow: finding the first gray neighbor of a vertex takes time. To overcome this difficulty, Elmasry et al. use two more data-structures. The first data-structure is an array of size that contains the following information for each vertex : if is an element of , then contains
The segment number in which lies.
The approximate position of in .
Since there are segments of (as each segment has size ), representing the first quantity requires bits. Similarly, storing the approximate position also takes bits. Thus, the space required for is bits.
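A back-of-the-envelope sketch of the per-vertex cost of this array: one field names one of roughly n/seg segments and the other names one of a fixed number of position groups. The parameter values below (n = 2^20, segments of size 2^16, 8 groups) are illustrative choices, not the ones used by Elmasry et al., and `t_entry_bits` is a hypothetical name.

```python
def ceil_log2(x: int) -> int:
    """Smallest b with 2**b >= x, but at least 1 bit."""
    return max(1, (x - 1).bit_length())

def t_entry_bits(n: int, seg: int, groups: int) -> int:
    """Bits for (segment number, approximate position) of one vertex."""
    num_segments = -(-n // seg)  # ceiling division
    return ceil_log2(num_segments) + ceil_log2(groups)

n = 2 ** 20
per_entry = t_entry_bits(n, 2 ** 16, 8)
print(per_entry, n * per_entry)  # 7 7340032
```

Here each entry needs 7 bits, compared with about 20 bits for writing a vertex id or a neighbor index explicitly.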
The second term in speeds up the search for only if the degree of is sufficiently small. To take care of high-degree vertices, the trailer stack is extended to include not only trailers but also all pairs , where is a high-degree vertex. Finally, Elmasry et al. show that the extended trailer stack takes bits. Moreover, using and the extended , the restoration works correctly and efficiently.
2.2 Our Approach
We give a brief overview of our approach. In , the array plays a critical role in the restoration process. While restoring the top segment, provides the required information for each vertex in the top-most segment. However, takes bits, a space we cannot afford. Our main observation is that we do not need information about all vertices while restoring ; storing information about the vertices in the top-most segment suffices. Unfortunately, it is not easy to maintain information about the vertices in the top-most segment space-efficiently. To overcome this difficulty, along with the stack (in our algorithm, the size of is slightly different from that in ; see Remark 1), we implement (a dynamic dictionary, as described in Lemma 3), which contains information about the top vertices of the imaginary stack . For each vertex in , we store bits of information that will help us when we restore (recall that the size of is much smaller than the size of ). We can show that the size of is bits. Thus, we have successfully reduced the size of (named in ).
Since does not store information about all the vertices in stack , it faces the restoration problem as well. If the top vertices are popped out of , they are also deleted from , so we need to restore . To aid the restoration of , we implement another data-structure , which contains information about the top vertices of . For each vertex in , we store bits of information. The size of can be shown to be bits. This process continues recursively, giving many data-structures, the last of which is . stores information about vertices, where is some constant. But the restoration problem does not disappear yet: how do we restore ? Beyond this, we do not create any more data-structures; we restore using the most trivial strategy, namely running Dfs all over again. Our main claim is that throughout our algorithm is restored at most times. We will show that the time taken to restore is . Thus, the total time taken to restore is (since is a constant). For the other ’s (), our analysis is slightly different, and it is the main technical contribution of this paper. We will show that the total time taken to restore over the entire course of the algorithm is , where . Thus, the time taken to restore all ’s over the entire course of the algorithm is .
Let us now briefly describe the space taken by our algorithm. Each stores information about at most the top vertices of , and for each such vertex we store only bits. Using the succinct dictionary , we will show that we can implement in space. Thus, the total space taken by these dictionaries is bits.
Note that our algorithm also uses some other data-structures which we have not described yet. However, the major challenge in our work was to bound the size of the ’s; all our other data-structures take bits cumulatively. Thus, the total space taken by our algorithm is bits.
This completes the overview of our algorithm.
In the above description, each contains at most the top elements of . Thus, the size of is . This is a crucial difference from the algorithm of Elmasry et al., where the size of was . The reason for this change is to decrease the space taken by our algorithm: the cumulative space taken by all ’s (in our algorithm) can be shown to be . In spite of this change, the running time of our algorithm does not suffer. To summarize, this is an important technical change from previous work, made solely to decrease the space taken by the algorithm.
In our algorithm, the following data-structure plays a crucial role.
(Succinct Dynamic Dictionary ) Given a universe of size , there exists a dynamic dictionary that stores a subset of size at most . Each element of has satellite data of size , where . Membership, retrieval, insertion and deletion of any element (and its satellite data) take time with probability for some chosen constant . The space taken by the data-structure is bits.
Note that a similar dictionary was also described in Lemma 2.1 of .
We define a few basic notations and data-structures that will be used in the ensuing discussion.
(iterated logarithm) is the number of times the logarithm function must be applied iteratively until the result is . Define . Note that
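For concreteness, here is a sketch of both functions, assuming base-2 logarithms and the usual stopping rule "until the value is at most 1" (the exact threshold in the definition above is an assumption here). The function names are illustrative.

```python
import math

def iterated_log(x: float, i: int) -> float:
    """Apply log2 to x exactly i times."""
    for _ in range(i):
        x = math.log2(x)
    return x

def log_star(x: float) -> int:
    """Number of log2 applications needed to bring x down to at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

print([log_star(v) for v in (1, 2, 16, 65536, 2 ** 65536)])  # [0, 1, 3, 4, 5]
```

The iterated logarithm grows so slowly that it is at most 5 for every input up to 2^65536, which is why a number of levels proportional to it is affordable.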
We divide the imaginary stack into segments of size . An () contains the vertices of consecutive segments of . We divide the imaginary stack into s from bottom to top (only the topmost may contain fewer consecutive segments). The total number of vertices in an is at most , and the total number of s is at most . For brevity, we drop the ceiling notation in the rest of the paper.
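The bookkeeping behind this partition is plain integer arithmetic. In the sketch below, "block" is our hypothetical name for the unit made of consecutive segments, positions are 0-based from the bottom of the imaginary stack, and the sizes used in the example are arbitrary.

```python
from typing import Tuple

def locate(pos: int, seg: int, segs_per_block: int) -> Tuple[int, int]:
    """(segment number, block number) of the entry at 0-based position pos."""
    if pos < 0:
        raise ValueError("position must be non-negative")
    return pos // seg, pos // (seg * segs_per_block)

def num_blocks(size: int, seg: int, segs_per_block: int) -> int:
    """Blocks needed for `size` entries; the topmost block may be partial."""
    block = seg * segs_per_block
    return -(-size // block)  # ceiling division

print(locate(25, 4, 3), num_blocks(25, 4, 3))  # (6, 2) 3
```

With segments of 4 entries and 3 segments per block (12 entries), position 25 falls in segment 6 and block 2, and 25 entries occupy 3 blocks, the last one only partially.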
A stack will store the vertices present in at most the top two segments of . Each cell of contains a tuple of type .
Dynamic Dictionary for
We will store information about the vertices of at most the top two in a dynamic dictionary . This information will be crucial in restoring .
In , the restoration algorithm uses the trailer stack to find a vertex from which the restoration of should start. In our algorithm, as we have to restore , we require many trailer stacks.
To this end, we implement a trailer stack for each . In the trailer stack , we keep the bottommost element of the imaginary stack and the top vertex of all s of that are not present in .
4 Our Algorithm
Our algorithm is very similar to the algorithm of Elmasry et al. We initially color all the vertices white (the Color array takes bits, as each vertex is colored only white, gray or black). Then we take an arbitrary vertex, say , and run Dfs from . As in Elmasry et al., is initially pushed onto the stack. Additionally, we also insert into all the other ’s.
We then process the stack until it becomes empty. Equivalently, we process the stack until the trailer becomes empty, as always contains the bottommost element of the imaginary stack . Our While loop is similar to the standard Dfs algorithm, except that we not only push onto and pop from but also insert into and delete from all ’s. Let be the top element of . We pop from and also delete it from all the other ’s. Then we color gray. We then check whether the -th neighbor of , , is white. If it is white, we first push back onto the stack (and all the other ’s). After that, is pushed onto and all the other relevant data-structures. When we have processed all the neighbors of , it is colored black.
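The main loop can be mimicked by a sketch in which every push and pop on the stack is mirrored into k dictionaries. The k dictionaries here are ordinary Python dicts mapping each stacked vertex to its next index, a deliberate simplification: the real dictionaries keep only the top vertices of the imaginary stack, store compressed information, and are succinct. The name `dfs_with_mirrors` and the counter `ops` are hypothetical; `ops` shows that the number of dictionary operations is k times the number of stack operations.

```python
from typing import Dict, List, Tuple

def dfs_with_mirrors(adj: Dict[int, List[int]], root: int, k: int):
    """Tuple-stack Dfs whose pushes/pops are mirrored into k dictionaries."""
    color = {v: "white" for v in adj}
    stack: List[Tuple[int, int]] = [(root, 0)]
    levels: List[Dict[int, int]] = [{root: 0} for _ in range(k)]
    ops = k  # the root is inserted into every level
    edges: List[Tuple[int, int]] = []
    while stack:
        u, i = stack.pop()
        for d in levels:  # delete u from every level
            del d[u]
        ops += k
        color[u] = "gray"
        while i < len(adj[u]) and color[adj[u][i]] != "white":
            i += 1
        if i < len(adj[u]):
            v = adj[u][i]
            stack.append((u, i + 1))  # push u back, then the new white vertex v
            stack.append((v, 0))
            for d in levels:
                d[u] = i + 1
                d[v] = 0
            ops += 2 * k
            edges.append((u, v))
        else:
            color[u] = "black"
    return edges, ops, all(not d for d in levels)
```

On a connected graph with n vertices this sketch performs 2n - 1 pushes and 2n - 1 pops, hence k(4n - 2) dictionary operations; for a 4-vertex cycle and k = 3 that is 42 operations, and every level is empty at the end.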
We now analyze the running time of our Dfs algorithm in Algorithm 3. In the classical Dfs algorithm, a gray vertex is pushed onto the stack again after it finds a new white vertex. This implies that vertices can be pushed onto the stack at most times. Our Dfs algorithm is very similar to the classical Dfs algorithm, the only difference being that we insert into and delete from “stacks” instead of one. Thus, we claim the following running time:
Not accounting for the time taken by Insert and Delete procedures, the time taken by our Dfs algorithm in Algorithm 3 is .
In the procedure, we add the information about vertex to . Recall that is used to restore . We now describe in detail.
5 Information in
In , where we only have to restore , the following two pieces of information about each vertex are stored in : (1) the segment number in which lies; (2) the approximate position in where lies.
We generalize this idea. Unlike , the dictionary in our algorithm contains information about vertices present in at most the top two s. For each such , let denote the cell in which information related to is stored. We store the following information related to .
The number in which lies.
Recall that ’s main function is to restore . Thus, for each vertex , we store the to which belongs; let us denote it by . This helps the restoration algorithm of check whether indeed lies in the top . Since the total number of is , bits are required to represent .
The approximate position in where lies.
The above information is used to find efficiently. Ideally, we would store explicitly. However, this would require bits for each vertex in , a space we cannot afford. To overcome the space limitation, we divide into groups of appropriate size and store the number of the group in which lies.
The exact definition of the second term requires some more work. Note that takes just bits. We want the second term to take bits as well. Thus, the number of groups into which we divide should not be large (it should be ). However, if the number of groups is small, then the group size, i.e., the number of vertices in each group, may be large, and given the group number, finding in the group will take more time. Thus, we face a dilemma: reducing the space increases the running time of our algorithm. To resolve this dilemma, we extend a strategy used in . Elmasry et al. divided the vertices into two sets, heavy and light. A light vertex has low degree, so its group size is small. For heavy vertices, they show that the total number of heavy vertices is small and that, for each heavy vertex , can be stored explicitly without using too much space. We extend this strategy, but unlike , we have a hierarchy of heavy and light vertices (since we have a hierarchy of ’s).
5.1 Light Vertices
A vertex is if where . We define all the vertices in to be .
We are now ready to define the second piece of information related to stored in . If is , then we divide into groups of size .
If is , then the second information of (approximate position of in ) stored in is defined as follows: if .
The total number of groups of is . Thus the total number of bits required to represent is 3 bits.
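A sketch of the group arithmetic, assuming 1-based neighbor indices, groups of g consecutive indices, and the rule that index j lies in group t when (t - 1)g < j <= tg (our reading of the definition above; the function names are hypothetical). Given a stored group number, a search only has to scan the indices of that one group, so it costs at most g steps instead of the full degree.

```python
from typing import List

def group_number(idx: int, g: int) -> int:
    """Group (1-based) containing the 1-based neighbor index idx."""
    return (idx - 1) // g + 1

def num_groups(deg: int, g: int) -> int:
    """Number of groups needed to cover indices 1..deg (ceiling division)."""
    return -(-deg // g)

def group_indices(t: int, g: int, deg: int) -> List[int]:
    """The 1-based neighbor indices in group t; the last group may be shorter."""
    return list(range((t - 1) * g + 1, min(t * g, deg) + 1))

print(group_number(5, 4), num_groups(10, 4), group_indices(3, 4, 10))  # 2 3 [9, 10]
```

For a vertex of degree 10 and groups of size 4, index 5 lies in group 2, there are 3 groups, and the last group covers only indices 9 and 10.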
Recall that we partitioned the set of vertices into light and heavy only to make the group size small. We now bound the number of vertices in a group of a vertex.
If is , then the total number of vertices in each group of is .
We are now ready to formally define the information about vertex stored in .
If an vertex becomes part of the top of the imaginary stack , then we store the following information about .
If vertex is not , then , that is, we just store the in which resides.
Some explanation is in order. If is an vertex, then we can store the information corresponding to . We have already shown that both these terms take bits. Moreover, given the group number , we can find in time, as the number of vertices in each group of an vertex is (by Observation 7).
However, if is not , then its group size may be , which is not desirable (as this might increase the search time for ). So, for such a vertex, we store only , as there is no point in storing the second term (the second term 0 is just a dummy). But for efficiency, we need to store some information regarding even for a vertex which is not . In the next section, we describe a data-structure which efficiently stores information about all non- vertices.
5.2 Heavy Vertices
A vertex is if where . We define a vertex separately. A vertex is said to be if .
Note that our definition partitions the vertex set nicely. We prove this property in the following lemma:
If is not , then it is for some where .
Since is not , . Thus, there exists a () such that or (the case when ). ∎
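The partition argument amounts to the fact that a sorted list of degree thresholds splits the possible degrees into disjoint, exhaustive intervals. The sketch below uses hypothetical thresholds and a hypothetical numbering of the classes (0 for the lowest-degree class, the last index for the class above every threshold); it shows the shape of the argument, not the paper's exact definitions.

```python
from bisect import bisect_left
from typing import List

def degree_class(deg: int, thresholds: List[int]) -> int:
    """Index of the unique interval containing deg.

    thresholds must be strictly increasing, t[0] < t[1] < ...
      0       -> deg <= t[0]
      j       -> t[j-1] < deg <= t[j]
      len(t)  -> deg > t[-1]
    """
    # bisect_left counts the thresholds strictly smaller than deg.
    return bisect_left(thresholds, deg)

t = [2, 4, 16]
print([degree_class(d, t) for d in (1, 2, 3, 4, 5, 16, 17)])  # [0, 0, 1, 1, 2, 2, 3]
```

Because every degree lands in exactly one interval, a vertex that is not in the lowest class falls into exactly one of the higher classes, mirroring the statement of the lemma.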
We store the information related to an vertex in a dynamic dictionary , where . Since the degree of a vertex is , the total number of vertices is . As with vertices, we divide into groups of size . The only problem with this group size is that it is not defined for . If , then we divide into groups of size 1.
We store the group number of in the dynamic dictionary ; it is defined as follows: if . Since we divide into groups of size , the total number of groups is . This implies that the total space required to represent the group number per vertex in is bits.
By Observation 7, if a vertex is , then the associated group size (stored in ) is . The next lemma presents a crucial feature of our algorithm:
Let be a vertex in , then the group size associated with is of size .
If is , then we have already seen that the group size associated with (and stored in ) is . By Lemma 9, if is not , then it is for . Thus, the information about the group of is stored in , that is, . To this end, we divide into groups of size . There are two cases:
Since is , . This implies that the size of each group is .
By definition, the group size is exactly .
Thus, the group size associated with is .
The above lemma shows a crucial property of all vertices in : the associated group size of all these vertices is irrespective of their degree. Thus, whenever we search for for a vertex , we have to search at most . We crucially exploit this property in the restoration algorithm. Before that, however, let us look at the Insert and Delete procedures.
6 Insert and Delete Procedures
In the Insert procedure, is to be inserted into . But may be full, that is, it may contain vertices. In that case, we call , which removes half of the elements of . After the restoration, contains the top vertices of the imaginary stack . We then insert into . If this newly added element becomes the top element of an , or the trailer itself is empty, then we add to the trailer . Lastly, if is , then it is added to . Three details are missing from the pseudocode of Insert; we list them now:
Let be the total number of vertices in the trailer and be the total number of vertices in . We first calculate the total number of vertices below in the imaginary stack ; this is = . Once we have calculated , finding is a simple calculation.
This is just a mathematical calculation once we know and .
Finding whether is the top element of an
This can be done by maintaining the number of elements currently present in the imaginary stack . Before inserting , if or , then we insert onto the trailer .
The is very similar to the Insert procedure. We first check whether the number of elements in is small. If so, we also check whether the trailer itself has enough elements. If so, we call . After its execution, contains the topmost vertices of the imaginary stack . If is , then it is removed from . After this, the top element of (and, if necessary, of ) is removed.
The following lemma about the running time of Insert and Delete follows immediately from the data-structure in Lemma 3.
Apart from the time taken by Restore-Empty and Restore-Full, the running time of the Insert and Delete procedures is with high probability. (Since we use the data-structure described in Lemma 3 at most poly() times, all inserts and deletes succeed with probability , where is some constant.)