Analyzing Trade-offs in Reversible Linear and Binary Search Algorithms

10/23/2019 ∙ by Hiroki Masuda, et al.

Reversible algorithms are algorithms in which each step represents a partial injective function; they are useful for performance optimization in reversible systems. In this study, using Janus, a reversible imperative high-level programming language, we have developed reversible linear and binary search algorithms. We have analyzed the non-trivial space-time trade-offs between them, focusing on the memory usage disregarding original inputs and outputs, the size of the output garbage disregarding the original inputs, and the maximum amount of traversal of the input. The programs in this study can easily be adapted to other reversible programming languages. Our analysis reveals that the change of the output data and/or the data structure affects the design of efficient reversible algorithms. For example, the number of input data traversals depends on whether the search has succeeded or failed, while it expectedly never changes in corresponding irreversible linear and binary searches. Our observations indicate the importance of the selection of data structures and what is regarded as the output with the aim of the reversible algorithm design.

1. Introduction

Reversible algorithms are useful for exploiting the advantage of forward and backward determinism in reversible systems (Morita, 2017). Their applications include performance optimization in parallel discrete-event simulation (PDES) and quantum circuit synthesis and design, and they can be developed using general reversible simulations (e.g. (Buhrman et al., 2001)). The trade-offs in several simulations have been analyzed in (Buhrman et al., 2001) and the references therein; however, their time/space overheads were not negligible. Optimization with human insight leads to improved reversible algorithms for reversible simulations in terms of time, space, output garbage, etc., and such optimizations combining some of these measures have been constructed previously (e.g. (Axelsen and Yokoyama, 2015)). A memory management method developed exclusively for reversible computing has also been studied (e.g. (Haulund et al., 2017)). Notably, in the original (irreversible) linear search algorithm, the types of output data and data structure do not affect the asymptotic behavior of the number of traversals of the input data or memory usage. In binary search, there is a trade-off between general solutions and efficient, manually-created reversible programs.

In this study, efficient reversible linear and binary search algorithms are constructed, and the trade-offs between the number of traversals of the input data and the memory usage are analyzed. As shown in Fig. 1, the reversible search algorithms take a file and a key, and return useful outputs, such as a flag indicating whether the given record was found, together with garbage outputs, such as the original input and the control flow information.

Figure 1. The I/O of reversible search algorithms to consider.

They assume different types of outputs and data structures, which do not affect the efficiency of corresponding irreversible algorithms but may do so for reversible algorithms and must therefore be taken into consideration during their design. To the best of our knowledge, this is the first attempt to analyze in detail the effect of the selection of the types of outputs and data structures on the efficiency of reversible algorithms. For this reason, the analysis target should be as simple as possible.

To describe reversible programs, Janus, a reversible high-level imperative programming language with procedures and local variable allocations (Yokoyama et al., 2008), was used, but the programs presented in this paper can easily be adapted to other reversible languages. For the design of algorithms, high-level languages are more intuitive and expressive than popular computation models, such as Turing machines and cellular automata, or low-level languages. Writing an algorithm in a reversible language itself serves as proof that the constructed algorithm is reversible.

2. Reversible Search

A search problem is to find a record containing a given key in a file, i.e. a collection of n records. For the sake of simplicity, let each record consist of only a key from a totally ordered domain. In this study, we consider only the following as answers to search problems: (a) the numbers of records containing the given keys, (b) the locations of the latest records containing the given keys, or (c) flags indicating whether the given keys exist in the given files.

The useful criteria for the optimality of reversible algorithms are expected to be different from irreversible ones (Axelsen and Yokoyama, 2015; Buhrman et al., 2001; Frank, 1999). We introduce the two criteria for our analysis:

  • The use of memory, except for the original input and output.

  • The size of the garbage, except for the original input.

Here, the data stored in the memory after the computation, apart from the useful output, is defined as garbage. By this definition, the original inputs are garbage, but in practice it is not uncommon to use them in further computations. Therefore, we exclude them from both criteria; most of the programs discussed in this paper retain the files of the original inputs until the end of the computations. Under the first criterion, the memory holding the original input may be modified during the computation if it is mutable, but it must be restored by the end of the computation. An irreversible linear search can be realized with only a small, fixed amount of working memory and without leaving garbage because, while it requires some memory to store an index or pointer as well as the output, it can freely erase and overwrite memory at any given time.

2.1. Reversible Linear Search

A sequential search from the beginning to the end of the specified files is called linear. The runtime and memory usage of an irreversible linear search is as follows: The linear search returns a number, location, or flag, as mentioned above, traverses the given input only once, and runs in and time with the memory usage , in which the memory usage except for the original input and output is . In the cases of failure or returning the number of records, the search has to traverse all the records and runs in . However, in cases where the records are sorted, it can stop after and time. No garbage is expected .

The reversible linear search was manually optimized using domain-specific knowledge and human intuition. Because the optimal solution changes based on the given resource, cases were divided into three categories in terms of the number of input traversals and :

  1. One traversal and ,

  2. One traversal and , and

  3. One or two traversals and .

(1) is a more efficient special case than (2) and (3), which involve a space–time trade-off. When , the general solution, saving all the lost information, cannot be applied. When there is only one traversal, the call–copy–uncall scheme cannot be used (Yokoyama et al., 2008).

Each file is represented as (i) an array with a variable containing its size, (ii) a doubly linked list (dlist for short), or (iii) a singly linked list (list for short). The locations of the head for (ii) and (iii) and of the last record for (ii) need to be distinguished from the other records. In contrast to the irreversible search algorithm, both the output data and the data structure affect the design of the reversible search algorithm.

Case (1), (i)–(ii), and (a)–(b). Programs were designed with . When an array is used (i), the index is incremented without using extra memory and the index of the last is zero-cleared by the size of the array. When a dlist is used (ii), at the end of the computation, the pointer is zero-cleared by a pointer to the last record . Note that unlike in irreversible setting non-existence of requires non-constant time overhead to delete the information where the last is located.

The programs must traverse the entire input in case (a), even when the records are sorted beforehand. If the traversal is terminated halfway, the location information must be stored somewhere, but the information cannot be erased without performing additional traversal of the input or adding it to the garbage.

Two example reversible programs are demonstrated. The program srch1 (Case (1), (i), and (b)) in Fig. 2 sets i to the location (the array index) of the occurrence of the key k closest to the beginning of the file r of length n, if any, and to n otherwise. Here, r[n] is a sentinel whose value is k; returning n therefore means that the traversal reached the test r[n] = k on the sentinel and that the search failed.

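procedure srch1(int r[], int n, int k, int i)
  from i = 0 loop    // entry assertion: i is 0 only when the loop is entered
    i += 1           // advance to the next record
  until r[i] = k     // exit at the first match or at the sentinel r[n] = k
Figure 2. A reversible linear search returning a location using a sentinel.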
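To make the forward and backward behavior of srch1 concrete, the following is a minimal Python sketch that imitates the Janus loop rather than running Janus itself. The function names `srch1` and `srch1_inverse`, the sample data, and the use of Python assertions to stand in for the Janus entry assertion and exit test are assumptions made for illustration only.

```python
def srch1(r, k, i=0):
    """Forward run: advance i from 0 until r[i] == k (the last cell is the sentinel k)."""
    assert i == 0            # entry assertion of the loop (from i = 0)
    while r[i] != k:         # exit test (until r[i] = k)
        i += 1               # loop body
        assert i != 0        # the entry assertion must be false on re-entry
    return i


def srch1_inverse(r, k, i):
    """Backward run (uncall): undo the increments and restore i to 0."""
    assert r[i] == k         # the exit test becomes the entry assertion
    while i != 0:            # the entry assertion becomes the exit test
        i -= 1               # inverse of i += 1
        assert r[i] != k     # fails if i was not the first occurrence of k
    return i


k = 7
r = [4, 7, 1, 7] + [k]                 # records followed by the sentinel r[n] = k
found = srch1(r, k)                    # index of the first occurrence of 7
missing = srch1([4, 1, 3] + [5], 5)    # key 5 is absent, so the sentinel index n = 3 results
restored = srch1_inverse(r, k, found)  # running backward zero-clears the index
```

The inverse only succeeds when it is started from a state that the forward run could have produced; starting it from a later occurrence of the key trips the assertion, which mirrors how Janus rejects non-injective steps.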

The program srch2 (Case (1), (ii), and (b)) in Fig. 3 uses a dlist to store a file and returns the location of a record.

1  procedure srch2(int head[], int next[], int prev[], int k, int l)
2    from l = 0 loop
3      local int t = next[l]
4        l ^= prev[t]^t // update l from prev[t] to t
5      delocal int t = l
6    until next[l]=-1 || k=head[l]
Figure 3. A reversible linear search returning a location using a dlist and a sentinel.

A given key and the result location are stored in k and l, respectively. A dlist is represented by three arrays: head[] for keys, next[] for pointers to the next cells, and prev[] for pointers to the previous cells. For the sake of simplicity, the head record is stored at index 0, and the sentinel is stored at the end of the array. -1 indicates a NULL pointer. In the body of srch2, the location l is repeatedly updated until the last record is reached (next[l] = -1) or the key k is found (k = head[l]). The local clause reserves a memory cell initialized with the integer value of next[l], which can be referred to through the local variable t, and the delocal clause asserts that t has the value of l and frees the memory cell. The pointer to a record is updated as shown in Fig. 4.

Figure 4. Reversible traversal of a dlist.

Another pointer t enables the reversible traversal of the records. To update a pointer l from the address of a cell to the address of the adjacent cell (I), the address of the adjacent cell is obtained from next[l] and stored in the zero-cleared pointer t at line 3 (II); the address of the original cell is obtained from prev[t], and the two addresses are used to update l at line 4 (III); finally, t is zero-cleared by the same address, now held in l, at line 5 (IV). If the dlist is mutable, the list traversal can be performed using swap without the extra pointer t.
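The pointer update of Fig. 4 can be traced with a small Python sketch. It is an illustration under assumptions, not the authors' code: the names `step` and `unstep`, the arrays `nxt` and `prv` (standing in for next[] and prev[]), and the four-cell example list are hypothetical.

```python
def step(nxt, prv, l):
    """Move l to the next cell, using a temporary t that is cleared again."""
    t = 0
    t ^= nxt[l]          # (II)  t receives next[l]; t was zero-cleared
    l ^= prv[t] ^ t      # (III) prev[t] equals the old l, so l becomes t
    t ^= l               # (IV)  t equals l now, so this zero-clears t
    assert t == 0        # corresponds to "delocal int t = l"
    return l


def unstep(nxt, prv, l):
    """Inverse of step: the same updates undone in reverse order."""
    t = 0
    t ^= l               # inverse of (IV): t receives l
    l ^= prv[t] ^ t      # inverse of (III): XOR undoes itself, l becomes prev[t]
    t ^= nxt[l]          # inverse of (II): next[l] equals t, so t is cleared
    assert t == 0
    return l


# A four-cell dlist: cell 0 is the head, cell 3 is the last cell, -1 is NULL.
nxt = [1, 2, 3, -1]
prv = [-1, 0, 1, 2]

forward = step(nxt, prv, 0)      # moves from cell 0 to cell 1
back = unstep(nxt, prv, forward) # returns to cell 0
```

Because every statement is an XOR update whose right-hand side does not mention the updated variable, each one is its own inverse, which is why no record of the visited cells has to be kept.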

Case (1), and (iii) or (c). Reversible linear search cannot be conducted. The traversal of lists requires saving information on which cells have been traversed, and this information cannot be erased in a single traversal. A single traversal of the input is also not sufficient to erase information on the existence of multiple records having a given key. If such information remained at the end of the computation as part of the garbage, the garbage condition of case (1) would not be satisfied.

For the case (1), (i)–(ii), and (c), it is possible to construct an algorithm with the further assumption that in an entire given file there is at most a single key that is equal to a given key. It is possible to traverse a given file regardless of the success or failure of the search and set a flag at the time when an answer is found. Therefore, the time is required for any inputs and .

The case (1) has been completely analyzed. In the following, we only consider the cases which have not been discussed.

Case (2), and (i)–(ii). When the search has succeeded and been terminated, the location can be stored and returned as garbage (). When the search has failed, the location can be zero-cleared (). No additional space is necessary except to store the garbage.

Case (2) and (iii). The traversal of immutable lists requires memory space proportional to the number of records traversed. In the case of locations (b) or flags (c) the search can be terminated halfway (). In the case of the numbers (a), the whole list must be traversed (), but if the file is sorted, the search can be terminated halfway (). No additional space is necessary except to store the garbage (). Nonetheless, the list carries significant linear space overhead.

If the list is mutable, it is possible to traverse it by replacing pointers and thus create the reversed list in the process. The original inputs are destroyed, and the remaining list, the reversed list, and location become the garbage output. This technique is useful when the original input is no longer necessary in the subsequent computation.
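The pointer-replacing traversal of a mutable list can be sketched in Python as follows. This is a hedged illustration: the names `walk_reversing` and `walk_restoring`, the array `nxt` standing in for the next pointers, and the convention that -1 is NULL are assumptions, and the sketch shows only the traversal, not a complete search.

```python
def walk_reversing(nxt, cur, prev, steps):
    """Advance cur by `steps` cells, overwriting each visited next pointer with a back pointer."""
    for _ in range(steps):
        following = nxt[cur]   # where the traversal goes next
        nxt[cur] = prev        # the visited cell now points backward
        prev, cur = cur, following
    return cur, prev


def walk_restoring(nxt, cur, prev, steps):
    """Inverse walk: follow the back pointers and restore the original next pointers."""
    for _ in range(steps):
        back = nxt[prev]       # back pointer stored during the forward walk
        nxt[prev] = cur        # restore the original forward pointer
        cur, prev = prev, back
    return cur, prev


nxt = [1, 2, 3, -1]                              # list 0 -> 1 -> 2 -> 3, -1 is NULL
cur, prev = walk_reversing(nxt, 0, -1, 2)        # stop at cell 2
reversed_part = list(nxt)                        # snapshot: [-1, 0, 3, -1]
cur2, prev2 = walk_restoring(nxt, cur, prev, 2)  # rewind to the head
```

After the forward walk, the remaining list starts at `cur` and the reversed list starts at `prev`; together with the location they form the garbage described above, and the inverse walk shows that nothing was lost.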

Case (3), (i)–(ii), and (a). The records are traversed entirely regardless of success or failure. If the given list is sorted, the call–copy–uncall scheme can be performed in time.

Case (3), (i)–(ii), and (b)–(c). Here, the call–copy–uncall scheme is only used in the case of success. When the search succeeds, it requires two traversals of the input and time. When the search fails, the location information is zero-cleared. The failure always runs in . The number of traversals differs based on the success or failure. This is specific to the reversible version of search.

If the given list is sorted and the call–copy–uncall scheme is used in both success and failure cases, it can run in time.

Case (3) and (iii). List traversal requires saving the locations of previous cells (size ) and eventually erasing this information through a second traversal of the input. If the given list is mutable, we can update the pointers of traversed cells and create an intermediate reversed list to rewind the computation ().

If the given list is immutable, extra space is required (), where is the size of each pointer. If the given list is sorted, the order notation can be replaced with , as in the above cases.

For example, the program srch3 in Fig. 5 is a reversible linear search. Formal parameters of the same names as Fig. 3 are used for the same purpose. Variable f is a flag value indicating the success or failure of the search.

procedure srch3(int head[], int next[], int prev[], int k, int f)
  local int l = 0
    call srch2(head,next,prev,k,l)
    if l = size(head)-1 then  // search failed
      l ^= size(head)-1       // zero clear l
    else
      f ^= 1                  // search succeeded
      uncall srch2(head,next,prev,k,l)
    fi f != 1
  delocal int l = 0
Figure 5. A reversible linear search returning a flag using a dlist.

In the body of srch3, srch2 is called and the cases are divided by the success and failure of the search. size(a) returns the length of the array a. The failure of the search is detected when l points to the sentinel, which is stored at the end of the array as mentioned above. When the search has failed, l is zero-cleared by the index of the last cell. When the search has succeeded, the flag f is set to 1 and the location l is zero-cleared by the inverse computation of srch2. The number of traversals of the input differs based on the failure or the success of the search.
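The difference in traversal counts can be observed with a small Python simulation of srch3. It is a sketch under assumptions: the helper names, the step counters, and the sample dlist are hypothetical, and the inverse of srch2 is simplified to a walk along prev[] back to cell 0.

```python
def srch2(head, nxt, k):
    """Forward search: walk from cell 0 until the key or the sentinel is reached."""
    l, steps = 0, 0
    while not (nxt[l] == -1 or head[l] == k):
        l = nxt[l]
        steps += 1
    return l, steps


def srch2_inverse(prv, l):
    """Backward run (uncall), simplified: walk back to cell 0 so that l is zero-cleared."""
    steps = 0
    while l != 0:
        l = prv[l]
        steps += 1
    return l, steps


def srch3(head, nxt, prv, k):
    """Return (flag, number of pointer moves) for a search of key k."""
    f = 0
    l, moves = srch2(head, nxt, k)
    if l == len(head) - 1:        # the sentinel was reached: the search failed
        l ^= len(head) - 1        # zero-clear l with the known last index
    else:
        f ^= 1                    # the search succeeded
        l, back = srch2_inverse(prv, l)
        moves += back             # the second traversal erases the location
    assert l == 0                 # corresponds to "delocal int l = 0"
    return f, moves


head = [5, 3, 8, 0]               # the last cell is the sentinel
nxt = [1, 2, 3, -1]
prv = [-1, 0, 1, 2]
success = srch3(head, nxt, prv, 8)   # two moves forward, two moves back
failure = srch3(head, nxt, prv, 9)   # three moves forward, none back
```

In this sketch a successful search pays twice the distance to the record, while a failed search pays a single pass over the whole list, which is the asymmetry described in the text.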

The two measures are analyzed in all the cases for (1)–(3), (i)–(iii), and (a)–(c), and the results are summarized in Table 1. The measures change with each data structure, output, and resource constraint. These differences do not occur in the corresponding irreversible programs. NA indicates that there is no algorithm for the specified case.

(1) One traversal and . (a) number (b) location (c) flag (i) array/(ii) dlist NA (iii) list NA NA NA
(2) One traversal and . (a) number (b) location (c) flag (i) array/(ii) dlist (iii) list
(3) One or two traversals and . (a) number (b) location (c) flag (i) array/(ii) dlist (iii) (mutable) list (iii) (immutable) list
Table 1. Efficiency of reversible linear searches.

2.2. Reversible Binary Search

A binary search targets records sorted by key. It compares the given key with the median record; if they are not equal, the process is repeated within the half of the records that may include the given key. An efficient binary search runs in O(log n) time.

In each step of an irreversible binary search, the information of the current range is not sufficient to reconstruct the previous range, and the output is not sufficient to identify the final range. A general reversible simulation, such as one that classifies this information as garbage, makes the program reversible, but it is inefficient because it must manipulate additional space whose size grows with the number of iterations.

Here, the program is made reversible based on the observation that if the range size is a power of 2, the previous range is uniquely determined. This is because the previous range is twice as large as the current one, and, in the i-th iteration, the former half was selected if the i-th bit from the least significant bit of the lower index is 0, and the latter half was selected otherwise. Moreover, the search is terminated only when the range size is 1. Hence, the range at the final state is uniquely determined by the output.
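The observation can be checked with a short Python sketch that halves a power-of-two range and then rebuilds it from the bits of the lower index alone. The names `halve`, `unhalve`, and `roundtrip`, and the encoding of the choices as booleans, are assumptions made for illustration.

```python
def halve(l, u, take_upper):
    """One forward step: split [l, u) in the middle and keep one half."""
    m = l + (u - l) // 2
    return (m, u) if take_upper else (l, m)


def unhalve(l, u, i):
    """Inverse step: recover the previous range from bit i of the lower index l."""
    size = u - l                   # equals 2**i
    if l & (1 << i) == 0:          # bit i is 0: the lower half had been kept
        return l, u + size
    return l - size, u             # bit i is 1: the upper half had been kept


def roundtrip(choices):
    """Apply one halving per choice, then undo them all; return the final and rebuilt ranges."""
    bits = len(choices)
    l, u = 0, 1 << bits
    for take_upper in choices:
        l, u = halve(l, u, take_upper)
    final = (l, u)
    for i in range(bits):          # undo from the smallest range outward
        l, u = unhalve(l, u, i)
    return final, (l, u)


result = roundtrip([True, False, True])   # upper, lower, upper on the range [0, 8)
```

No record of the choices is passed to `unhalve`; the bits of the lower index already encode them, which is what removes the need for per-iteration garbage.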

In this section, we consider a reversible binary search that takes a file stored in an array, searches for a key, and returns the index of the matching record if the search succeeds or -1 if it fails.

A reversible binary search program is shown in Fig. 6.

1  // set the ceiling of the base-2 logarithm of n to v
2  procedure log2ceil(int n, int v)
3    from v = 0 loop v+=1 until 2**v >= n
4
5  procedure bsrch1(int in[], int u, int k, int len)
6    local int l = 0
7      local int i = len
8        u ^= 2**i
9        from l=0 && u=2**len loop
10         i -= 1
11         local int m = 0
12           m ^= l + (u-l)/2
13           if size(in)<=m || in[m]>k then
14             u ^= (m-l)*2 + l       //zero clear u
15             u <=> m                //zero clear m
16           else
17             l ^= 2*m - u           //zero clear l
18             l <=> m                //zero clear m
19           fi (l&(2**i)) = 0
20         delocal int m = 0
21       until i = 0
22     delocal int i = 0
23     u -= 1
24   delocal int l = u
25
26 procedure bsrch(int in[], int k, int u)
27   local int len = 0
28     call log2ceil(size(in),len)   //set len
29     call bsrch1(in,u,k,len)
30     if u >= size(in) || in[u]!=k then
31       uncall bsrch1(in,u,k,len)
32                                   //zero clear u
33       u ^= -1
34     fi u = -1
35     uncall log2ceil(size(in),len) //zero clear len
36   delocal int len = 0
Figure 6. A reversible binary search.

in[] is an array of records consisting only of keys, k is the key to be found, and u stores the answer location or a dummy output indicating the failure of the search. Calling bsrch(in,k,u) sets u to the index of key k in in[] if any, and to -1 otherwise. Only u changes before and after the call. Therefore, no garbage is produced.

We compute ⌈log2 n⌉, i.e. the smallest integer that is larger than or equal to log2 n, by means of the call–copy–uncall scheme, where the call of log2ceil sets the zero-cleared len at line 28 and the uncall zero-clears len at line 35.

The reversible procedure bsrch1 iteratively narrows the searched range in the reversible loop. The range is represented by the indices l to u-1. Before the loop, l is set to 0 and u is set to 2**len. Thus, the range size is expanded from n to the smallest power of 2 that is greater than or equal to n. At line 12, the middle of the range is set to m. Next, to halve the range, the reversible conditional decreases the upper index u if the middle index m is out of the range of in[] or the record at m is greater than the given key k; otherwise, it increases the lower index l. (Tests and assertions are short-circuit expressions. Therefore, even if the m-th element of in[] does not exist, size(in) <= m passes the control to the then branch.) Only the else branch in the iteration with counter i changes the i-th bit from the least significant bit of l from 0 to 1. Therefore, the control is merged at line 19. After the loop, the range size is exactly one (l equals u-1). u is set to the index at line 23 and l is deallocated at line 24.

At line 30, the procedure bsrch checks whether in[u] is the record being searched for. If not, u must not be part of the output and should be erased. In such a case, the inverse call of the reversible procedure bsrch1 zero-clears u, and u is set to -1, indicating that the search has failed. Because a valid index must be non-negative, the assertion u = -1 merges the control flow.

When the search fails, bsrch1 is called twice (once for call and once for uncall), and the input array is traversed twice. However, a single traversal is sufficient in the case of a successful search. This serves as an advantage over the general solution using the call–copy–uncall scheme.
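The pass counts can be reproduced with a Python simulation of bsrch. This is a sketch under assumptions rather than a translation of Fig. 6: the function names mirror the Janus procedures, but the `passes` counter, the explicit `bsrch1_inverse`, and the sample array are hypothetical additions for illustration.

```python
def log2ceil(n):
    """Smallest v with 2**v >= n."""
    v = 0
    while 2 ** v < n:
        v += 1
    return v


def bsrch1(a, k, length):
    """Forward pass: narrow [l, u) from [0, 2**length) down to a single index."""
    l, u = 0, 2 ** length
    for _ in range(length):
        m = l + (u - l) // 2
        if len(a) <= m or a[m] > k:   # short-circuit: a[m] is read only if m is valid
            u = m                     # keep the lower half
        else:
            l = m                     # keep the upper half
    return l


def bsrch1_inverse(a, k, length, idx):
    """Backward pass: rebuild [0, 2**length) from the final index, re-reading the array."""
    l, u = idx, idx + 1
    for i in range(length):
        if l & (1 << i) == 0:
            assert len(a) <= u or a[u] > k     # the forward test, now an assertion
            u += 1 << i
        else:
            assert l < len(a) and a[l] <= k    # negation of the forward test
            l -= 1 << i
    return l, u


def bsrch(a, k):
    """Return (index of k or -1, number of passes over the search range)."""
    length = log2ceil(len(a))
    u = bsrch1(a, k, length)
    passes = 1
    if u >= len(a) or a[u] != k:
        # Failure: the candidate index must be erased by running bsrch1 backward.
        assert bsrch1_inverse(a, k, length, u) == (0, 2 ** length)
        passes += 1
        u = -1
    return u, passes


keys = [1, 3, 5, 7, 9]
hit = bsrch(keys, 7)     # found in a single pass
miss = bsrch(keys, 4)    # not found, so a second pass erases the candidate index
```

Both passes take one step per bit of the expanded range, so the extra pass on failure doubles the work without changing the logarithmic growth.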

bsrch runs in O(log n) time and uses only a constant number of local variables (len, l, i, and m) besides its inputs and output. It should be noted that the initial expansion of the search range to at most twice the size does not require the allocation of additional memory cells, and the asymptotic time and space complexities are not degraded.

3. Conclusion

In this study, purely reversible implementations of linear and binary searches are presented, and it is shown that the types of output data and data structures used, which do not affect the efficiency of irreversible algorithms, can affect the efficiency of the corresponding reversible algorithms. It is demonstrated that there are trade-offs among the number of traversals of the input data, the memory usage, and the garbage size in reversible linear and binary searches over multiple types of outputs and data structures. Because linear and binary searches are fundamental, the programs presented here, which stand in trade-off relationships with one another, are useful in reversible software development under various resource conditions.

The design of an efficient reversible algorithm is not trivial; for example, reversible linear (resp. binary) search requires two (resp. one) traversals in the case of success and one (resp. two) traversal in the case of failure. This difference is not directly implied by the existing design of conventional linear and binary searches. Clearly, the design of reversible algorithms requires different principles from the design of conventional (irreversible) algorithms.

Our work has targeted linear and binary searches. In conventional algorithm design, more general strategies such as dynamic programming and greedy algorithms have been proposed. Developing such general strategies in the reversible setting is part of our future work. The two criteria used in this paper are useful for analyzing reversible linear and binary search algorithms. However, it is not clear for which other reversible algorithms they are useful, and more meaningful criteria for more general purposes are desirable. In the future, we aim to further develop criteria for measuring the efficiency of reversible programs as well as techniques for designing them.

Acknowledgements.
This work was supported by JSPS KAKENHI Grant Number 18K11250 and Pache Research Subsidy I-A-2 for the 2019 academic year.

References

  • Axelsen and Yokoyama (2015) Axelsen, H.B. and Yokoyama, T. 2015. Programming Techniques for Reversible Comparison Sorts. In APLAS (LNCS), Feng, X. and Park, S. (Eds.), Vol. 9458. Springer-Verlag, 407–426.
  • Buhrman et al. (2001) Buhrman, H., Tromp, J., and Vitányi, P. 2001. Time and Space Bounds for Reversible Simulation. In ICALP (LNCS), Orejas, F., Spirakis, P.G., and van Leeuwen, J. (Eds.), Vol. 2076. Springer-Verlag, 1017–1027.
  • Frank (1999) Frank, M.P. 1999. Reversibility for Efficient Computing. Ph.D. Dissertation. MIT.
  • Haulund et al. (2017) Haulund, T., Mogensen, T.Æ., and Glück, R. 2017. Implementing Reversible Object-Oriented Language Features on Reversible Machines. In RC, Phillips, I. and Rahaman, H. (Eds.). Springer-Verlag, 66–73.
  • Morita (2017) Morita, K. 2017. Theory of Reversible Computing. Springer-Verlag.
  • Yokoyama et al. (2008) Yokoyama, T., Axelsen, H.B., and Glück, R. 2008. Principles of a Reversible Programming Language. In Computing Frontiers. Proceedings. ACM Press, 43–54.