Approximate Query Processing over Static Sets and Sliding Windows

09/14/2018 ∙ Ran Ben Basat et al. ∙ IIT Guwahati ∙ Seoul National University

Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the operations rank(i) (i.e., computing ∑_{j ≤ i} B[j]) and select(i) (i.e., finding min{p : rank(p) ≥ i}). We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window-summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring only a (1+o(1)) factor more space than the fixed-window summation algorithms.


1 Introduction

Given a bit-string B of size n, one of the fundamental and well-known problems, proposed by Jacobson [15], is to construct a space-efficient data structure which can answer rank and select queries on B efficiently. For 1 ≤ i ≤ n, these queries are defined as follows.

  • rank_B(i): returns the number of 1’s in B[1..i].

  • select_B(i): returns the position of the i-th 1 in B.

A bit vector supporting a subset of these operations is one of the basic building blocks in the design of various succinct data structures. Supporting these operations in constant time, with close to the optimal amount of space, both theoretically and practically, has received a wide range of attention [16, 18, 19, 20, 23]. Some of these results also explore trade-offs that allow more query time while reducing the space.
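
To make the definitions concrete, here is a minimal (non-succinct) reference sketch in Python that answers rank and select by scanning the bit-string; the structures discussed in this paper return the same answers in constant time using n + o(n) bits or less.

class BitString:
    def __init__(self, bits):
        self.bits = list(bits)            # e.g., [1, 0, 1, 1, 0]

    def rank(self, i):
        # Number of 1's in B[1..i] (1-indexed prefix).
        return sum(self.bits[:i])

    def select(self, i):
        # Position (1-indexed) of the i-th 1 in B, or None if it does not exist.
        count = 0
        for pos, b in enumerate(self.bits, start=1):
            count += b
            if b == 1 and count == i:
                return pos
        return None

# Example: B = 10110 gives rank(3) = 2 and select(3) = 4.
B = BitString([1, 0, 1, 1, 0])
assert B.rank(3) == 2 and B.select(3) == 4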

We also consider related problems in the streaming model, where a quasi-infinite sequence of integers arrives, and our algorithms need to support the operation of appending a new item to the end of the stream. For 1 ≤ i ≤ W, let S_i be the sum of the last i integers; here, W is the maximal suffix size we support queries for. For streaming, we consider processing a stream of elements and answering two types of queries, suffix sum (ss) and inverse suffix sum (iss), defined as follows (a naive reference sketch appears after the definitions):

  • ss(i): returns S_i, for any 1 ≤ i ≤ W.

  • iss(j): returns the smallest i, 1 ≤ i ≤ W, such that S_i ≥ j.
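
The following naive sketch keeps the last W elements explicitly, so it is far from the space bounds discussed below; it is included only to illustrate the ss and iss semantics.

from collections import deque

class NaiveSuffixSums:
    # Naive exact ss/iss over a stream: stores the last W elements explicitly.
    def __init__(self, W):
        self.W = W
        self.window = deque(maxlen=W)     # the last W stream elements

    def update(self, x):
        self.window.append(x)

    def ss(self, i):
        # Sum of the last i elements (i <= W).
        return sum(list(self.window)[-i:])

    def iss(self, j):
        # Smallest i such that the sum of the last i elements is at least j.
        total = 0
        for i, x in enumerate(reversed(self.window), start=1):
            total += x
            if total >= j:
                return i
        return None

s = NaiveSuffixSums(W=4)
for x in [1, 0, 3, 2]:
    s.update(x)
assert s.ss(2) == 5 and s.iss(4) == 2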

In this paper, our goal is to obtain space-efficient data structures that support relaxations of these queries, ideally using an amount of space below the theoretical minimum for the unrelaxed versions. To this end, we define approximate versions of rank and select queries, and propose data structures for answering approximate rank and select queries on multisets and bit-strings. We consider the following approximate queries with an additive error .

  • : returns any value which satisfies . If , then .

  • : returns any value which satisfies .

  • : returns any position which satisfies .

  • : returns any position which satisfies .

  • : returns any value which satisfies .

  • : returns any value which satisfies .

We propose data structures for supporting approximate rank and select queries on bit-strings efficiently. Our data structures use less space than is required to answer the exact queries, and most of them use optimal space. We also propose a data structure for supporting ssA and issA queries on binary streams while supporting updates efficiently. Finally, we extend some of these results to the case of larger alphabets. For all these results, we assume the standard word-RAM model [17] with word size Θ(log n) bits, unless explicitly mentioned otherwise.

1.1 Previous work

Rank and Select over bit-strings. Given a bit-string B of size n, it is clear that at least n bits are necessary to support rank and select queries on B. Jacobson [15] proposed a data structure for answering rank queries on B in constant time using n + o(n) bits. Clark and Munro [7] extended it to support both rank and select queries in constant time with n + o(n) bits. For the case when there are m 1’s in B, at least ⌈log (n choose m)⌉ bits (the information-theoretic lower bound on the space needed to store a subset of size m from a universe of size n) are necessary to support rank and select on B. Raman et al. [23] proposed a data structure that supports both operations in constant time while using ⌈log (n choose m)⌉ + o(n) bits. Golynski et al. [13] gave an asymptotically optimal time-space trade-off for supporting rank and select queries on B. A related problem of approximate color counting was considered by El-Zein et al. [9].

A natural generalization of the static case is answering queries with respect to a sliding window over a data stream. The sliding window model was extensively studied for multiple problems including summing [4, 8], heavy hitters [2, 5], Bloom filters [1] and counting distinct elements [10].

Algorithms that Sum over Sliding Windows. Our ss queries for streaming are a generalization of the problem of summing over sliding windows: window summation is the special case of the suffix-sum problem in which the algorithm is always asked for the sum of the last W elements. Approximating the sum of the last W elements over a stream of integers was first introduced by Datar et al. [8]. They proposed a multiplicative-approximation algorithm that uses bits and operates in amortized time or worst case. In [11], Gibbons and Tirthapura presented a multiplicative-approximation algorithm that operates in constant worst-case time while using similar space. [4] studied the potential memory savings one can get by replacing the multiplicative guarantee with an additive approximation; they showed that bits are required and sufficient. Recently, [3] showed the potential memory saving of a bi-criteria approximation, which allows error in both the sum and the time axis, for sliding-window summation. [6] considers a generalization of the ssA queries to a general alphabet, where at query time we also receive an element and return an estimate of its frequency among the last W elements.

It is worth mentioning that these data structures do allow computing the sum of a window whose size is given at the query time. Alas, the query time will be slower as they do not keep aggregates that allow quick computation. Specifically, we can compute a multiplicative approximation in time using the data structures of [8] and [11]. We can also use the data structure of [4] for an additive approximation of in time.

Table 1: Summary of upper and lower bounds for approximate rank and select queries on a bit-string of size ( is the number of ’s in the bit-string). For each query, the table lists the space in bits, the query time, and the (additive) error.

Table 2: Comparison of data structures for ss queries over a stream of integers. The rows are DGIM02 [8] (multiplicative guarantee), GT02 [11] (multiplicative), BEFK16 [4] (additive, two parameter regimes), and this paper (additive, with the same space as [4]); the columns give the guarantee, the space in bits, the update time, and the query time. All works can answer fixed-size window queries (where the suffix length equals ); worst-case times are specified.

1.2 Our results

In this paper, we obtain the following results for approximate rank, select, ss and iss queries with additive error. Let B be a bit-string of size n.

1. rank and select queries with additive error : In this case, we first show that bits are necessary for answering and queries on and propose a -bit data structure that supports and queries on in constant time. For the case when there are ’s in , we show that bits are necessary for answering and queries on , and obtain -bit data structure that supports and queries on in constant time. For and queries on , we show that bits are necessary for answering both queries, and obtain an -bit data structure that supports queries in time, and queries in time. Furthermore, we show that there exists an additive error such that any -bit data structure requires at least time to answer queries on .

Using the above data structures, we also obtain data structures for answering approximate rank and select queries on a given multiset from the universe with additive error , where the rank query returns the value , and the select query returns the -th smallest element in the multiset. We consider two different cases: (i) rankA, drankA, selectA, and dselectA queries when , and (ii) drankA and selectA queries when the frequency of each element is at most . Furthermore, for case (ii), we first show that at least bits are necessary for answering drankA queries, and obtain an optimal-space structure that supports drankA queries in constant time, and an asymptotically optimal-space structure that supports both drankA and selectA queries in constant time when .

We also consider the drankA and selectA queries on strings over large alphabets. Given a string of length over the alphabet of size , we obtain an -bit data structure that supports drankA and selectA on in  time. We summarize our results for bit-strings in Table 1.

2. ss and iss queries with additive error : We first consider a data structure for answering ss and iss queries on a binary stream, i.e., a stream in which all integers are 0 or 1. For exact ss and iss queries on the stream, we propose an -bit data structure for answering those queries in constant time while supporting constant-time updates whenever a new element arrives from the stream. This data structure is obtained by modifying the data structure of Clark and Munro [7] for answering rank and select queries on bit-strings. Using the above structure, we obtain an -bit structure that supports ssA and issA queries on the stream in constant time while supporting constant-time updates. Since at least bits are necessary for answering (or ) queries on bit-strings, and bits are necessary for answering queries [4], the space usage of our data structure is succinct (i.e., optimal up to lower-order terms) when , and asymptotically optimal otherwise.

We then consider the generalization that allows integers in the range , for some . First, we present an algorithm that uses the optimal number of bits for exact suffix sums. Then, we provide a second algorithm that uses bits for solving ssA. Specifically, our data structure is succinct when , asymptotically optimal otherwise, and improves the query time of [4] while using the same space. Table 2 presents this comparison.

2 Queries on bit-strings

In this section, we first consider data structures for answering approximate rank and select queries on bit-strings and multisets. We then show how to extend our data structures for static bit-strings to sliding windows over binary streams, for answering approximate ss and iss queries.

2.1 Approximate rank and select queries on bit-strings

We now consider the approximate rank and select queries on bit-strings with additive error . We only show how to support , , , and queries. To support , , , and queries, one can construct the same data structures on the bit-wise complement of the original bit-string. We first introduce a few previous results which will be used in our structures. The following lemmas describe the optimal structures for supporting rank and select queries on bit-strings.

[[7]] For a bit-string of length n, there is a data structure of size n + o(n) bits that supports rank and select queries (for both 0’s and 1’s) in O(1) time.

[[23]] For a bit-string of length n with m 1’s, there is a data structure of size

  • (a) bits that supports query in time, and

  • (b) bits that supports , , , and queries in  time.

We use results from [14] and [22], which describe efficient data structures for supporting the following queries on integer arrays. In the standard word-RAM model with word size bits, let A[1..n] be an array of non-negative integers. For 1 ≤ i ≤ n and any non-negative integer j, (i) sum(A, i) returns the value A[1] + A[2] + ⋯ + A[i], and (ii) search(A, j) returns the smallest i such that sum(A, i) ≥ j. We use the following function to state the running time of some of the SPS (Searchable Partial Sum) queries in the lemma below, and in the rest of the paper.

[[14], [22]] An array of non-negative integers, each of length at most bits, can be stored using bits, to support sum queries on in constant time, and search queries on in time. Moreover, when , we can answer both queries in time.
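
The structures behind this lemma are involved; as a simple stand-in for experimentation, a Fenwick (binary indexed) tree supports both sum and search in O(log n) time. It uses Θ(n) machine words, so it does not match the succinct bounds of the lemma; it is included only to illustrate the sum/search interface.

class FenwickTree:
    # Illustrative searchable partial sums: sum(i) and search(j) in O(log n) time.
    # Space is Theta(n) words, i.e., it does NOT match the succinct bounds above.
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self._add(i, v)

    def _add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def sum(self, i):
        # Returns values[1] + ... + values[i].
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def search(self, j):
        # Smallest i with sum(i) >= j (assumes non-negative values).
        pos, remaining = 0, j
        bit = 1 << self.n.bit_length()
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            bit >>= 1
        return pos + 1

A = FenwickTree([2, 0, 3, 1])
assert A.sum(3) == 5 and A.search(3) == 3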

Supporting drankA and selectA queries. We first consider the problem of supporting or queries with additive error on a bit-string of length . We begin by proving a lower bound on the space used by any data structure that supports either of these two queries.

Any data structure that supports or queries with additive error on a bit-string of length requires at least bits. Also if the bit-string has 1’s in it, then at least bits are necessary for answering the above queries.

Proof.

Consider a bit-string of length divided into blocks , , … such that for , and (the last block may contain more than bits). Let be the set of all possible bit-strings satisfying the condition that all the bits within a block are the same (i.e., either all zeros or all ones). Then it is easy to see that . We now show that any two distinct bit-strings in will have different answers for some query (and also some query). Consider two distinct bit-strings and in , and let be the index of the leftmost block such that . Then it is easy to show that there is no value which is the answer of both and queries and also there is no position of which is the answer of both and queries, where is the number of 1’s in . Thus any structure that supports either of these queries must distinguish between every element in , and hence bits are necessary to answer or queries.

For the case when the number of ’s in the bit-string is fixed to be , we choose blocks from each bit-string and set all bits in the chosen blocks to ’s (and the rest of the bits to ’s). Since there are ways to select such blocks in a bit-string of length , it follows that bits are necessary to answer and queries in this case. ∎

Now we describe a data structure for supporting and queries in constant time, using optimal space.

For a bit-string of length , there is a data structure that uses bits and supports and queries with additive error , in constant time. If there are 1’s in , the data structure uses bits and supports the queries in time.

Proof.

We divide the bit-string into blocks , , … such that for , and . Now we define a new bit-string of length such that for , if contains the -th 1 in for any , and otherwise (note that for any , any block of has at most one position of the -th 1 in ). By Lemma 2.1, we can support and queries on in constant time, using bits. Now we claim that gives an answer to the query. Let , and let be the position of the -th 1 in . From the definition of , we can easily show that if or , the claim holds since there are fewer than 1’s in . Now consider the case when and . Then there are at most 1’s in when is the position of the -th 1 in and all the values in are 1. Also, there are at least 1’s in when is the position of the -th 1 in and all the values in are 1. By a similar argument, we can show that one can answer the query in time by returning .

Finally, in the case when there are 1’s in , there are at most 1’s in . Therefore by Lemma 2.1(b), we can support and queries (as before) in time, using bits. ∎

Note that in the above proof, we can answer (or ) queries on using any data structure that supports (or ) queries on . Thus, if is very sparse, i.e., when (in this case, the space usage of the structure of Theorem 2.1 is sub-optimal), one can use the structure of [20] that uses bits (asymptotically optimal space), to support queries in time, and queries in constant time.

Supporting rankA and dselectA queries. Now we consider the problem of supporting and queries with additive error on bit-strings of length . The following theorem describes a lower bound on space.

(*) Proofs of the results marked (*) are deferred to the appendix. Any data structure that supports or queries with additive error on a bit-string of length requires at least bits.

We now show that for some values of , any data structure that uses up to a factor more than the optimal space cannot support queries in constant time.

(*) Any -bit data structure that supports queries with an additive error , for some constant on a bit-string of length requires query time.

The following theorem describes a simple data structure for supporting queries.

(*) For a bit-string of length , there is a data structure of size bits, which supports queries on using time and queries on using time.
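
For intuition, the following sketch realizes a simple space/error trade-off of this flavor: it stores the exact rank only at every eps-th position and answers from the nearest stored sample, so the answer is within an additive eps of the true rank. The parameter name eps and the exact query semantics are assumptions made for illustration; this is not the structure of the theorem above.

class SampledRank:
    # Sketch: store rank only at positions that are multiples of eps; answering
    # from the nearest sample below i incurs less than eps additive error.
    def __init__(self, bits, eps):
        self.eps = eps
        self.samples = [0]                 # samples[k] = rank(k * eps)
        count = 0
        for pos, b in enumerate(bits, start=1):
            count += b
            if pos % eps == 0:
                self.samples.append(count)

    def rank_approx(self, i):
        # Returns a value r with rank(i) - eps < r <= rank(i).
        return self.samples[i // self.eps]

bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
r = SampledRank(bits, eps=4)
# rank(7) = 4 exactly; the sketch returns rank(4) = 3, within the allowed error.
assert r.rank_approx(7) == 3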

2.2 Approximate rank and select queries on multisets

In this section, we describe data structures for answering approximate rank and select queries on a multiset with additive error . Given a multiset where each element is from the universe , the rank and select queries on are defined as follows.

  • : returns the number of elements in that are at most .

  • : returns the -th smallest element in .

One can define approximate rank and select queries on multisets (also denoted as rankA, drankA, selectA, dselectA) analogously to the queries on strings. Any multiset of size can be represented as a characteristic vector of size , such that when the element has multiplicity in the multiset , for . It is easy to show that by answering and queries on , for , one can answer rank and select queries on . We now describe efficient structures for the following two cases.

(1) rankA, drankA, selectA, and dselectA queries when is fixed: We construct a new string of length that keeps only every -th 1 from , for (and removes all other 1’s). To answer the query, we first compute , and return as the answer; it is easy to see that this is a valid answer to the query. Similarly, we can answer the query by returning . We represent using the structure of Lemma 2.1(b), which uses bits and supports , , and queries on in constant time. Thus, both drankA and selectA queries on can be supported in constant time (see the sketch below).
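
A minimal sketch of this sampling idea follows. The constants (whether every eps-th or every (eps/2)-th 1 is kept) and the exact approximation semantics are illustrative assumptions, not taken from the construction above.

import bisect

class SampledOnes:
    # Keep only the positions of every eps-th 1; drank-style queries count
    # sampled 1's and scale by eps, select-style queries jump to the
    # ceil(i/eps)-th sampled 1.
    def __init__(self, bits, eps):
        self.eps = eps
        ones = [pos for pos, b in enumerate(bits, start=1) if b == 1]
        self.sampled = ones[eps - 1::eps]  # positions of the eps-th, 2*eps-th, ... 1

    def drank_approx(self, i):
        # Underestimates rank(i) by less than eps.
        return self.eps * bisect.bisect_right(self.sampled, i)

    def select_approx(self, i):
        # Position of a 1 whose rank lies in [i, i + eps), if it exists.
        k = -(-i // self.eps)              # ceil(i / eps)
        return self.sampled[k - 1] if k <= len(self.sampled) else None

bits = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
s = SampledOnes(bits, eps=2)
assert s.drank_approx(5) == 4          # the exact rank(5) is also 4
assert s.select_approx(5) == 8         # position 8 holds the 1 of rank 6 = 2*ceil(5/2)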

For answering rankA and dselectA queries on , we first construct the data structure of Theorem 2.1 to support queries on . In addition, we maintain the data structure of Lemma 2.1 to support sum and search queries on arrays and , which are defined as follows. For , and store the number of ’s and ’s in the block , respectively (as defined in the proof of Theorem 2.1). By Lemma 2.1 and Theorem 2.1, the total space for this data structure is bits. To answer , we first find the block of that contains the -th by answering a query on , and then return a query on . To answer , we first find the block of that contains an answer to the query, and then return as the answer for . Note that if , we return for both queries. The total running time is for both rankA and dselectA queries on , by Lemma 2.1 and Theorem 2.1. For the special case when , we can answer rankA and dselectA queries on in constant time.

(2) drankA and selectA queries when the frequency of each element in is at most : We first show that at least bits are necessary for supporting drankA queries on .

(*) Given a multiset where each element is from the universe of size , any data structure that supports drankA queries on requires at least bits, where is a bound on the maximum frequency of each element in .

We describe a data structure which answers drankA and selectA queries on in time. For drankA queries, it uses the optimal space. The details are described in Appendix E.

2.3 Approximate ss and iss queries on binary streams

In this section, we consider a data structure for answering ssA and issA queries on a binary stream. We first show how to modify the data structure of Lemma 2.1 to answer and queries in constant time using bits, while supporting updates in constant time. We break the stream into frames, each consisting of consecutive elements of the stream. Since the data structure of Lemma 2.1 can be constructed online [7], it is easy to show that we can answer ss and iss queries in constant time using bits, while supporting constant-time updates, by maintaining two data structures of Lemma 2.1: one for the current frame and one for the previous frame of the stream (a sketch of this two-frame idea follows). To reduce the space to bits, we construct the data structure of Lemma 2.1 on the new frame while gradually replacing the oldest part of the data structure constructed for the previous frame. The details of the succinct data structure are described in Appendix G.
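
A minimal sketch of the two-frame idea for a binary stream follows. It keeps plain lists of 1-positions instead of the rank/select structures of Lemma 2.1, so it is not succinct; the point is only that the current frame plus the previous frame always cover any suffix of length at most W (the frame length W is an assumed name here).

import bisect

class TwoFrameBinarySS:
    # Exact ss on a binary stream via two frames of W elements each.
    def __init__(self, W):
        self.W = W
        self.t = 0                  # number of elements seen so far
        self.prev_ones = []         # timestamps of 1's in the previous frame
        self.cur_ones = []          # timestamps of 1's in the current frame

    def update(self, bit):
        self.t += 1
        if (self.t - 1) % self.W == 0 and self.t > 1:   # a new frame starts
            self.prev_ones, self.cur_ones = self.cur_ones, []
        if bit == 1:
            self.cur_ones.append(self.t)

    def ss(self, i):
        # Number of 1's among the last i elements (i <= W).
        cutoff = self.t - i          # count timestamps strictly greater than cutoff
        return (len(self.cur_ones) - bisect.bisect_right(self.cur_ones, cutoff)
                + len(self.prev_ones) - bisect.bisect_right(self.prev_ones, cutoff))

s = TwoFrameBinarySS(W=4)
for b in [1, 0, 1, 1, 0, 1]:
    s.update(b)
assert s.ss(3) == 2                  # the last three elements are 1, 0, 1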

Next, we consider a data structure for answering and queries on the binary stream in constant time using bits. We first split each frame into chunks such that, for , if and only if contains the -th 1 in for some integer . Now consider a (virtual) binary stream of ’s. We can construct an -bit data structure for answering and queries in constant time while supporting constant-time updates on such a stream (in the rest of this section, all ss and iss queries are answered on the virtual stream). We also maintain and , which store the number of 1’s in the current frame and the current chunk of the stream, respectively. Finally, we maintain a value that is the index of the last-arrived element in the current frame. All these additional values can be stored using bits.

When a new element arrives, we first increase and by 1 if . If or , we send to the virtual stream if there is an integer such that , and send to the virtual stream otherwise. After that, we update the data structure that supports ss and iss queries on the virtual stream, and reset to zero (if , we also reset to zero). Since we can update the data structure on the virtual stream in constant time, the above procedure takes constant time. We now describe how to answer ssA and issA queries.

  • ssA queries: To answer the query, we return if . Otherwise, let be the -th last element in the virtual stream; then we return , which is a valid answer to the query by the same argument as in the proof of Theorem 2.1.

  • issA queries: To answer the query, we return if . Otherwise, we return , by the same argument as in the proof of Theorem 2.1.

Since ss and iss queries on the virtual stream take constant time, we can answer both ssA and issA queries on the stream in constant time. Thus we obtain the following theorem.

For a binary stream, there exists a data structure that uses bits and supports ssA and issA queries on the stream with additive error , in constant time. Also, the structure supports updates in constant time.

Compared to the lower bound of Theorem 2.1 for answering drankA and selectA queries on bit-strings (which also gives a lower bound for answering ssA and issA queries), the above data structure takes bits when . However, in a sliding window of size , at least bits are necessary [4] for answering ssA queries even in the case when is fixed to . Therefore, the data structure of Theorem 2.3 supports ssA and issA queries with optimal space when , and asymptotically optimal space otherwise.

3 Queries on strings over large alphabet

In this section, we consider non-binary inputs. First, we look at general alphabet and derive results for approximate rank and select. Then we consider suffix sums over integer streams.

3.1 drankA and selectA queries on strings over general alphabet

Let be a string of length over the alphabet of size . Then, for , the query returns the number of ’s in , and the query returns the position of the -th in (if it exists). Similarly, the queries and are defined analogously to the drankA and selectA queries on bit-strings. One can easily show that at least bits are necessary to support drankA and selectA queries on , by extending the proof of Theorem 2.1 to strings over larger alphabets. In this section, we describe a data structure that supports drankA and selectA queries on in time, using twice the optimal space. To do so, we make use of the following result from [12] for supporting rank and select queries on strings over large alphabets.

[[12]] Given a string of length over the alphabet , one can support queries in time and queries in time, using bits, for any .

The following theorem shows that, using the above lemma, we can construct a simple data structure for supporting and queries on .

(*) Let be a string of length over the alphabet . Then for any , one can support and queries in time using bits.

3.2 Supporting ssA queries over non-binary streams

In this section, we consider the problem of computing suffix sums over a stream of integers in . This generalizes the result of Theorem 2.3 for ssA. For such streams, one can solve issA with a binary search over ssA queries; constant-time issA queries are left as future work. Specifically, we show a data structure that requires ; i.e., it requires times as many bits as the static-case lower bound of Theorem 2.2 when .

We note that this model was studied in [4, 8, 11] for window-sum queries. That is, our work generalizes this model to allow the user to specify the window size at query time, while previous works only considered the sum of the last W elements. In fact, all previous data structures implicitly support ssA queries, but with slower running time: [11, 8] require time to compute a approximation for the sum of the last elements, while [4] needs for a -additive one. Here, we show how to compute a -additive approximation of the sum of the last elements in constant time for both updates and queries.
Exact ss queries. En route to ssA, we first discuss how to compute exact answers to suffix-sum queries. It is known, even for fixed window sizes, that one must use bits for tracking the sum of a sliding window [4]. Here, we show how to compute exact ss queries using succinct space of bits.
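
As a point of reference (this is an assumed baseline, not the construction of this section), one can always keep a circular buffer of the last W prefix sums, so that every ss query is a single subtraction; the cost is Θ(W log(RW)) bits rather than succinct space.

class PrefixSumWindow:
    # Non-succinct baseline for exact suffix sums over bounded integers:
    # store the running prefix sum at each of the last W positions, so
    # ss(i) = prefix[now] - prefix[now - i].
    def __init__(self, W):
        self.W = W
        self.prefix = [0] * (W + 1)  # circular buffer of prefix sums
        self.t = 0                   # number of elements seen so far
        self.total = 0

    def update(self, x):
        self.t += 1
        self.total += x
        self.prefix[self.t % (self.W + 1)] = self.total

    def ss(self, i):
        # Exact sum of the last i elements (requires i <= min(t, W)).
        return self.total - self.prefix[(self.t - i) % (self.W + 1)]

p = PrefixSumWindow(W=3)
for x in [2, 5, 1, 4]:
    p.update(x)
assert p.ss(2) == 5 and p.ss(3) == 10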

We start by discussing why the existing approaches cannot work for a large value. If we use sub-blocks of size as in [7, 15], then the lookup table will require bits, which is not even asymptotically optimal for non-constant values. While one may think that this is solvable by further breaking the sub-blocks into sub-sub-blocks, sub-sub-sub-blocks, etc., this is not the case. To see this, consider a lookup table for sequences of length . Its space requirement will be bits; if is large (say, ), this becomes , which is not even asymptotically optimal.

(*) There exists a data structure that requires bits and supports constant-time (exact) suffix-sum queries and updates.

General ssA queries. Here, we consider the general problem of computing ssA (i.e., suffix sums up to an additive error of ). Intuitively, we apply the exact solution from the previous section to a compressed stream that we construct on the fly. A simple approach would be to divide the stream into consecutive chunks of size and feed each chunk’s sum as an input to an exact suffix-sum algorithm. However, this fails to achieve succinct space: for example, summing integers requires bits, which may be asymptotically larger than the bits lower bound of Theorem 2.2.

We alleviate this problem by rounding the arriving elements. Namely, when adding an input , we first round its value to so that it requires bits. The rounding allows us to sum the elements in a chunk (using a variable denoted by ), but introduces a rounding error. To compensate for the error, we use smaller chunks; namely, chunks of size . We also use a that is slightly lower than to compensate for the rounding error when (if , we simply apply the exact algorithm from the previous subsection). We then employ the exact suffix-sums construction from the previous section for a window size of (the number of chunks that can overlap with the window) over a stream of integers in , where is a bound on the resulting items. We use to denote the input that represents the current block.

The query procedure is also a bit tricky. Intuitively, we can estimate the sum of the last items by querying for the sum of the last inserted values and multiplying the result by ; but there are a few things to keep in mind. First, may not be an integer. Next, the values within the current chunk (which has not ended yet) are not recorded in . Finally, we are not allowed to overestimate, so the propagation of may be an issue.

To address the first issue, we weight the oldest chunk’s value by the fraction of that chunk that is still in the window. For the second, we add the value of to the estimate, where is the sum of the rounded elements. Notice that we do not reset the value of but rather propagate it between chunks. Finally, to ensure that our algorithm never overestimates, we subtract from the result. Our algorithm uses the following variables:

  • : an exact suffix-sum algorithm, as described in the previous section; it allows computing suffix sums over the last elements of a stream of integers in .

  • : tracks the sum of elements that are not yet recorded in .

  • : the offset within the current chunk.

Pseudocode of our method appears in Algorithm 1.

1:  Initialization: …
2:  function Add(…)
3:      …
4:      …
5:      if … then                          ▷ End of a chunk
6:          …
7:          …
8:          …
9:  function Query(…)
10:     if … then                          ▷ Queried within the current chunk
11:         return …
12:     else
13:         …
14:         …
15:         …
16:         …
17:         return …
Algorithm 1: Algorithm for ssA (only the control structure survived; the expressions are omitted).
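
Since the expressions in the pseudocode above were lost, the following Python sketch reconstructs only the spirit of the approach: exact sums of fixed-size chunks plus the running sum of the incomplete current chunk, combined so that the answer never overestimates and is at most eps below the true sum. Unlike Algorithm 1, this sketch does not round the arriving elements (the rounding is what makes the space succinct), and the chunk size, the names W, R, and eps, and the query-time corrections are illustrative assumptions rather than the paper's exact parameters.

from collections import deque

class ChunkedSSA:
    # Sketch of the chunk-based ssA idea for integers in [0, R].
    def __init__(self, W, R, eps):
        self.R = R
        self.c = max(1, eps // R)                        # chunk size: (c - 1) * R <= eps
        self.chunks = deque(maxlen=-(-W // self.c) + 1)  # sums of the most recent chunks
        self.b = 0                                       # sum of the current (incomplete) chunk
        self.o = 0                                       # offset within the current chunk

    def update(self, x):
        self.b += x
        self.o += 1
        if self.o == self.c:                             # end of a chunk
            self.chunks.append(self.b)
            self.b, self.o = 0, 0

    def ss_approx(self, i):
        # Returns an estimate e with (true sum) - eps <= e <= (true sum).
        if i < self.o:                                   # query falls inside the current chunk
            return max(0, self.b - (self.o - i) * self.R)
        k = (i - self.o) // self.c                       # complete chunks fully inside the window
        return self.b + (sum(list(self.chunks)[-k:]) if k > 0 else 0)

s = ChunkedSSA(W=8, R=4, eps=8)
for x in [3, 1, 4, 0, 2, 2]:
    s.update(x)
assert s.ss_approx(5) == 8                               # the true sum of the last 5 elements is 9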

Next follows a memory analysis of the algorithm.

(*) Algorithm 1 requires bits.

Thus, we conclude that our algorithm is succinct if the error satisfies . We note that a bits lower bound for Basic-Summing with an additive error was shown in [4], even when only fixed-size windows (where ) are considered. Thus, our algorithm always requires space, even if . Here, is the lower bound for static data shown in Theorem 2.2. Let be such that satisfies ; then Algorithm 1 is succinct. For other parameters, it uses space. We now state the correctness of our algorithm.

(*) Algorithm 1 solves ssA while processing elements and answering queries in constant time.

References

  • [1] Eran Asaf, Ran Ben-Basat, Gil Einziger, and Roy Friedman. Optimal elephant flow detection. In IEEE INFOCOM 2018, pages 1–9, 2018.
  • [2] Ran Ben-Basat, Gil Einziger, and Roy Friedman. Fast flow volume estimation. Pervasive and Mobile Computing, 48:101–117, 2018.
  • [3] Ran Ben-Basat, Gil Einziger, and Roy Friedman. Give me some slack: Efficient network measurements. In MFCS, pages 34:1–34:16, 2018.
  • [4] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Efficient summing over sliding windows. In SWAT, pages 11:1–11:14, 2016.
  • [5] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Heavy hitters in streams and sliding windows. In IEEE INFOCOM, pages 1–9, 2016.
  • [6] Ran Ben-Basat, Roy Friedman, and Rana Shahout. Heavy hitters over interval queries. CoRR, abs/1804.10740, 2018.
  • [7] David R. Clark and J. Ian Munro. Efficient suffix trees on secondary storage. In SODA, pages 383–391, 1996.
  • [8] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM J. Comput., 31(6):1794–1813, 2002.
  • [9] Hicham El-Zein, J. Ian Munro, and Yakov Nekrich. Succinct color searching in one dimension. In ISAAC, pages 30:1–30:11, 2017.
  • [10] Éric Fusy and Frédéric Giroire. Estimating the number of active flows in a data stream over a sliding window. In ANALCO, pages 223–231, 2007.
  • [11] Phillip B. Gibbons and Srikanta Tirthapura. Distributed streams algorithms for sliding windows. In SPAA, pages 63–72, 2002.
  • [12] Alexander Golynski, J. Ian Munro, and S. Srinivasa Rao. Rank/select operations on large alphabets: A tool for text indexing. In SODA, pages 368–373, 2006.
  • [13] Alexander Golynski, Alessio Orlandi, Rajeev Raman, and S. Srinivasa Rao. Optimal indexes for sparse bit vectors. Algorithmica, 69(4):906–924, 2014.
  • [14] Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for searchable partial sums with optimal worst-case performance. Theor. Comput. Sci., 412(39):5176–5186, 2011.
  • [15] Guy Joseph Jacobson. Succinct Static Data Structures. PhD thesis, Pittsburgh, PA, USA, 1988. AAI8918056.
  • [16] Seungbum Jo, Stelios Joannou, Daisuke Okanohara, Rajeev Raman, and Srinivasa Rao Satti. Compressed bit vectors based on variable-to-fixed encodings. Comput. J., 60(5):761–775, 2017.
  • [17] P. B. Miltersen. Cell probe complexity - a survey. FSTTCS, 1999.
  • [18] J. Ian Munro, Venkatesh Raman, and S. Srinivasa Rao. Space efficient suffix trees. J. Algorithms, 39(2):205–222, 2001.
  • [19] Gonzalo Navarro and Eliana Providel. Fast, small, simple rank/select on bitmaps. In SEA, pages 295–306, 2012.
  • [20] Daisuke Okanohara and Kunihiko Sadakane. Practical entropy-compressed rank/select dictionary. In ALENEX, pages 60–70, 2007.
  • [21] Mihai Pătraşcu and Mikkel Thorup. Time-space trade-offs for predecessor search. In ACM STOC, pages 232–240, 2006.
  • [22] Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct dynamic data structures. In WADS, pages 426–437, 2001.
  • [23] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):43, 2007.

Appendix A Proof of Theorem 2

Theorem.

Any data structure that supports or queries with additive error on a bit-string of length requires at least bits.

Proof.

We first construct a set of bit-strings of length as follows. We divide each bit-string into blocks , , … such that for , and . Now, for every , we set all bits in to if is odd. If is even, we fill with ’s followed by ’s. Thus there is only one choice of blocks (if is odd), and choices for blocks (if is even); hence . Now consider two distinct bit-strings and in , and let be the even index of the leftmost block such that ; without loss of generality, and have and 1’s there, respectively. Since such a block has zeros on both sides, it is easy to show that there is no value which is the answer of both and queries, and also there is no position in which is the answer of both and