Consider the following static membership problem – given a universe containing elements, we want to store an arbitrary subset of whose size is at most , such that we can answer membership queries of the form “Is in ?” Solutions to problems of this nature are called schemes in the literature. The resources that are considered to evaluate the schemes are the size of the data structure devised to store the subset , and the number of bits read of the data structure to answer the membership queries, called bitprobes. The notations for the space used and the number of bitprobes required are and , respectively. This model of the static membership problem is called the bitprobe model.
Schemes in the bitprobe model are classified as adaptive and non-adaptive. If the location of the current bitprobe depends on the answers obtained from the previous bitprobes, then such schemes are called adaptive schemes. On the other hand, if the location of the current bitprobe is independent of the answers obtained in the previous bitprobes, then such schemes are called non-adaptive schemes. Radhakrishnan et al.  introduced the notation and to denote the adaptive and non-adaptive schemes, respectively. Sometimes the space requirement of the two classes of schemes will also be denoted as and , respectively.
1.1 The Bitprobe Model
The scheme presented in this paper is an adaptive scheme that uses two bitprobes to answer membership queries. We now discuss in detail the bitprobe model in the context of two adaptive bitprobes.
The data structure in this model consists of three tables – , and – arranged as shown in Figure 1. Any element in the universe has a location in each of these three tables, which are denoted by , and . By a little abuse of notation, we will use the same symbols to denote the bits stored in those locations.
Any bitprobe scheme has two components – the storage scheme, and the query scheme. Given a subset , the storage scheme sets the bits in the three tables such that the membership queries can be answered correctly. The flow of the query scheme is traditionally captured in a tree structure, called the decision tree of the scheme (Figure 1). It works as follows. Given a query “Is in ?”, the first bitprobe is made in table at location . If the bit stored is 0, the second query is made in table , else it is made in table . If the answer received in the second query is 1, then we declare that the element is a member of , otherwise we declare that it is not.
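The decision tree described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `two_probe_query`, `first`, `on_zero`, `on_one` and the three location arguments are hypothetical and do not come from the paper, and each table is modelled as a plain list of bits.

```python
def two_probe_query(first, on_zero, on_one, loc_first, loc_zero, loc_one):
    """Answer a membership query with exactly two adaptive bitprobes.

    first, on_zero, on_one: the three tables, modelled as lists of 0/1 bits
    (hypothetical names). loc_*: the location of the queried element in each table.
    """
    # First probe: its answer decides which table receives the second probe.
    if first[loc_first] == 0:
        second_bit = on_zero[loc_zero]
    else:
        second_bit = on_one[loc_one]
    # The bit returned by the second probe is the answer: 1 means "member".
    return second_bit == 1
```

Exactly two bits are read on every path through the function, and the location of the second probe depends on the outcome of the first, which is what makes the scheme adaptive.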
1.2 The Problem Statement
As alluded to earlier, we look into adaptive schemes with two bitprobes (). When the subset size is one (), the problem is well understood – the space required by the data structure is , and we have a scheme that matches this bound [1, 6].
For subsets of size two (), Radhakrishnan et al.  proposed a scheme that takes amount of space, and further conjectured that it is the minimum amount of space required for any scheme. Though progress has been made to prove the conjecture [7, 8], it as yet remains unproven.
For subsets of size three (), Baig and Kesh  have recently proposed a scheme that takes amount of space. It has been subsequently proven by Kesh  that is the lower bound for this problem. So, the space complexity question for stands settled.
In this paper, we look into the problem where the subset size is four (), i.e. an adaptive bitprobe scheme that can store subsets of size at most four, and answers membership queries using two bitprobes. Garg and Radhakrishnan  have proposed a generalised scheme that can store arbitrary subsets of size , and uses amount of space. For the particular case of , the space requirement turns out to be . Garg  further improved the bounds to , which improved the scheme for to .
We propose a scheme for the problem whose space requirement is (Theorem 4.1), thus improving upon the existing schemes in the literature. Our claim is the following:
2 Our Data Structure
In this section, we provide a detailed description of our data structure. To achieve a space bound of , more than one element must necessarily share the same location in each of the three tables. We discuss how we arrange the elements of the universe , and which elements share the same location in any given table.
Along with the arrangement of elements, we will also talk about the size of our data structure. The next few sections prove the following theorem.
The size of our data structure is .
Given the universe containing elements, we partition the universe into sets of size . Borrowing the terminology from Radhakrishnan et al. , we will refer to these sets as blocks. It follows that the total number of blocks in our universe is .
The elements within a block are numbered as . We refer to these numbers as the index of an element within a block. So, an element of can be addressed by the number of the block to which it belongs, and its index within that block.
In table of our data structure, we will have one bit for every block in our universe. As there are blocks, the size of table is .
The blocks in our universe are partitioned into sets of size . Radhakrishnan et al.  used the term superblocks to refer to these sets of blocks, and we will do the same in our discussion. As there are blocks, the number of superblocks thus formed is . These superblocks are numbered as .
For a given superblock, we arrange the blocks that it contains into a square grid, whose sides are of size . The blocks of the superblock are placed on the integral points of the grid. The grid is placed at the origin of a two-dimensional coordinate space with its sides parallel to the coordinate axes. This gives a unique coordinate to each of the integral points of the grid, and thus to the blocks placed on those points. It follows that if is the coordinate of a point on the grid, then .
We can now have a natural way of addressing the blocks of a given superblock – we will use the -coordinate and the -coordinate of the point on which the block lies. So, a given block can be uniquely identified by the number of the superblock to which it belongs, and the and coordinates of the point on which it lies. Henceforth, we will address any block by a three-tuple of the form , where the is its superblock number, and are the coordinates of the point on which it lies.
To address a particular element of the universe, apart from specifying the block to which it belongs, we need to further state its index within that block. So, an element will be addressed by a four-tuple such as , where the first three components specify the block to which it belongs, and the fourth component specifies its index.
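As a rough illustration of the four-tuple addressing, the sketch below converts an element number into (superblock, x, y, index). The parameters `block_size` and `side`, the zero-based numbering, and the row-major placement of blocks on the grid are all assumptions made for this example; the paper fixes its own block size, grid side and numbering.

```python
from typing import NamedTuple


class Address(NamedTuple):
    superblock: int  # number of the superblock (zero-based here, an assumption)
    x: int           # x-coordinate of the block on the superblock's grid
    y: int           # y-coordinate of the block on the superblock's grid
    index: int       # index of the element within its block


def address(element: int, block_size: int, side: int) -> Address:
    """Map an element number to its four-tuple address.

    Assumes consecutive elements fill a block, consecutive blocks fill a
    superblock of side * side blocks, and blocks are laid out row by row.
    """
    block, index = divmod(element, block_size)
    superblock, cell = divmod(block, side * side)
    y, x = divmod(cell, side)
    return Address(superblock, x, y, index)
```

For instance, with `block_size=4` and `side=3`, element 18 lies in block 4 at index 2, and block 4 sits at grid point (1, 1) of superblock 0.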
Table of our data structure has the space to store one block for every possible point of the grid (described in the previous section). So, for the coordinate of the grid, table has space to store one block; similarly for all other coordinates. As every superblock has one block with coordinate , all of these blocks share the same location in table . So, we can imagine table as a square grid containing points, where each point can store one block.
There are a total of points in the grid, and the size of a block is , so the space required by table is .
2.4 Lines for Superblocks
Given a superblock whose number is , we associate a certain number of lines with this superblock each of whose slopes is . In the grid arrangement of the superblock (Section 2.2), we draw enough of these lines of slope so that every grid point falls on one of these lines. Figure 2 shows the grid and the lines.
So, all lines of a given superblock have the same slope, and lines from different superblocks have different slopes. As there are superblocks, and they are numbered , the slopes of the lines vary as
There are two issues to consider – the number of lines needed to cover every point of the grid, and the purpose of these lines. We address the issue of the count of the lines in this section, and that of the purpose of the lines in the next.
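To make the covering idea concrete, the following sketch groups the points of a small grid by the line of a fixed slope passing through them. It is a generic illustration, not the construction used in the lemmas: the function name `lines_of_slope`, the coordinate range 0 to `side - 1`, and the use of the y-intercept as a line identifier are assumptions of this example.

```python
from collections import defaultdict
from fractions import Fraction


def lines_of_slope(side, slope):
    """Group the points of a side x side integer grid by the line of the
    given slope that passes through them.

    Two points lie on the same line of slope `slope` exactly when they share
    the value y - slope * x (the y-intercept), so that value serves as the key.
    """
    slope = Fraction(slope)  # exact arithmetic, so keys compare reliably
    lines = defaultdict(list)
    for x in range(side):
        for y in range(side):
            lines[y - slope * x].append((x, y))
    return dict(lines)


# A 3 x 3 grid with slope 1 is covered by 5 lines, and every point is on exactly one.
example = lines_of_slope(3, 1)
```

Since each grid point receives exactly one key, the lines of a given slope partition the grid, and the number of keys is the number of lines needed to cover it.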
We introduce the notation to denote the line that has slope , and passes through the point . We now define the collection of all lines of slope that we are going to draw for the superblock .
In the following three lemmas, we show the properties of this set of lines.
Every line of contains at least one point of the grid.
Consider an arbitrary line of . If , then itself is a member of the grid, and is non-empty.
Let us now consider the scenario where . Let , where .
If , we show that is a point that falls on the line through , and it also belongs to the grid. First,
which shows that the point falls on the required line. Also,
which shows that belongs to the grid. Together they show that .
On the other hand, if , the point to consider is . The following equality shows that the point lies on the line through –
To show that the point belongs to the grid, the -coordinate satisfies the following (Equation 1). As for the -coordinate, we have
This shows that even when is non-zero, is non-empty.
Every point of the grid belongs to some line of .
Let be an arbitrary element of the grid. By construction, and are both integers, and . If , then .
If , consider the point . As
falls on the line through . And using arguments similar to the one employed in the previous lemma, one can show that . So, falls on the line .
The equality is a direct consequence of the definition of (Equation 2).
In table , we have space to store one block for every line of every superblock. That means that for a superblock, say , all of its blocks that fall on the line share the same block in table ; and the same is true for all lines of every superblock.
The th superblock contains lines (Lemma 3), so the total number of lines from all of the superblocks is
As mentioned earlier, we reserve space for one block for each of these lines. Combined with the fact that the size of a block is , we have
As described in Section 2.2, any element of the universe can be addressed by a four-tuple, such as , where is the superblock to which it belongs, are the coordinates of its block within that superblock, and is its index within the block.
Table has one bit for each block, so all elements of a block will query the same location. As the block number of the element is , the bit corresponding to the element is ; or in other words, the element will query the location in table .
In table , there is space for one block for every possible coordinate of the grid. The coordinates of the element are , and has space to store an entire block for this coordinate. So, there is one bit for every element of a block, or, in other words, every index of a block. So, the bit corresponding to the element is .
Table has a block reserved for every line of every superblock. The element belongs to the line , and thus table has space to store one block corresponding to this line. As the index of the element is , the bit corresponding to the element in table is .
3 Query Scheme
The query scheme is easy enough to describe once the data structure has been finalised; it follows the decision tree as discussed earlier (Figure 1). Suppose we want to answer the following membership query – “Is in ?” We would make the first query in table at location . If the bit stored at that location is 0, we query in table at , otherwise we query table at . If the answer from the second query is 1, then we declare the element to be a member of , else we declare that it is not a member of .
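The query just described can be sketched as follows, assuming the four-tuple addressing of Section 2.2. All names are hypothetical: `block_bits` stands for the table with one bit per block, `line_table` for the table with one block per line, `grid_table` for the table with one block per grid coordinate, and `line_of` for a caller-supplied function returning an identifier of the line on which a block falls. Tables are modelled as dictionaries in which a missing key means a 0 bit.

```python
def is_member(element, block_bits, line_table, grid_table, line_of):
    """Two-probe query for an element addressed as (s, x, y, i).

    block_bits: maps a block (s, x, y) to its bit.
    line_table: maps (line id, index) to a bit.
    grid_table: maps ((x, y), index) to a bit.
    line_of:    hypothetical helper mapping a block (s, x, y) to its line id.
    Missing dictionary keys are treated as 0 bits.
    """
    s, x, y, i = element
    if block_bits.get((s, x, y), 0) == 0:
        # First probe returned 0: probe the block stored for the element's line.
        bit = line_table.get((line_of(s, x, y), i), 0)
    else:
        # First probe returned 1: probe the block stored for the element's coordinate.
        bit = grid_table.get(((x, y), i), 0)
    return bit == 1
```

Note that all blocks on one line of a superblock read the same storage in `line_table`, and all blocks with the same coordinate read the same storage in `grid_table`; this sharing is what the storage scheme has to work around.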
4 The Storage Scheme
The essence of any bitprobe scheme is the storage scheme, i.e. given a subset of the universe , how the bits of the data structure are set such that the query scheme answers membership questions correctly. We start the description of the storage scheme by giving an intuition for its construction.
The basic unit of storage in the tables and of our data structure, in some sense, is a block – table can store one block of any line of any superblock, and table can store one block of a given coordinate from any superblock. We show next that our storage scheme must ensure that an empty and a non-empty block cannot be stored together in a table.
Suppose, the block of table is non-empty, and it contains the member of subset . If we decide to store this member in table , then we have to store the block in table . So, we have to set in table the following – . Thus, upon first query will get a 0 and go to table . In table , we store the block at the storage reserved for the line . Particularly, we have to set .
If is a block that is empty, i.e. it does not contain any member of , and it falls on the aforementioned line, i.e. , then we cannot store this block in table , and hence must be set to 1. If this is not the case, and , then the first query for the element will get a 0, go to table and query the location which is the same as . We have set this bit to 1, and we would incorrectly deduce that is a member of .
The same discussion holds true for table . If we decide to store the block in table , we have to set to 1. In table , we have space reserved for every possible coordinate for a block, and we would store the block at the coordinate ; particularly, we would set to 1. This implies that all empty blocks from other superblocks having the same coordinate cannot be stored in table , and hence must necessarily be stored in table . To take an example, if is empty, then it must be stored in table , and hence .
To summarise, for any configuration of the members of subset , as long as we are able to keep the empty and the non-empty blocks separate, our scheme will work correctly. For the reasons discussed above, we note the following.
We have to keep the non-empty blocks and empty blocks separate;
We have to keep the non-empty blocks separate from each other; and
The empty blocks can be stored together.
Our entire description of the storage scheme will focus on how to achieve the aforementioned objective.
Let the four members of subset be
So, the relevant blocks are
and the relevant lines are
In the discussion below, we assume that no two members of belong to the same block. This implies that there are exactly four non-empty blocks. The scenario where a block contains multiple members of is handled in Section 4.3.
The lines for the members of need not be distinct, say when two elements belong to the same superblock and fall on the same line. We divide the description of our storage scheme into several cases based on the number of distinct lines we have due to the members of , and for each of those cases, we provide the proof of correctness alongside it.
4.2.1 Case I
Suppose we have four distinct lines for the four members of . The slopes of some of these lines could be the same, or they could all be different. We know that all lines of a given superblock have the same slope, and lines from different superblocks have different slopes (Section 2.4). We also know that if two of these lines, say and , have the same slope, then the corresponding members of belong to the same superblock, i.e. . On the other hand, if their slopes are distinct, then they belong to different superblocks, and consequently, .
Table has space to store one block for every line in every superblock. As the lines for the four members of are distinct, the spaces reserved for the lines are also distinct. So we can store the four non-empty blocks in table , and all of the empty blocks in table .
To achieve the objective, we set for , and set the bits in table for every other block to 1. In table , we set the bits , for , and all the rest of the bits to 0. In table , all the bits are set to 0.
So, if is an element that belongs to an empty block, it would, according to the assignment above, get a 1 upon its first query in table . Its second query will be in table , and as all the bits of table are set to 0, we would conclude that the element is not a member of .
Suppose, be an element that belongs to one of the non-empty blocks. Then, its coordinates must correspond to one of the four members of . Without loss of generality let us assume that , and .
It follows that , which is the same as , is 0, and hence the second query for this element will be in table . The line corresponding to the element is , which is the same as , and hence the second query will be at the location . As the four lines for the four members of are distinct, will be 1 if and only if . So, we will get a Yes answer for our query if and only if the element is actually the element , a member of .
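The Case I assignment and its correctness argument can be checked on a toy instance with the sketch below. It is an illustration only: the function names, the representation of a table by the set of positions holding a 1 (or, for the per-block table, a 0), and the caller-supplied `line_of` helper are assumptions of this example rather than part of the scheme's description.

```python
def store_case_one(members, line_of):
    """Case I storage: every member (s, x, y, i) lies on its own distinct line.

    Returns the set of blocks whose bit in the per-block table is 0 (every
    other block implicitly has bit 1) and the set of (line id, index)
    positions set to 1 in the per-line table. The per-coordinate table is
    left all zeros, so it needs no representation here.
    """
    lines = {line_of(s, x, y) for (s, x, y, _) in members}
    if len(lines) != len(members):
        raise ValueError("Case I requires one distinct line per member")
    zero_blocks = {(s, x, y) for (s, x, y, _) in members}
    line_bits = {(line_of(s, x, y), i) for (s, x, y, i) in members}
    return zero_blocks, line_bits


def query_case_one(element, zero_blocks, line_bits, line_of):
    """Two-probe query against the Case I assignment."""
    s, x, y, i = element
    if (s, x, y) in zero_blocks:
        # First probe reads 0: second probe goes to the per-line table.
        return (line_of(s, x, y), i) in line_bits
    # First probe reads 1: second probe goes to the all-zero per-coordinate table.
    return False
```

Because each non-empty block has a line of its own, the only 1 bit in the storage of that line belongs to the member itself, which mirrors the "if and only if" step of the argument above.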
4.2.2 Case II
Let us consider the case when there is just one line for the four members of . As all of their lines are identical, and consequently, the slopes of the lines are the same, all the elements must belong to the same superblock. So, we have .
As all the non-empty blocks belong to the same superblock, all of their coordinates must be distinct. Table can store one block for each distinct coordinate of the grid, and hence we can store the four non-empty blocks there. All the empty blocks will be stored in table .
To this end, we set for , and the rest of the bits of table , which correspond to the empty blocks, to 0. In table , all bits are set to 0. In table , the bits corresponding to the four elements are set to 1, i.e. for . The rest of the bits of table are set to 0.
The proof of correctness follows directly from the assignment, and the reasoning follows along the lines of the previous case. If the element belongs to an empty block, it will get a 0 from table upon its first query, consequently go to table for its second query, and get a 0, implying is not a member of .
If the element belongs to a non-empty block, then its coordinates must correspond to one of the members of . Without loss of generality, let , and .
The first query of the element will be at the location , and hence it will get a 1 from table , and go to table for its second query. In this table, it will query the location , which is the same as . As the coordinates of the four members of are distinct, will be 1 if and only if . So, we get a 1 in the second query if and only if we have , a member of .
4.2.3 Case III
The next case that we consider is when there are two distinct lines corresponding to the four members of subset . The members can be distributed in one of two ways – one line contains three elements and the other line one, or the elements might be divided equally among the two lines. We consider the cases separately below.
Consider the case when one line contains three elements, and the other line contains one. Without loss of generality, let the first three members of belong to one line, and the fourth one to another one. So, we have , and the line is different from the others. As lines with the same slope belong to the same superblock, we have . Whether the fourth member belongs to the aforementioned superblock, or to a different superblock depends on whether the slope of is the same as that of the other line or it is distinct.
As the first three elements belong to the same superblock, all will have coordinates distinct from one another. The coordinates of the fourth element could be distinct, or it could overlap with one of the first three.
The case of the coordinates of the four members of being distinct is one we have seen in Case II, where the elements too had distinct coordinates. The assignment for this scenario will be identical to that case, and consequently, the correctness proof follows.
Let us say that the coordinates of the fourth element coincide with those of one of the other three members. Without loss of generality, let us assume that the third and the fourth elements have identical coordinates, that is to say and . As two blocks of a superblock cannot have the same coordinates, we must have . Moreover, different superblocks have different slopes for their lines, implying .
The assignment in this case will be as follows – we will store the blocks corresponding to the first two elements in table , and the blocks corresponding to the last two elements in table . The empty blocks accordingly will have to be distributed among the two tables.
Accordingly, we set and to 1, and set and to 0. The bits corresponding to the remaining blocks in the two lines, which are and , are set to 1. The bits of the blocks of all the other lines in all of the superblocks are set to 0.
In table , the bits corresponding to the third and the fourth elements are set to 1, i.e. , and all the remaining bits are set to 0. In table , only the bits corresponding to the first two elements are set to 1, i.e. ; the rest of the bits of this table are set to 0.
We now prove that the assignment above is correct. If an element belongs to a line other than the lines and , then the bit for its block has been set to 0. Consequently, it will query table . Table has separate space for each line, and only certain bits of the non-empty lines have been set to 1. As falls on a line different from and , so the second query for will also return a 0.
Suppose belongs to an empty block falling on one of the lines and . According to our assignment, the bits of the empty blocks from the lines are set to 1, and hence the second query for will go to table . All blocks falling on a line have distinct coordinates, so the coordinates of the block of will be distinct from the coordinates of the non-empty blocks of the two lines. As table has space to store one block for each distinct coordinate, the space for the empty blocks of the two lines will be different from the non-empty ones. As we have set certain bits of only the non-empty blocks of table to 1, all the bits of the block of must be 0, and hence the answer to the second query for will be 0.
It remains to verify whether the queries corresponding to the elements of the four non-empty blocks give correct answers. We have argued above that the empty blocks are stored in locations distinct from the non-empty blocks. The assignment tells us that we have stored the non-empty blocks in their entirety. These two facts together imply that queries for elements in the non-empty blocks will also give correct answers.
We now consider the case when the four members of are divided equally among the two lines. Without loss of generality, let us assume that the first two members belong to one line, and the other two members belong to the other line. So, we have and . Consequently, we have and .
In this scenario, we may have the four non-empty blocks occupying four distinct coordinates of the grid. This situation is familiar to us, and we will handle it as we have done in Case II.
The other scenario is when coordinates of non-empty blocks overlap. As the lines are distinct, they can have an intersection point if and only if they have different slopes. It implies that the lines belong to different superblocks, and hence . Further, as there is only one common point between the two lines, only one pair of non-empty blocks from the two lines can overlap, i.e. have the same coordinates. Without loss of generality, let it be the second and fourth member of . So, we have and .
For all blocks which do not fall on either of the two aforementioned lines, and which are therefore empty, we set their bits in table to 0. So, the second query for the elements of these blocks will be in table . As we already know, table has separate space reserved for all lines, and we set all the bits of all of those empty lines to 0.
An important thing to note so far is we have not stored anything in table yet. We now look into the assignment of the blocks that fall on the two non-empty lines. The blocks that fall on a line have distinct coordinates, so the blocks on the line have distinct spaces in table , and we store all these blocks in table . We accordingly set the corresponding bits in table and .
We now look into the assignment of the blocks on the other line, namely . There is only one block on this line whose coordinate is same as a point on the other line – the block corresponding to the fourth member of has the same coordinate as the second member of . Then, we cannot store the block in table as it is already occupied by the block from the other line. We store this block in table at the space reserved for the line . All other blocks of this line can then be stored in table without any conflict.
The assignment tells us how the empty and the non-empty blocks have been kept separate. An explicit proof of correctness follows along the lines of the previous cases.
4.2.4 Case IV
The final case to consider is when the number of distinct lines due to the non-empty blocks is three. Without loss of generality, let us assume that the blocks corresponding to the third and fourth elements fall on the same line, i.e. . This also means that these two blocks belong to the same superblock, and hence, . It further implies that the coordinates of the two blocks are distinct.
As seen in the previous cases, those lines of the superblocks which do not contain any non-empty block are easy to handle – we simply store them in table at the space reserved for the respective lines. A point to note is that it also leaves table untouched. In the discussion below, we will then concentrate on how we handle the blocks from the three lines which are non-empty.
The discussion will be divided into three parts based on how many of those points coincide. As the blocks corresponding to the third and the fourth members have distinct coordinates, it follows that at most three of the non-empty blocks can coincide.
Let us consider the scenario when three of the non-empty blocks coincide. Without loss of generality, let it be the first three blocks, i.e. and .
We store all the blocks on the line in table . There is only one point in each of the other two lines, namely and , that is common with this line – we store the blocks corresponding to those points in table , and the rest of the blocks of the other lines in table . So, the blocks and are stored in the location reserved in table for the lines and , and the rest of the blocks of these lines are stored in table .
This assignment keeps the empty blocks and the non-empty blocks separate from each other, and the correctness follows.
Let us consider the case where two pairs of non-empty blocks coincide. Without loss of generality, let the first block coincide with the third and the second block coincide with the fourth.
The assignment that we devised for the previous case works in this scenario as well – we store the blocks of the line in table , and the blocks of line and of line in table . The other blocks of the lines and are stored in table .
The correctness proof of the previous case holds in this scenario as well.
Let us next consider the scenario where only one pair of non-empty blocks coincides. Without loss of generality, let the first block coincide with the third. So, we have and . As only one pair of non-empty blocks coincides, the block of the second element does not coincide with any of the other non-empty blocks, and hence has coordinates distinct from the rest.
The assignment in this arrangement will depend on the coordinates of the block of the second element – either it lies on the line , or it does not. We address each of these cases below.
We first consider the case when the block for the second element lies on the line . We store all of the blocks on the line in table . From the line , only one block lies on the previous line, the block containing the first element. This block will be stored in table at the location reserved for the line , and the rest of the blocks can be stored in table . From the last line, i.e. , only one block lies on this line, the block that contains the second element. This block will be stored in table at the location for the line , and the rest of the blocks can be stored in table without conflict.
We next consider the case when the block for the second element does not lie on the line . We, in this case, store the second block, i.e. in table , and the rest of the blocks on its line, i.e. , in table at its allotted location. We do the same for the block of the first element – store the non-empty block in table , the rest of the blocks on its line in table .
The only locations used up in table are locations for the first and second block, and the blocks left to be allocated space are those falling on the line . The second block does not lie on this line, and hence would not affect the allocations of the line. The first block coincides with the third block falling on this line, so the third block, namely must necessarily be stored in table in the space allotted for the line . The rest of the blocks of the line can now be stored in table without conflict.
This is the final configuration to consider when there are three distinct lines due to the non-empty blocks – no block coincides with any other block. This implies that the four non-empty blocks have distinct coordinates, and hence all of them can be stored in table . All the empty blocks can then be stored in table , and we would have avoided all conflicts.
4.3 Blocks with Multiple Members
In the discussion above, we had assumed that each block can contain at most one member of the subset , and we have shown for every configuration of the members of